Hi all!
Over the last few weeks I have been busy working on my GSoC project, which
is about reducing the RTT of preemptively built circuits.
There is now a single script called "rttprober"[0] that depends on a
patched[1] Tor client running a certain configuration[2]. The goal is to
measure the RTTs of Tor circuits. It takes a few parameters as input: an
authenticated Stem Tor controller for communication with the Tor client, the
number of circuits to probe, the number of probes to take for each circuit,
and the number of circuits that should be probed concurrently. It outputs a
tar file containing lzo-compressed serialized data with detailed node
information, all circuit and stream events involved, and the circuit build
time for further analysis.
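To give an idea of the mechanics, here is a minimal sketch (not the actual
rttprober code; the probing logic is much simplified and the names are mine)
of timing circuit builds via an authenticated Stem controller:

import time
from stem.control import Controller

def time_circuit_build(controller):
    """Build a fresh circuit and return how long it took, in seconds."""
    start = time.time()
    # Blocks until the circuit reaches the BUILT state (or raises on failure).
    circuit_id = controller.new_circuit(await_build=True)
    elapsed = time.time() - start
    controller.close_circuit(circuit_id)
    return elapsed

with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    build_times = [time_circuit_build(controller) for _ in range(10)]
    print(build_times)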
Since the RTT measurements are run in parallel with very short locks, it is
important not to overload Tor nodes. Therefore, a single node is never probed
more than once at a time.
A first analysis of the measurements taken so far supports the original
assumption that a Fréchet distribution fits both the circuit build times[3]
and the round-trip times[4].
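For reference, such a fit can be checked along these lines with SciPy (a
sketch, not my actual analysis code; SciPy exposes the Fréchet distribution as
scipy.stats.invweibull, and the input file name is made up):

import numpy as np
from scipy import stats

# One measured circuit build time (in seconds) per line; hypothetical file.
build_times = np.loadtxt("circuit_build_times.txt")

# Maximum-likelihood fit of the Fréchet (inverse Weibull) parameters.
shape, loc, scale = stats.invweibull.fit(build_times)
print("shape=%.3f loc=%.3f scale=%.3f" % (shape, loc, scale))

# Sanity-check the fit against the empirical data, e.g. with a K-S test.
d_stat, p_value = stats.kstest(build_times, "invweibull", args=(shape, loc, scale))
print("KS statistic=%.3f, p-value=%.3f" % (d_stat, p_value))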
I will continue gathering and analyzing measurement data and will hopefully be
able to draw some conclusions from that.
Best,
Robert
[0] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/rttprober.py?at=master
[1] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/patches?at=master
[2] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/torrc?at=master
[3] http://postimg.org/image/je8k5yydt/
[4] http://postimg.org/image/ktk90vxm7/
I'd like to improve my Haskell skills. Are there any opportunities?
I've been told there is at least one Haskell project that is no longer
maintained. (For example, this page [1] mentions TorDNSEL, which
was replaced by TorBEL.)
[1] https://www.torproject.org/getinvolved/volunteer.html.en
Hi Kevin,
I tried the bundles in https://kpdyer.com/fte/ .
For some reason, when I fire up 'start-tor-browser', the 'fte_relay' listener
never binds on '127.0.0.1:8079' (as the torrc expects it to). Hence Tor fails
to bootstrap and simply says:
"The connection to the SOCKS5 proxy server at 127.0.0.1:8079 just failed.
Make sure that the proxy server is up and running."
When I run 'start-tor-browser' I'm getting the following message in stdout:
"""
./bin/fte_relay: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not
found (required by /tmp/_MEI8IBYIj/libz.so.1)
"""
Could this be why fte_relay never sets up a listener?
Also, if I try to manually invoke fte_relay by doing:
./bin/fte_relay --mode client --server_ip 128.105.214.241 --server_port 8080
I get the same error.
Furthermore, I can't get ./bin/fte_relay to give me any kind of usage
information.
Any ideas?
Thanks!
Hi Karsten,
(I'm not sure whom to CC and whom not to; I have a couple of fairly specific
technical questions / comments, which, again, I should have delved into
earlier, but then again, maybe the scope of the tor-dev mailing list
includes such cases.)
@tor-dev: This is in regard to the Searchable metrics archive project,
intersected with Onionoo stuff.
I originally wanted to ask two questions, but by the time I reached the
middle of the email, I wasn't sure anymore whether they were questions, so I
think these are simply my comments / observations about a couple of
specific points, and I just want to make sure I'm making general sense. :)
So it turns out that it is very much possible to avoid Postgres sequential
scans for the metrics search backend/database - that is, to construct
queries/cases which are best executed using indexes (the query planner thinks
so, and it makes sense as far as I can tell) and whatever efficient (hm,
< O(n)?) search algorithms are deemed best. The two things that seem
potentially problematic to me are:
- when doing ORDER BY, making sure that the resulting SELECT covers / would
potentially return a relatively small number of rows (from what I've been
reading / trying out, whenever a SELECT may cover more than ~10-15% of all
the table's rows, a sequential scan is preferred, as using indexes would
result in even more disk I/O, whereas the sequential scan can read larger
chunks of data from disk because it is, well, sequential). This means that
doing "ORDER BY nickname" (or fingerprint, or any other non-unique column
that covers a relatively small part of the network status table) is much
faster than doing e.g. "ORDER BY validafter" (where validafter refers to the
consensus document's "valid after" field) - which makes sense, of course,
when you think about it (ordering a massive number of rows, even with a
proper LIMIT etc., is insane). We construct separate indexes (e.g. on
'fingerprint', even though it is part of a composite (validafter +
fingerprint) primary key), and all seems to be well.
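To make this concrete, here is a small sketch (table/column names as above;
the connection string is a placeholder) of creating such a separate index and
then asking the planner, via EXPLAIN, whether an ordered and limited SELECT
is served by an index scan or a sequential scan:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost/tordir")  # placeholder DSN

with engine.begin() as conn:
    # A separate index on the second column of the composite primary key,
    # so that "ORDER BY fingerprint" can be answered from the index.
    conn.execute(text("CREATE INDEX statusentry_fingerprint_idx "
                      "ON statusentry (fingerprint)"))

    # Ask the planner how it would execute the ordered, limited query.
    plan = conn.execute(text("EXPLAIN SELECT fingerprint, validafter "
                             "FROM statusentry ORDER BY fingerprint LIMIT 100"))
    for line in plan:
        print(line[0])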
I've been looking into the Onionoo implementation, particularly into
ResourceServlet.java [1], to see how ordering etc. is done there. I'm new
to that code, but am I right in saying that, as of now, the results (at
least for /summary and /details) are generally unsorted (except for the
possibility of "order by consensus_weight"), with "valid_after" fields
appearing in an unordered manner? This of course makes sense in this case,
as Onionoo is able to return *all* the results for given search criteria
(or, if none are given, all available results) at once.
The obvious problem with the archival metrics search project (argh, we are
in need of a decent name, I daresay :) maybe I'll think of something.. no
matter) is then, of course, the fact that we can't return all results at
once. I've so far been assuming that it would therefore make sense to
return them, whenever possible (and if not requested otherwise, once an
"order" parameter is implemented), in "valid_after" descending order. I
suppose this makes sense? It would be ideal, methinks. So far, it seems
that we can do that, as long as we have a WHERE clause that is restrictive
enough. For example, Postgres tells me that the following query works out
nicely in terms of efficiency / query plan:
select * from (
  select distinct on (fingerprint) fingerprint, validafter, nickname
  from statusentry
  where nickname like 'moria1'
  order by fingerprint, validafter desc
) as subq
order by validafter desc limit 100;

(The double ORDER BY is needed, as Postgres' DISTINCT ON requires an ORDER BY
whose leftmost criterion matches the DISTINCT ON (x) expression.)
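Since later in this email I mention wanting to outsource the relational logic
to the ORM, here is roughly the same query expressed with SQLAlchemy (a
sketch; the StatusEntry mapped class and its import path are assumptions on
my part, and SQLAlchemy's .distinct(column) emits Postgres' DISTINCT ON):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from torsearch.models import StatusEntry  # hypothetical import path for the mapped class

engine = create_engine("postgresql://user:password@localhost/tordir")  # placeholder DSN
session = sessionmaker(bind=engine)()

# Inner query: newest entry per fingerprint for the given nickname.
subq = (session.query(StatusEntry.fingerprint,
                      StatusEntry.validafter,
                      StatusEntry.nickname)
        .distinct(StatusEntry.fingerprint)              # DISTINCT ON (fingerprint)
        .filter(StatusEntry.nickname.like('moria1'))
        .order_by(StatusEntry.fingerprint, StatusEntry.validafter.desc())
        .subquery())

# Outer query: order the de-duplicated rows by validafter and limit them.
rows = (session.query(subq.c.fingerprint, subq.c.validafter, subq.c.nickname)
        .order_by(subq.c.validafter.desc())
        .limit(100)
        .all())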
Now, you mentioned / we talked about what the (large, archival metrics)
backend should do about limiting/cutoff - especially if no search criteria
are specified, for example. Ideally, it might return a top-most list of
statuses (which in /details include info from server descriptors), sorted
by "last seen" / "valid after"? The thing is, querying a database for 100
(or 1000) items with no ORDER BY is really cheap; introducing ORDER BYs
which would still produce tons of results is considerably less so. I'm now
looking into this (and you did tell me this, i.e. that, I now think, a
large part of DB/backend robustness gets tested at these particular points;
this should have been more obvious to me).
But in any case, I have the question: what *should* the backend return when
no search parameters/filters are specified, or only very loose ones
(nickname LIKE "a") are given?
What would be cheap / doable: ORDER BY fingerprint (or digest (== hashed
descriptor), etc.). It *might* (well, should) work to order by fingerprint,
limit the results, and *then* reorder by validafter - with no guarantee that
the topmost results would have the highest absolute validafter values. I
mean, Onionoo is doing this kind of reordering / limiting itself, but that
makes sense, as it can handle all the data at once (or I'm missing something,
i.e. the file I linked to already interacts with a subset of the data;
granted, I haven't thoroughly read through it). In our/this case, it makes
sense to try and outsource all the relational logic to the ORM (but of
course, if it turns out we can cheaply get 100 arbitrary results and more
easily juggle them ourselves / on the (direct) Python side, then sure).
But would such arbitrarily returned results make sense? They would look just
like Onionoo results, only a (small) subset of them.
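To make the order-then-reorder idea concrete, a rough sketch (same assumed
session and StatusEntry model as in the sketch above):

# Step 1: cheap, index-friendly ordering and limiting on the database side.
rows = (session.query(StatusEntry)
        .order_by(StatusEntry.fingerprint)
        .limit(100)
        .all())

# Step 2: reorder the small result set by validafter on the Python side.
# Note: there is no guarantee that these are the globally newest entries.
rows.sort(key=lambda row: row.validafter, reverse=True)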
Ramble #2: Onionoo is doing more or less direct (well, compiled, so
efficient) regexps on fingerprint, nickname, etc. By default, "LIKE
%given_nickname_part%" is again (sequentially) expensive; Postgres does
offer full text search extensions (additional tables would need to be built,
etc.), and I think it makes sense to do this; it would cover all our "can
supply a substring of :param" bases. I'll look into this. (It's also
possible to construct functional indexes, e.g. with a regexp, but one needs
to know the template - I wonder if using Onionoo's regular expressions would
work / make sense - I'll see.) For now, LIKE / = with an exact string is
very nice, but of course the whole idea is that it should be possible to
supply substrings. Of note is the fact that in this case, LIKE %substring%
is O(n) in the sense that query time correlates with row count, afaict. As
of now, I think full text search extensions would solve the problem, even if
they may look a bit like overkill at first.
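One candidate for the substring case (a sketch; whether this or the full text
search machinery wins still needs to be measured, and the table/column names
are the assumed ones from above) is Postgres' pg_trgm extension, whose
trigram GIN indexes let the planner answer LIKE '%substring%' queries
without a sequential scan:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost/tordir")  # placeholder DSN

with engine.begin() as conn:
    conn.execute(text("CREATE EXTENSION IF NOT EXISTS pg_trgm"))
    # A trigram index on the nickname column; the planner can use it for
    # LIKE/ILIKE patterns with leading and trailing wildcards.
    conn.execute(text("CREATE INDEX statusentry_nickname_trgm_idx "
                      "ON statusentry USING gin (nickname gin_trgm_ops)"))

    result = conn.execute(text("SELECT DISTINCT fingerprint, nickname "
                               "FROM statusentry "
                               "WHERE nickname LIKE :pattern LIMIT 100"),
                          {"pattern": "%moria%"})
    for fingerprint, nickname in result:
        print(fingerprint, nickname)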
End of an email without a proper direction/point. :)
[1]:
https://gitweb.torproject.org/onionoo.git/blob/HEAD:/src/org/torproject/oni…
Dear tor-devs,
is anyone here up for a coding task that could help us further research
performance improvements of the N23 design?
The situation is that we already have a branch (n23-5 in arma's public
repository), but it's based on 0.2.4.3-alpha-dev and needs to be rebased
to current master.
In theory, it's as simple as the following steps:
$ git clone https://git.torproject.org/tor.git
$ cd tor/
$ git remote add arma https://git.torproject.org/arma/tor.git
$ git fetch arma
$ git checkout -b n23-5 arma/n23-5
$ git fetch origin
$ git rebase origin/master
(clean up the mess)
$ git add <files with resolved conflicts>
$ git rebase --continue
(back to the clean-up-the-mess step until git is happy)
$ git push <your public remote> n23-5
Bonus points if you make sure the branch compiles with gcc warnings
enabled, appeases make check-spaces, and runs peacefully in a private
Chutney network.
Unfortunately, the n23-5 branch touches a few places in the tor code
that have been refactored in current master, including Andrea's
connection/channel rewrite. It might be necessary to dive into the
channel thing in order to get this rebase right.
Once we have a refactored n23-5 branch, I'll try to simulate it in Shadow.
For a tiny bit of context, this is for our sponsor F item 13:
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorF/Year3
I'm asking here, because the usual suspects are already overloaded with
other stuff. As usual, I guess.
Thanks,
Karsten
Hi all,
To test the Tor program, I thought an independent implementation might
help. I started writing TorPylle with that in mind.
The purpose is NOT to provide a secure or robust implementation that
could be an alternative to Tor.
It relies on Scapy (http://www.secdev.org/projects/scapy/) and is
supposed to be used more or less the same way.
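For readers who haven't used Scapy, typical usage looks like the following
sketch (this is plain Scapy, only to illustrate the style of interface; it
does not use TorPylle's own classes):

from scapy.all import IP, TCP, sr1

# Build a packet layer by layer and send it, waiting for a single answer.
probe = IP(dst="192.0.2.1") / TCP(dport=443, flags="S")
reply = sr1(probe, timeout=2)
if reply is not None:
    reply.show()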
The code is here: https://github.com/cea-sec/TorPylle and includes an
example file.
This is at an early development stage. Comments, fixes and questions welcome!
Pierre
Hello,
I have been talking about this in #tor-dev for a while (and pestering
people with questions regarding some of the more nuanced aspects of
writing a pluggable transport, thanks to nickm, mikeperry and asn for
their help), and finally have what I would consider a pre-alpha for the
PT implementation.
obfsproxyssh is a pluggable transport that uses the SSH wire protocol to
hide Tor traffic. It uses libssh2 and interacts with a real sshd located
on the bridge side. Behaviorally, it is identical to a user sshing to a
host, authenticating with an RSA public/private key pair, and opening a
direct-tcp channel to the ORPort of the bridge.
It is aimed more at non-technical users (anyone with an account on a bridge
can create a tunnel of their own using existing ssh clients), and thus can
be configured entirely through the torrc.
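Conceptually, the client side does the equivalent of the following
Python/paramiko sketch (obfsproxyssh itself is C and uses libssh2; the host,
user, key path and ports below are placeholders):

import paramiko

BRIDGE_HOST = "bridge.example.com"   # placeholder bridge address
BRIDGE_SSH_PORT = 22
OR_PORT = 6969                       # the bridge's ORPort, reached via the tunnel

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(BRIDGE_HOST, port=BRIDGE_SSH_PORT, username="bridgeuser",
               key_filename="/path/to/id_rsa")

# Open a direct-tcpip channel to the ORPort, just like ordinary ssh local
# port forwarding would.
transport = client.get_transport()
channel = transport.open_channel("direct-tcpip",
                                 dest_addr=("127.0.0.1", OR_PORT),
                                 src_addr=("127.0.0.1", 0))
# Tor's TLS traffic is then relayed back and forth over this channel.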
It still needs a bit of work before it is ready for deployment but the
code is at the point where I can use it for casual web browsing, so if
people are interested, I have put a snapshot online at
http://www.schwanenlied.me/yawning/obfsproxyssh-20130627.tar.gz
Note that it is still at the "I got it working today" state, so some
parts may be a bit rough around the edges, and more than likely a few
really dumb bugs lurk unseen.
Things that still need to be done (in rough order of priority):
* It needs to scrub IP addresses in logs.
* I need to compare libssh2's KEX phase with that of popular ssh clients
(for the initial "production" release, more than likely PuTTY). It currently
also sends a rather distinctive banner, since I'm not sure which client(s)
it will end up mimicking.
* I need to come up with a solution for server side sshd logs. What I
will probably end up doing is writing a patch for OpenSSH to disable
logging for select users.
* In a peculiar oversight, OpenSSH doesn't have a way to disable reverse
ssh tunnels (e.g. "PermitOpen 127.0.0.1:6969" still allows clients to
listen on that port). Not a big deal if Tor starts up before any clients
connect, but I'll probably end up writing another patch for this.
* Someone needs to test it on Windows. All of my external dependencies
are known to work, so the porting effort should be minimal (Famous last
words).
* The code needs to scrub the private key as soon as a connection
succeeds in authenticating instead of holding onto it. Probably not a
big deal since anyone that can look at the PT's heap can also look at
the bridge line in the torrc.
Nice to haves:
* Write real Makefiles instead of using CMake (I was lazy).
src/CMakeLists.txt currently needs to be edited for anyone compiling it.
* It currently uses unencrypted RSA keys. libssh2 supports ssh-agent
(on all of the relevant platforms), so key management can be handled that
way. I do not think there is currently a mechanism for Tor to query the
user for a passphrase and pass it to the PT, but if one gets added,
supporting it would also be easy on my end.
* The code does not handle IPv6 since it uses SOCKS 4 instead of 5.
When Tor gets a way to pass arguments to PTs that are > 510 bytes, I
will change this.
* libssh2 needs a few improvements going forward (In particular it does
not support ECDSA at all).
* Code for the bridge side that makes the tunnel speak the managed PT
server transport protocol would be nice (for rate limiting).
* libssh2_helpers.c should go away one day. Not sure why the libssh2
maintainers haven't merged the patch that I based the code in there on.
Things that need to be done on the Tor side of things:
* 0.2.4.14-alpha does not have the PT argument parsing code, so this
requires a build out of git to function.
* The code currently in Git fails to parse bridge lines with arguments
that can't be passed via SOCKS 5 (size restriction). The PT tarball has
a crude patch that removes the check, but the config file parser needs
to be changed.
* The Tor code currently in Git likes to forget PT arguments. asn was
kind enough to provide me with a patch that appears to fix this (though
the PT has a workaround for when it encounters this situation), but
moving forward a formal fix would be ideal.
* All the PT-related cleverness in the world won't do much against active
probing if there is an ORPort exposed on the bridge. Tor should be able
to handle "ORPort 127.0.0.1:6969" (it may currently work, I'm not sure;
there should be a way to disable the reachability check, if only to
reduce log spam).
Open questions:
* Is this useful?
* Is it worth biting the bullet and rewriting this to use Twisted Conch
instead of being a C app?
* Would it be simpler to write a wrapper around existing ssh clients?
(Probably not.)
* How should people handle distributing bridge information?
* How should the case of the private key to a given bridge getting
compromised be handled? (Correctly configured, all that someone who obtains
the key would be able to do is talk to the OR port, so it's not a security
issue, but it does open the bridge to being blocked.)
* Does Tor tunneled over SSH look distinctive? No effort is made to
change the traffic signature, though this can be added if needed.
The tarball contains a more detailed README explaining how to set it up
and how it works. obfsproxyssh_client.c has a more in-depth TODO list
as a large comment near the top of the file.
Comments and feedback will be appreciated.
Regards,
--
Yawning Angel
Does anyone know whether the Nexus 4 baseband processor has r/w access to
system memory? The firmware doesn't seem to be loaded at boot, so I presume
it's entirely out of reach for reversing?
Hi everyone,
Some clean-ish working code is finally available online [1] (the old PoC
code has been moved to [2]); I'll be adding more soon, but this part does
what it's supposed to do, i.e.:
- archival data import (download, mapping to ORM via Stem, efficiently
avoiding re-import of existing data via Stem's persistence path, etc.);
what's left for this part is a nice and simple rsync+cron setup to be able
to continuously download and import new data (via Metrics archive's
'recent')
- data models and Stem <-> ORM <-> database mapping for descriptors,
consensuses and network statuses contained in consensuses
- models can be easily queried via SQLAlchemy's ORM; Karsten suggested
that an additional 'query layer' / internal API is not needed until there's
an actual need for it (i.e., my plan was to provide an additional query API
abstracted from the ORM (which is itself built on top of database/SQL/Python
classes), and to build a backend on top of it, as a neat client of that API
as it were; I had some simple and ugly PoCs that are now pushed out of the
priority queue until needed (if ever))
- one example of how this querying (directly atop the ORM) works is
provided: a simple (very partial) Onionoo protocol implementation for
/summary and /details, including ?search, ?limit and ?offset (a small usage
sketch follows right after this list). Querying takes place over all
NetworkStatuses. This is new in the sense that it uses the ORM directly. If
there is a need to formulate SQL queries more directly, we'll do that as well.
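As that small usage sketch (the base URL is a placeholder for wherever the
backend is running; the JSON field names follow the Onionoo protocol), the
/details endpoint can be exercised like this:

import json
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # placeholder for the torsearch backend

url = BASE_URL + "/details?search=moria1&limit=10&offset=0"
data = json.loads(urlopen(url).read().decode("utf-8"))

for relay in data.get("relays", []):
    print(relay.get("nickname"), relay.get("fingerprint"))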
During the Tor developer meetings in Munich, we talked over the existing &
proposed parts of the system with Karsten. I will be focusing on making sure
the Onionoo-like backend (which is being extended) is stable and efficient.
I'm still looking into database optimization (with Karsten's advice); an
efficient backend for the majority of all the archival data available would
be a great deliverable in itself, and hopefully we can achieve at least that.
I might do well to try and document the database iterations and development,
as a lot of the thinking now resides in a kind of 'black box' of DB spec,
which does not produce code.
The large Postgres datasets reside on a server I'm managing; I'm working on
exposing the Onionoo-like API for public queries, and I'm now doing some
simple higher-level benchmarking (simulating multiple clients requesting
different data at once, etc.). I might need to move the datasets to yet
another server (again), but maybe not; it's easy to blame things on limited
CPU/memory resources. :)
Kostas.
[1]: https://github.com/wfn/torsearch
[2]: https://github.com/wfn/torsearch-poc