Using TLS for circuit-level handshake (Was Re: tor-spec comments)

Marc Rennhard rennhard at tik.ee.ethz.ch
Mon Sep 15 09:35:20 UTC 2003


I was on vacation and offline for two weeks and only now saw the whole
TLS discussion. Much of what you discussed sounded quite familiar to me
because we experienced similar problems when developing the Anonymity
Network (AN). I think I can add a bit to the discussion because in
the AN, we used TLS for the nested encryptions between client and all
nodes along the circuit, resulting in several TLS sessions on top of 
each other. We used it in RSA key-exchange mode with RC4 (128-bit key)
and MD5, the latter two mainly for performance reasons.

Our main arguments at that time were that (1) TLS/SSL is well understood,
analysed, and accepted, (2) it allows for easy one-way authentication of
the ORs to the client, and (3) it protects itself from replay attacks
(using its own internal sequence numbers), so there is no need to check
for replayed cells.

Having run the system for a while, I can draw a few conclusions about the
impact of using TLS:

1. Performance is not a problem. Using an Athlon 1GHz and Linux, a node
could process about 20 Mb/s. That is certainly related to using a fast
stream cipher (RC4) and a fast hash function (MD5), but we also used
Java, which should be somewhat slower than C despite JIT compilers.
However, we used longstanding circuits (at least several minutes), so
the overhead from generating new circuits (and computing 2048-bit RSA
operations) is small.

2. Handshaking. We used separate threads to do the handshaking to not
disturb the flow of cells. Since we also used cover traffic, this was
essential to not leak information about when new circuits are being set
up. In the current tor design, which has no dummies at the moment, I
don't see a problem with doing the TLS handshake within the main
process. A delay of a few tens of milliseconds should not hurt
overall performance (as perceived by the users) too much. It depends
on the rate of circuit establishment, of course. If tor ever moves to
some cover traffic scheme, I believe doing the handshake and other
expensive operations in dedicated threads is absolutely mandatory.
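
To illustrate the dedicated-thread idea in point 2, here is a minimal
sketch in Python. Everything in it is a placeholder I made up for
illustration: slow_handshake() just sleeps instead of doing real
public-key work, and "relaying" a cell is an upper() call. It is not
how the AN or tor does it, only the shape of the idea: the handshake
runs on a worker thread while the main loop keeps moving cells.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_handshake(circuit_id: int) -> str:
    """Hypothetical stand-in for an expensive handshake (e.g. RSA operations)."""
    time.sleep(0.05)  # simulate a few tens of milliseconds of work
    return f"session-key-for-{circuit_id}"


def relay_while_handshaking(cells, new_circuit_id):
    """Start a handshake on a worker thread and keep relaying cells meanwhile."""
    processed = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(slow_handshake, new_circuit_id)
        for cell in cells:
            # Placeholder for the real per-cell work; it is not blocked
            # by the handshake running on the worker thread.
            processed.append(cell.upper())
        key = pending.result()  # collect the handshake result at the end
    return processed, key


if __name__ == "__main__":
    print(relay_while_handshaking(["cell-a", "cell-b"], 7))
```

Note that in CPython a pure-Python CPU-bound handshake would still
contend for the interpreter lock, so this only shows the structure,
not a performance result.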

3. Multithreading. At the time I implemented the AN, Java did not offer
non-blocking I/O, i.e. there was no poll/select equivalent available.
Consequently, we used separate threads for every socket (actually two,
one for reading and one for writing) and a few others for other stuff.
With many users, we eventually hit the maximum number of threads
(PTHREAD_THREADS_MAX), which was 1024 with our C library. That can be
changed, of course, but that is again bad for portability. In short, it
was a pain to handle so many threads (especially in terms of debugging
and rarely occurring deadlocks) and from a software-engineering point of
view, I would stick with the minimum number of threads that is needed.
As long as no cover traffic is employed (see above) and looking at the
main design of tor, you should use exactly one process/thread.
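
For contrast with the thread-per-socket approach in point 3, a
single-threaded loop built on a poll/select equivalent might look
roughly like the sketch below. It uses Python's standard selectors
module; the function names, the 4096-byte read size, and the
socketpair-based demo are my own assumptions, not anything from the
AN or tor code. For brevity it forwards with a blocking sendall(),
which a real relay would replace with buffered non-blocking writes.

```python
import selectors
import socket


def relay_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """One loop iteration: forward data from every readable socket to its peer."""
    forwarded = 0
    for key, _events in sel.select(timeout):
        src = key.fileobj
        dst = key.data  # the peer socket stored at registration time
        data = src.recv(4096)
        if data:
            dst.sendall(data)
            forwarded += len(data)
        else:
            # The peer closed the connection: stop watching this socket.
            sel.unregister(src)
            src.close()
    return forwarded


def demo() -> bytes:
    """Push one message through the relay using local socket pairs."""
    sel = selectors.DefaultSelector()
    client_side, relay_in = socket.socketpair()
    relay_out, server_side = socket.socketpair()
    sel.register(relay_in, selectors.EVENT_READ, data=relay_out)

    client_side.sendall(b"cell")
    relay_once(sel)
    received = server_side.recv(4096)

    sel.unregister(relay_in)
    for s in (client_side, relay_in, relay_out, server_side):
        s.close()
    sel.close()
    return received


if __name__ == "__main__":
    print(demo())
```

One thread watches any number of sockets here, so the thread limit
mentioned above never comes into play.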

4. Cell growth. This is a pain, as adding a layer of TLS adds 21 bytes
to the cell (a 5-byte header plus a 16-byte MD5 hash). So on the way back
to the client, the cell would get longer and longer. To avoid this, the
last node in the AN left room (filled with padding bits) for the
additional overhead to come. With 4 nodes, the last would therefore leave
open 63 bytes for the nodes to come. There are 2 disadvantages here:
(1) 21 bytes of additional overhead per hop is a lot, especially for tor
cells of 256 bytes. In the AN, we used 1000 bytes, so the impact was not
too bad. (2) Every node knows its position in the path and the last node
also knows the path length, both of which are information we don't like
to give away. This can be avoided by leaving, say, 200 bytes open at the
last node to leave enough room for all overheads to follow on the path
back to the client, assuming there is a reasonable upper limit of about
10 nodes in a circuit. But this again wastes bytes.
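
The arithmetic in point 4 can be written down as a small Python
sketch. The constants are the figures quoted above (5-byte header,
16-byte MD5 hash); the function names are my own, and the assumption
that the last node reserves room for every other node on the path is
my reading of the "4 nodes, 63 bytes" example.

```python
TLS_HEADER_BYTES = 5   # per-layer header, as quoted above
MD5_HASH_BYTES = 16    # per-layer MD5 hash, as quoted above
PER_HOP_OVERHEAD = TLS_HEADER_BYTES + MD5_HASH_BYTES  # 21 bytes


def reserved_padding(path_length: int) -> int:
    """Bytes the last node leaves open for the other nodes on the way back."""
    return (path_length - 1) * PER_HOP_OVERHEAD


def reserved_fraction(cell_size: int, path_length: int) -> float:
    """Share of a cell that the last node has to keep free as padding."""
    return reserved_padding(path_length) / cell_size


if __name__ == "__main__":
    print(reserved_padding(4))    # 4-node path, as in the example above
    print(reserved_padding(10))   # assumed upper limit of 10 nodes
    print(reserved_fraction(256, 4), reserved_fraction(1000, 4))
```

For a 4-node path this gives the 63 bytes mentioned above, and for a
10-node path 189 bytes, which fits within a fixed 200-byte
reservation. The 63 bytes are roughly a quarter of a 256-byte cell
but only 6.3% of a 1000-byte one.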

> Another requirement is that we'd like the handshaking process to have
> a small externally visible footprint (externally visible to other nodes
> on the circuit, as well as to passive observers). One round-trip seems
> pretty good for that. A several-pass protocol, where each message uses a
> predictable number of cells (more than one), would seem to leave a louder
> footprint. One of the features to putting 'extend' requests inside relay
> cells is that only the last node in the circuit can know that they're
> extend requests. Having a pair of TLS handshakers at each end chatting
> with each other in implementation-specific ways is contradictory to
> this goal.

These multiple round trips (3 per OR in the case of TLS) are probably a
small problem. Right now, tor is vulnerable to so many sophisticated
attacks that this additional leakage of information is probably
minimal. Again, if tor ever uses clever cover traffic mechanisms, then
these multiple round trips should be hidden from the observer, otherwise
the cover traffic scheme is not of much use.

Summarising, I cannot give a clear pro/contra on using TLS instead of a
lighter protocol. It has more overhead and many features tor does
not need, but it's also (probably) secure and contains *all* the features
you need.

Marc


