Optimizing tor bandwidth

Tue Mar 18 03:07:37 UTC 2003

On Sat, 2003-03-15 at 19:00, Roger Dingledine wrote:
> I've been mulling over some ways to make our bandwidth use more efficient
> (either for the exit node, the tor network itself, or the user):
> 
> a) Move cell size from 128 to 256. The theory here is that most cells
> are already packed because a given cell is far more likely to be part
> of a bulk transfer, so if we lose 8 bytes to overhead either way, why
> not more-than-double the payload.
>
> b) Remove the unused 4 bytes in the cell header. Let's not do this,
> because it probably doesn't matter much. Especially if we do a).

As Andrei notes, these won't help much: instead of 24 bytes of overhead per
256 payload bytes (including the topic header), we'd have 8.  That saves 16
bytes in every 256, so it's *maybe* a 6 percent bandwidth improvement ...
assuming that most payload cells are full.
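Here is a rough Python model that makes the "assuming that most payload cells are full" caveat concrete. The per-cell overheads it uses are assumptions: 12 bytes per 128-byte cell and 8 bytes per 256-byte cell. They were chosen only to reproduce the 24-versus-8 figures above and are not taken from the real cell format. `wire_bytes` is a hypothetical helper.

```python
def wire_bytes(payload_len: int, cell_size: int, overhead: int) -> int:
    """Bytes on the wire needed to carry payload_len bytes in fixed-size cells.

    Each cell is padded out to cell_size; `overhead` bytes of every cell go
    to headers. Both parameters are illustrative assumptions.
    """
    usable = cell_size - overhead
    cells = max(1, (payload_len + usable - 1) // usable)  # ceiling division
    return cells * cell_size


# Hypothetical overheads chosen to match "24 bytes per 256" vs. "8".
SMALL = {"cell_size": 128, "overhead": 12}
LARGE = {"cell_size": 256, "overhead": 8}

for n in (1, 1_000_000):
    small = wire_bytes(n, **SMALL)
    large = wire_bytes(n, **LARGE)
    saving = (small - large) / small
    print(f"{n:>9} payload bytes: {small} vs {large} wire bytes ({saving:+.1%})")
```

Under these assumptions, a 1,000,000-byte bulk transfer sends about 6% fewer bytes with 256-byte cells. A 1-byte keystroke sends twice as many, because a nearly empty cell is still padded to full size.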

 [...]
> c) Splice multiple cells into one cell, if their payloads are small.
> This way we can queue up ssh connection cells and put them into a single
> data cell. But I think this is more trouble than it's worth, a) because
> we have to figure out how to queue things but not hold on to them for
> too long, b) because we have to figure out how to splice them together,
> and c) because a given cell is probably already full anyway.

We should add some on-server stats so we can actually quantify how often
this happens.  This logic is hard to build; there are 12,900 hits for
"Nagle algorithm" on Google, many of which discuss when you should
disable it.  Let's not rush in where TCP gurus fear to tread.
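The "queue things but not hold on to them for too long" problem can be sketched as a timer-based cousin of Nagle's algorithm. Nagle proper holds small segments while earlier data is still unacknowledged and does not use a timer. This Python sketch is hypothetical: `Coalescer`, the 248-byte payload and the 50 ms `max_delay` are all assumptions for illustration.

```python
from typing import List, Optional


class Coalescer:
    """Batch small writes into fixed-size cell payloads (timer-based sketch).

    payload_size and max_delay are illustrative assumptions, not Tor values.
    The caller passes the current time in, which keeps the sketch testable.
    """

    def __init__(self, payload_size: int = 248, max_delay: float = 0.05):
        self.payload_size = payload_size
        self.max_delay = max_delay              # longest a queued byte may wait
        self._buf = bytearray()
        self._oldest: Optional[float] = None    # arrival time of oldest queued byte

    def write(self, data: bytes, now: float) -> List[bytes]:
        """Queue data and return any cell payloads that are ready to send."""
        if not self._buf:
            self._oldest = now
        self._buf += data
        out = []
        while len(self._buf) >= self.payload_size:
            out.append(bytes(self._buf[: self.payload_size]))
            del self._buf[: self.payload_size]
            # Everything queued before this call has now gone out, so any
            # leftover bytes arrived just now.
            self._oldest = now
        if not self._buf:
            self._oldest = None
        return out + self.poll(now)

    def poll(self, now: float) -> List[bytes]:
        """Flush a partial payload once its oldest byte has waited too long."""
        if self._buf and self._oldest is not None and now - self._oldest >= self.max_delay:
            chunk = bytes(self._buf)
            self._buf.clear()
            self._oldest = None
            return [chunk]
        return []
```

Two small writes 10 ms apart leave together in one payload once `poll` runs 60 ms after the first. A 600-byte burst goes out at once as two full payloads, and its 104-byte tail waits at most `max_delay`. Choosing `max_delay` is exactly the tradeoff Roger describes: set it too short and cells go out nearly empty; set it too long and interactive sessions lag.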

> d) Let's put one side through zlib compression, and the other side
> through zlib decompression. That way we can handle more traffic over
> the tor backbones, and also we deliver more over a given user's pipe.

Pretty much done, and a clear win ... if most data is large and
compressible anyway.  I suspect that this isn't the case; more bandwidth
probably goes to downloading (compressed) images, music, and executables
than goes to downloading (compressible) HTML and friends.  But hey,
let's see.
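A quick way to see why the answer depends on the traffic mix is to run one zlib stream per direction and sync-flush after every cell, so the far side can decode each cell as it arrives. The following Python sketch compares HTML-like text with random bytes, which stand in for already-compressed images and music. The 248-byte payload size and the helper names are assumptions; `zlib.compressobj`, `Z_SYNC_FLUSH` and `decompressobj` are the standard-library API.

```python
import os
import zlib


def stream_ratio(chunks):
    """Compress chunks as one zlib stream, sync-flushing after each chunk so
    the far side can decode it as soon as it arrives.

    Returns (wire_bytes / raw_bytes, whether the round trip was lossless).
    """
    comp = zlib.compressobj()
    decomp = zlib.decompressobj()
    raw = wire_total = 0
    recovered = bytearray()
    for chunk in chunks:
        wire = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        raw += len(chunk)
        wire_total += len(wire)
        recovered += decomp.decompress(wire)
    return wire_total / raw, bytes(recovered) == b"".join(chunks)


def split(data, size=248):
    """Cut data into hypothetical 248-byte cell payloads."""
    return [data[i : i + size] for i in range(0, len(data), size)]


html = b"".join(b"<tr><td class='row'>item %d</td></tr>\n" % i for i in range(500))
noise = os.urandom(len(html))  # stands in for already-compressed media

print("HTML-like:", stream_ratio(split(html)))
print("random:   ", stream_ratio(split(noise)))
```

The repetitive markup shrinks substantially. The random bytes come out slightly *larger*, because every per-cell flush adds a few bytes of framing.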

> e) Rather than having a separate sendme cell with a useless payload,
> should we have sendme cells *be* data cells, just with a different
> command? 

Again, we only save 1 cell out of 100, and only in cases where the
receiver is speaking at least 1% as much as the sender ... which for
high-bandwidth HTTP is not the case.
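The estimate can be checked with a back-of-envelope model. This sketch assumes one sendme per 100 data cells, as the "1 cell out of 100" figure implies. `cells_sent` and its piggybacking rule are hypothetical simplifications, not a description of how flow control is actually specified.

```python
def cells_sent(forward: int, reverse: int, window: int = 100, piggyback: bool = False) -> int:
    """Total cells exchanged on one stream under a crude flow-control model.

    Assumptions (hypothetical, for illustration only): the receiver owes one
    sendme per `window` data cells it receives, and with piggybacking each
    reverse-direction data cell can carry at most one pending sendme, with
    timing always working out in its favour.
    """
    sendmes = forward // window
    standalone = max(0, sendmes - reverse) if piggyback else sendmes
    return forward + reverse + standalone


# A bulk HTTP download: 10,000 cells down, a handful of request cells up.
plain = cells_sent(10_000, 10)
merged = cells_sent(10_000, 10, piggyback=True)
print(plain, merged, f"{(plain - merged) / plain:.2%} saved")
```

In the download case, piggybacking saves 10 cells out of 10,110, about 0.1%. Even if every sendme finds a ride, the saving is capped at one cell per 101 sent in the forward direction.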

> f) Run squid on the exit node, to help prevent traffic analysis, and to
> help reduce bandwidth load on that node.

Sounds like a good idea in any case.

> What do you think about the above? What else could we do?

Add stats and benchmarking to the server.  I want to see how full cells
are, how well compression works, how often buffers are empty, how much
latencies hiccup, and so on.
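The following is a minimal sketch of counters that could answer these questions. Everything in it is hypothetical: the `RelayStats` name, the 248-byte payload, the 10% fill buckets, and the use of standard deviation as a rough "hiccup" measure are illustrative assumptions, not anything the server already records.

```python
import statistics


class RelayStats:
    """Hypothetical per-server counters: cell fill, compression, idle buffers, latency."""

    def __init__(self, payload_size: int = 248):
        self.payload_size = payload_size
        self.fill_hist = [0] * 10    # cells bucketed by fill fraction, 10% per bucket
        self.raw_bytes = 0
        self.compressed_bytes = 0
        self.polls = 0
        self.empty_polls = 0
        self.latencies = []

    def cell_sent(self, used: int) -> None:
        """Record one outgoing cell carrying `used` payload bytes."""
        bucket = min(9, used * 10 // self.payload_size)
        self.fill_hist[bucket] += 1

    def compressed(self, raw: int, wire: int) -> None:
        """Record a chunk that compressed from `raw` bytes to `wire` bytes."""
        self.raw_bytes += raw
        self.compressed_bytes += wire

    def buffer_polled(self, buffered: int) -> None:
        """Record how many bytes were waiting when a buffer was checked."""
        self.polls += 1
        if buffered == 0:
            self.empty_polls += 1

    def latency(self, seconds: float) -> None:
        self.latencies.append(seconds)

    def summary(self) -> dict:
        return {
            # fraction of cells at least 90% full
            "mostly_full_cells": self.fill_hist[9] / max(1, sum(self.fill_hist)),
            "compression_ratio": self.compressed_bytes / max(1, self.raw_bytes),
            "empty_buffer_rate": self.empty_polls / max(1, self.polls),
            "latency_stdev": statistics.pstdev(self.latencies) if len(self.latencies) > 1 else 0.0,
        }
```

A single standard deviation hides rare long stalls. A histogram or high percentiles of latency would probably show "hiccups" better once real data is available.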

-- 
Nick Mathewson <nickm at alum.mit.edu>