Exit Balancing Patch

Mike Perry mikeperry at fscked.org
Thu Jul 26 18:51:00 UTC 2007

Thus spake Robert Hogan (robert at roberthogan.net):

> On Thursday 26 July 2007 11:57:44 Mike Perry wrote:

> You don't say so in your post, but I take it the percentile ranges were taken 
> in descending order of advertised bw?

Correct. There is a graph that illustrates the distribution of
bandwidth by percentile.


> So if I understand this correctly:
> * Overall network bw is poorly utilized because high-capacity nodes aren't 
> participating in the large slice of the network's circuits that their bw 
> warrants.


> * Tor needs to be careful (and clip at some sane bw of X mb/s) because 
> otherwise too much traffic will concentrate on a small group of nodes, 
> harming anonymity.

Not to offend you, but I believe this argument is nonsense. A
high-capacity node should be utilized to the full extent of its
capacity. Artificially clipping capacity does not properly solve *any*
problems related to nodes lying about capacity, nor does it "improve
anonymity". If you try to "protect" anonymity this way, people with
high-capacity links will just start mounting sybil attacks instead.

In fact, such uniform balancing is actually extremely damaging to
anonymity, because it both artificially limits userbase size and
causes exorbitant amounts of circuit failures, which can be used to
bias route selection (the last guard percentile, the 45-50 range,
exhibited nearly a 40% circuit failure rate!).

> Given that a tor circuit is only as fast as its slowest node isn't it just as 
> likely that for some reasonable percentage of the time, the user's circuits 
> will still be problematically slow (e.g. fast guard, slow relay, fast exit)?

That is what load balancing is for, though. If you notice, that is why
I did all my userbase size calculations based on a 10KB/sec stream
rate, which is the average stream speed of the rest of the Tor
network. As I pointed out, the top 5% has room for SEVEN TIMES MORE
10KB/sec streams, since its average stream capacity is 70KB/sec.

Assuming fairness, a node with capacity C and N streams should produce
stream capacity of C/N. It is simple algebra to see that if C/N=70,
you can have N'=7N users at 10K/sec instead.
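That algebra can be sketched in a few lines of Python (illustrative
only; the node and stream counts are hypothetical, not measurements):

```python
# Fairness assumption: a node with capacity C serving N streams gives
# each stream C/N. If streams currently run at 70 KB/s each but the
# target per-stream rate is 10 KB/s, the node can serve 7x the streams.

def max_streams_at_rate(capacity_kbps, target_stream_kbps):
    """Streams a node can serve at the target per-stream rate."""
    return capacity_kbps // target_stream_kbps

n = 100                    # hypothetical current stream count
capacity = 70 * n          # implied capacity in KB/s, since C/N = 70
print(max_streams_at_rate(capacity, 10))  # 7 * n = 700
```
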

As my calculations (which assume TCP fairness, a property that may not
hold over internal circuits) showed, I believe that properly balancing
the network should increase effective user capacity by roughly 4X.

(X is current network size: X*.45*(70/10) + X*.15*(30/10) + X*.25 = 3.85X,
or roughly 4X)
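The weighted sum works out to about 3.85X, in line with the roughly 4X
figure above. As a sketch (the percentile weights are the ones quoted
in this post):

```python
# Estimate of effective user capacity after rebalancing, relative to
# the current network size X, using the per-percentile stream rates
# from the post against the 10 KB/s network-average baseline.
top    = 0.45 * (70 / 10)   # top 45%: streams run at 7x the baseline
middle = 0.15 * (30 / 10)   # next 15%: streams run at 3x the baseline
rest   = 0.25               # remaining measured slice at the baseline
multiplier = top + middle + rest
print(multiplier)           # about 3.85, i.e. roughly 4X
```
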

If it turns out this is not the case because of fairness or other
issues, we can take more measurements once the changes are in place
and go to work devising a better weighting scheme. Right now, however,
load balancing has such fundamental problems that it is impossible to
make any other adjustments or measurements than to say "Oh dear
goddess is it horribly broken".

Incidentally, it is ALSO currently extremely difficult to detect nodes
that are maliciously failing circuits, or that are lying about their
bandwidth, because the network is so unbalanced. So saying that we
should "wait until we can detect liars" before we fix this stuff will
leave us in a circular deadlock.

> If that's true then increasing the clipping point alone will not improve 
> overall performance, so in order to work does your proposal have to be 
> two-pronged? 
> ( 1 ) adopt two-hop paths as a user option, because those will mostly, but 
> not always, be fast if ( 2 ) high-capacity nodes get their 'fair' 
> share of the user's circuits once punitive clipping is removed. This is 
> because guards have a guaranteed high-ish bw and we've reduced our odds of 
> getting a slow node on the rest of the circuit because we're only choosing 
> one more, rather than two more AND we've allowed fast nodes to throw their 
> weight around a bit more in the random selection.

No, two-hop paths are independent of this proposal. But I'm not giving
up on that one either. I believe two-hop paths to be purely an
implementation issue. There is no theoretical reason why two-hop paths
significantly degrade anonymity, but I do agree the devil is in the
implementation details.

> Is there more to it or am I missing something?
> As a side note:
> Wouldn't a useful metric be nodes' cpu usage or even capacity (e.g. make and 
> ghz)?  Is there any research or intuition on the relationship between cpu 
> make/ghz'age and bw-capacity that illustrate how much of that bw is just 
> notional once the node is participating in x circuits with y kb/s? Would it 
> be worth including this in descriptors?

No, I think this is up to node operators: look at their CPU usage and
limit Tor's bandwidth proportionally to achieve their desired load.
CPUs vary widely and can have various optimizations as well (SSL
accelerator cards, vector operations, etc.).
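For example, an operator who finds Tor saturating their CPU can cap
the relayed rate in torrc with the standard bandwidth options (the
values below are illustrative, not recommendations):

```
# torrc excerpt: cap Tor's relayed bandwidth so CPU load stays at the
# level the operator wants (values are illustrative)
BandwidthRate 5 MB
BandwidthBurst 10 MB
```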

If nodes properly report their current capacity, it doesn't matter if
a node is network bound or CPU bound from a balancing perspective
(though perhaps Johannes will discover that CPU load possibly
increases latency in some weird ways).

Mike Perry
Mad Computer Scientist
fscked.org evil labs