[tor-relays] max / burst speed

Wed Sep 28 03:43:22 UTC 2011

On 9/27/2011 1:37 PM, "Steve Snyder" <swsnyder at snydernet.net> wrote:

> Either there is simply not enough traffic to saturate all available middle nodes or Tor's node selection algorithm is, um, sub-optimal.

I just started my relay a month ago, so I've done some research, and it 
seems to be pretty complicated.  Please excuse what turned into an 
over-long post, but I think this is info that new relay operators don't 
generally know, and should.

At its base, the traffic allocation scheme is elegantly simple: each 
client randomly selects circuit participants, weighting each relay by 
its bandwidth value's share of the total of all relays' bandwidth values 
in the most recent consensus (not counting complications like starting 
with a guard node, ending with an exit, etc.).  So ideally, a relay with 
twice the bandwidth of another will (on 
average) be selected for circuits twice as often, and (again, on 
average) end up processing twice as much traffic.  And also, if the 
total bandwidth demand from all clients is enough to consume, say, 60% 
of all Tor relays' available bandwidth, each relay will (on average) be 
kept operating at about 60% of capacity.
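
A minimal Python sketch of that simplified model, ignoring the guard/exit 
rules (the relay names and bandwidth numbers are made up): each pick is 
a weighted random choice, so relays get picked in proportion to their 
bandwidth, and if demand is 60% of total capacity, every relay's 
expected utilization works out to 60%.

```python
import random
from collections import Counter

def pick_relay(bandwidths, rng):
    """Pick one relay with probability proportional to its consensus bandwidth."""
    names = list(bandwidths)
    weights = [bandwidths[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

def selection_counts(bandwidths, n_picks, seed=0):
    """Count how often each relay is picked over many independent picks."""
    rng = random.Random(seed)
    return Counter(pick_relay(bandwidths, rng) for _ in range(n_picks))

def expected_utilization(bandwidths, demand_fraction):
    """Expected load / capacity per relay when total demand is
    demand_fraction of total capacity and load follows the weights."""
    total = sum(bandwidths.values())
    demand = demand_fraction * total
    return {name: (demand * bw / total) / bw for name, bw in bandwidths.items()}

if __name__ == "__main__":
    # Hypothetical relays; "fast" has twice the bandwidth of "slow".
    relays = {"slow": 100, "medium": 150, "fast": 200}
    counts = selection_counts(relays, 90_000)
    total_bw = sum(relays.values())
    for name, bw in relays.items():
        print(f"{name}: picked {counts[name] / 90_000:.3f}, expected {bw / total_bw:.3f}")
    print(expected_utilization(relays, 0.60))
```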

But... the bandwidth figure in the consensus that clients use to select 
relays is not a simple figure.  Apparently, it used to be based on the 
rates reported by each relay in their uploaded server descriptors, but 
people tried to game the system and fed it bogus data, so that was 
replaced by the authoritative servers going out and measuring the actual 
bandwidth of each relay by downloading a test file through it to see how 
fast it really went.  So the values your relay uploads in the descriptor 
are ignored now, and your configured Rate and Burst speeds aren't 
relevant for traffic allocation except insofar as they affect how fast 
the official bandwidth scanner can download the test file through you.  The 
only bandwidth figure that matters for driving traffic to your relay is 
the one reported in the consensus, which you can see at 
https://metrics.torproject.org/networkstatus.html, or from your own 
relay's DirPort at 
http://<relay>:<dirport>/tor/status-vote/current/consensus if you're a 
directory mirror.  Note that these are not the values shown by any of 
the TorStatus servers; those display something else entirely (and not 
always the same thing as each other, either).  So anyway, don't expect a 
change in your torrc file to immediately bring you more traffic.  Your 
relay has to get rescanned first, which probably only happens once a day.
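
If you'd rather pull the number out of the consensus yourself, here's a 
rough Python sketch.  It assumes the layout described in Tor's dir-spec, 
where each relay entry starts with an "r" line (nickname first) and later 
has a "w Bandwidth=N" line, whose unit dir-spec currently gives as 
kilobytes per second.  The helper names are my own, not part of any Tor 
tool.  Nicknames aren't unique, so it returns every match rather than a 
single value.

```python
import re
import urllib.request

def fetch_consensus(host, dirport, timeout=60):
    """Download the current consensus from a relay's DirPort as text."""
    url = f"http://{host}:{dirport}/tor/status-vote/current/consensus"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def consensus_bandwidths(consensus_text, nickname):
    """Return the Bandwidth= values of every consensus entry with this nickname."""
    values = []
    current = None  # nickname from the most recent "r" line
    for line in consensus_text.splitlines():
        if line.startswith("r "):
            current = line.split()[1]
        elif line.startswith("w ") and current == nickname:
            match = re.search(r"\bBandwidth=(\d+)", line)
            if match:
                values.append(int(match.group(1)))
    return values
```

Usage would be something like 
consensus_bandwidths(fetch_consensus("your.mirror.example", 9030), "YourNickname"), 
with a real directory mirror's address and DirPort filled in.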

OK, fair enough.  But those bandwidth numbers aren't just the simple 
speed which was seen when the scanner last downloaded a test file 
through your relay.  To smooth out random speed changes as your relay is 
remeasured day to day, the bandwidth scanners use some kind of fairly 
slow exponential moving average of your download speeds.  So it takes 
considerable time for changes in your relay's speed to slowly seep into 
the bandwidth number in the consensus.  And for new relays, it seems 
that the initial value which the new measurements are slowly averaged 
into starts out pretty low, so that your reported bandwidth also starts 
out pretty low, and then gradually rises over time to somewhere around 
your actual Rate limit.  And by slow, I mean weeks; my new relay has 
been running for a month, and over that time, I've seen the reported 
bandwidth slowly, with many fits and starts and temporary setbacks, go 
from about 20-30 to ~200 (about half my actual Rate limit), and it's 
still rising.  This effectively puts new relays on probation, slowly 
feeding them more and more traffic over time to see how they do, which 
is not a bad thing at all.  But new relay 
operators are usually excited and anxious to see stuff happen, and need 
to be aware of this slow starting ramp up period and not get too 
discouraged or give up because they're not seeing much traffic at first.

OK, so the bandwidth rating that matters is measured and slowly 
averaged... but there's yet another layer.  To improve the overall 
performance of the Tor network, and to help clients generally create 
faster circuits, there's another bias factor thrown in.  I don't know 
the details, but faster relays have their measured bandwidth figures 
artificially boosted to drive even more traffic through them than their 
high bandwidth would naturally attract, and/or slower relays have their 
bandwidth figures artificially lowered to drive less traffic through 
them (I'm not sure which, or whether it's both, but the effect is the same).  
So if the overall client bandwidth demand is, again, 60% of the total 
Tor network bandwidth available, instead of each relay being at ~60% 
capacity, the fastest relays will be more fully utilized, and, 
unavoidably, that means that the slower relays will be correspondingly 
less utilized.
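
Since I don't know the actual formula, the Python sketch below uses a 
purely hypothetical bias: each relay's selection weight is its capacity 
raised to a power k, where k = 1 is plain proportional selection and 
k > 1 favors fast relays.  It only illustrates the effect described 
above (fast relays busier, slow relays idler, same total demand); it is 
not how Tor actually computes its weights.

```python
def utilization(capacities, demand_fraction, k):
    """Fraction of its own capacity each relay ends up using.

    capacities: relay name -> actual capacity (e.g. KB/s)
    demand_fraction: total client demand as a fraction of total capacity
    k: HYPOTHETICAL bias exponent; selection weight = capacity ** k
    Does not model saturation (a result above 1.0 would mean overload).
    """
    demand = demand_fraction * sum(capacities.values())
    weights = {name: cap ** k for name, cap in capacities.items()}
    total_weight = sum(weights.values())
    return {
        name: demand * weights[name] / total_weight / capacities[name]
        for name in capacities
    }

if __name__ == "__main__":
    relays = {"slow": 30, "fast": 300}  # made-up capacities in KB/s
    for k in (1.0, 1.5):
        shares = utilization(relays, 0.60, k)
        print(k, {name: round(u, 2) for name, u in shares.items()})
```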

This might dismay slower relay operators who feel that they're being 
prevented from contributing as much as they'd like, but objectively, 
it's generally better for a Tor client to have a 300 KB/s circuit hop 
than a 30 KB/s one.  The faster relays are just nicer for clients to 
use, and it's better overall for the Tor network to make sure they get 
used as much as possible.  And if those fast relays are getting more 
than their prorated "fair" share of usage based on their actual speeds, 
that unavoidably means that slower relays are getting less usage than 
their speed would normally merit.  But that doesn't mean the slower 
relays are useless!  Simply by existing, those extra relays greatly 
increase the difficulty of various attacks on Tor, just because they 
*might* have been used for any given circuit.  Also, the whole guard 
node system for making certain nasty attacks infeasible relies on having 
lots of potential guard nodes to choose between, even relatively slow 
ones.  And of course, all exit nodes are especially precious, almost 
regardless of speed.  And finally, even if they're not used all that 
much while client demand on the Tor network is low to moderate, they 
provide an important spare reserve of bandwidth to make sure that some 
relay somewhere will always be ready to handle a new circuit even if the 
network becomes very busy and manages to max out the high speed 
"backbone" relays, or if Tor is subjected to some kind of DoS attack.

So anyway, for all the relay operators asking "why isn't my relay being 
used more?", there's your infodump.  If it's a new relay, or you 
recently upgraded/raised your speed limits, keep an eye on the official 
bandwidth figure for your relay in the consensus, especially the nice 
graph displayed in the Router Detail page that you reach by clicking 
your router link on the 
https://metrics.torproject.org/networkstatus.html page.  If that graph 
shows any upwards trend, the effects of your change are still slowly 
percolating into your official bandwidth figure, and more traffic will 
appear as it rises.  If it has leveled off, you're getting your share of 
the overall Tor traffic based on your relay's overall performance and 
the total client demand on the Tor network.

If you run a slow relay because you've limited its overall Rate to avoid 
hitting bandwidth caps, you might consider using AccountingMax to cap 
total usage at a safe level and raising your speeds instead; you may 
find it more rewarding to relay significant traffic for 6 hours per day 
and then hibernate for 18 than to stay on "inactive reserve" status all 
the time.  From the overall viewpoint of the network, is it better to 
have 1000 new relays running at good speeds 1/4 of the time (effectively 
adding 250 new fast relays), or to have them running at slow speeds all 
of the time, not being used much?  I'm not really sure, but from what I 
see on TorStatus, the AccountingMax hibernation feature is hardly used 
at all, and I wonder why.
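
Some back-of-the-envelope arithmetic for that, as a Python sketch.  It 
assumes the relay runs flat out at its configured rate while awake, and 
that AccountingMax is checked against sent and received bytes separately 
(my reading of the tor manual), so a single direction's rate is what 
uses up the budget.  The helper names are my own.

```python
SECONDS_PER_HOUR = 3600

def budget_for_hours(hours_per_period, rate_bytes_per_sec):
    """Bytes (per direction) needed to run flat out for the given hours."""
    return int(hours_per_period * SECONDS_PER_HOUR * rate_bytes_per_sec)

def hours_of_relaying(accounting_max_bytes, rate_bytes_per_sec):
    """Hours of full-speed relaying before the budget runs out."""
    return accounting_max_bytes / rate_bytes_per_sec / SECONDS_PER_HOUR

def effective_full_time_relays(n_relays, hours_awake_per_day):
    """Part-time relays expressed as an equivalent number of full-time relays."""
    return n_relays * hours_awake_per_day / 24

if __name__ == "__main__":
    rate = 200 * 1024  # hypothetical 200 KB/s relay
    budget = budget_for_hours(6, rate)
    print(f"6 h/day at 200 KB/s needs about {budget / 2**30:.2f} GiB per direction per day")
    print(hours_of_relaying(budget, rate), "hours")
    print(effective_full_time_relays(1000, 6), "full-time equivalents")
```

In torrc terms that would mean something like "AccountingStart day 00:00" 
plus an AccountingMax of roughly that size; check the tor manual for the 
exact syntax your version accepts.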

OK, enough already, this turned out way longer than I was expecting.  
Hope it helps.