On 9/27/2011 1:37 PM, "Steve Snyder" swsnyder@snydernet.net wrote:
> Either there is simply not enough traffic to saturate all available middle nodes or Tor's node selection algorithm is, um, sub-optimal.
I just started my relay a month ago, so I've done some research, and it seems to be pretty complicated. Please excuse what turned into an over-long post, but I think this is info that new relay operators don't generally know, and should.
At its base, the traffic allocation scheme is elegantly simple: each client randomly selects circuit participants with probability proportional to each relay's bandwidth value in the most recent consensus, that is, the relay's value divided by the sum of all the relays' values (not counting complications like starting with a guard node, ending with an exit, etc.). So ideally, a relay with twice the bandwidth of another will (on average) be selected for circuits twice as often, and (again, on average) end up processing twice as much traffic. And also, if the total bandwidth demand from all clients is enough to consume, say, 60% of all Tor relays' available bandwidth, each relay will (on average) be kept operating at about 60% of capacity.
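To make the "twice the bandwidth, twice the circuits" idea concrete, here is a minimal sketch of bandwidth-weighted random selection. The relay names and bandwidth values are made up, and this ignores guard/exit constraints and everything else the real client does; it only illustrates the proportional-selection idea described above.

```python
import random
from collections import Counter

# Hypothetical relays and consensus bandwidth values (illustrative only).
consensus_bandwidth = {"relayA": 100, "relayB": 200, "relayC": 700}


def pick_relay(bandwidths, rng):
    """Pick one relay with probability proportional to its bandwidth value."""
    names = list(bandwidths)
    weights = [bandwidths[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def selection_shares(bandwidths, trials=100_000, seed=0):
    """Simulate many independent picks and return each relay's share of them."""
    rng = random.Random(seed)
    counts = Counter(pick_relay(bandwidths, rng) for _ in range(trials))
    return {name: counts[name] / trials for name in bandwidths}


if __name__ == "__main__":
    for name, share in selection_shares(consensus_bandwidth).items():
        print(f"{name}: {share:.3f}")
```

With these made-up numbers, relayB should come out with roughly twice relayA's share of picks, and relayC with roughly seven times.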
But... the bandwidth figure in the consensus that clients use to select relays is not a simple figure. Apparently, it used to be based on the rates reported by each relay in its uploaded server descriptor, but people tried to game the system and fed it bogus data, so that was replaced by the authoritative servers going out and measuring the actual bandwidth of each relay by downloading a test file through it to see how fast it really went. So the values your relay uploads in the descriptor are ignored now, and your configured Rate and Burst speeds aren't relevant for traffic allocation except insofar as they affect the speed at which the official bandwidth scanner can download the test file through you. The only bandwidth figure that matters for driving traffic to your relay is the one reported in the consensus, which you can see at https://metrics.torproject.org/networkstatus.html, or from your own relay's DirPort at http://<relay>:<dirport>/tor/status-vote/current/consensus if you're a directory mirror. Note that these are not the values shown by any of the TorStatus servers; those display something else entirely (and not always the same thing as each other, either). So anyway, don't expect a change in your torrc file to immediately bring you more traffic. Your relay has to get rescanned first, which probably only happens once a day.
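If you save the consensus document from your DirPort to a file, a few lines of script will pull out the bandwidth figures. This is a rough sketch that assumes, as I understand the format, that each relay appears as an "r" line starting with its nickname, followed a little later by a "w Bandwidth=N" line. The sample text below is invented, not real consensus data, and since nicknames aren't guaranteed unique, a serious version would key on the identity instead.

```python
def parse_consensus_bandwidths(consensus_text):
    """Map relay nickname -> the Bandwidth= value on that relay's 'w' line."""
    bandwidths = {}
    nickname = None
    for line in consensus_text.splitlines():
        if line.startswith("r "):
            # Assumed layout: "r <nickname> <identity> <digest> <date> <time> ..."
            nickname = line.split()[1]
        elif line.startswith("w ") and nickname is not None:
            for field in line.split()[1:]:
                key, _, value = field.partition("=")
                if key == "Bandwidth" and value.isdigit():
                    bandwidths[nickname] = int(value)
    return bandwidths


# Invented sample entries (hypothetical relays, documentation IP addresses).
SAMPLE = """\
r ExampleRelayOne AAAAAAAAAAAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBBBBBBBBB 2011-09-27 12:00:00 192.0.2.10 9001 9030
s Fast Running Valid
w Bandwidth=200
r ExampleRelayTwo CCCCCCCCCCCCCCCCCCCCCCCCCCC DDDDDDDDDDDDDDDDDDDDDDDDDDD 2011-09-27 12:05:00 192.0.2.20 9001 0
s Exit Fast Running Valid
w Bandwidth=30
"""

if __name__ == "__main__":
    for name, bw in parse_consensus_bandwidths(SAMPLE).items():
        print(name, bw)
```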
OK, fair enough. But those bandwidth numbers aren't just the simple speed which was seen when the scanner last downloaded a test file through your relay. To smooth out random speed changes as your relay is remeasured day to day, the bandwidth scanners use some kind of fairly slow exponential moving average of your download speeds. So it takes considerable time for changes in your relay's speed to slowly seep into the bandwidth number in the consensus. And for new relays, it seems that the initial value which the new measurements are slowly averaged into starts out pretty low, so that your reported bandwidth also starts out pretty low, and then gradually rises over time to somewhere around your actual Rate limit. And by slow, I mean weeks; my new relay has been running for a month, and over that time, I've seen the reported bandwidth slowly, with many fits and starts and temporary setbacks, go from about 20-30 to ~200 (about half my actual Rate limit), and it's still rising. Which kind of has the effect of putting new relays on probation, and slowly feeding them more and more traffic over time to see how they do, which is not a bad thing at all. But new relay operators are usually excited and anxious to see stuff happen, and need to be aware of this slow starting ramp up period and not get too discouraged or give up because they're not seeing much traffic at first.
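To get a feel for why the ramp-up takes weeks, here's a toy exponential moving average. I don't know the formula or smoothing factor the bandwidth scanners really use, so the starting value, the "true" speed, and the alpha below are all arbitrary assumptions chosen only to show the shape of the curve.

```python
def smooth(measurements, initial, alpha):
    """Return the running exponential moving average after each measurement.

    alpha is the weight given to the newest measurement (0 < alpha <= 1);
    a small alpha means the average reacts slowly.
    """
    value = initial
    history = []
    for measured in measurements:
        value = alpha * measured + (1 - alpha) * value
        history.append(value)
    return history


if __name__ == "__main__":
    # Assumptions: one measurement per day, the relay always measures at 400,
    # the average starts at 20, and alpha is 0.05. None of these are Tor's values.
    history = smooth([400] * 30, initial=20, alpha=0.05)
    for day in (1, 7, 14, 30):
        print(f"day {day:2d}: {history[day - 1]:.0f}")
```

The smoothed figure climbs quickly at first and then ever more slowly as it approaches the measured speed, which matches the general pattern I've been watching, though not necessarily the exact numbers.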
OK, so the bandwidth rating that matters is measured and slowly averaged... but there's yet another layer. To improve the overall performance of the Tor network, and to help clients generally create faster circuits, there's another bias factor thrown in. I don't know the details, but faster relays have their measured bandwidth figures artificially boosted to drive even more traffic through them than their high bandwidth would naturally attract, and/or slower relays have their bandwidth figures artificially lowered to drive less traffic through them (not sure which or both, but the effect is the same regardless). So if the overall client bandwidth demand is, again, 60% of the total Tor network bandwidth available, instead of each relay being at ~60% capacity, the fastest relays will be more fully utilized, and, unavoidably, that means that the slower relays will be correspondingly less utilized.
This might dismay slower relay operators who feel that they're being prevented from contributing as much as they'd like, but objectively, it's generally better for a Tor client to have a 300 KB/s circuit hop than a 30 KB/s one. The faster relays are just nicer for clients to use, and it's better overall for the Tor network to make sure they get used as much as possible. And if those fast relays are getting more than their prorated "fair" share of usage based on their actual speeds, that unavoidably means that slower relays are getting less usage than their speed would normally merit. But that doesn't mean the slower relays are useless! Simply by existing, those extra relays greatly increase the difficulty of various attacks on Tor, just because they *might* have been used for any given circuit. Also, the whole guard node system for making certain nasty attacks infeasible relies on having lots of potential guard nodes to choose among, even relatively slow ones. And of course, all exit nodes are especially precious, almost regardless of speed. And finally, even if they're not used all that much while client demand on the Tor network is low to moderate, they provide an important spare reserve of bandwidth to make sure that some relay somewhere will always be ready to handle a new circuit even if the network becomes very busy and manages to max out the high speed "backbone" relays, or Tor is subjected to some kind of DoS attack.
So anyway, for all the relay operators asking "why isn't my relay being used more?", there's your infodump. If it's a new relay, or you recently upgraded/raised your speed limits, keep an eye on the official bandwidth figure for your relay in the consensus, especially the nice graph displayed in the Router Detail page that you reach by clicking your router link on the https://metrics.torproject.org/networkstatus.html page. If that graph shows any upwards trend, the effects of your change are still slowly percolating into your official bandwidth figure, and more traffic will appear as it rises. If it's plateaued out, you're getting your share of the overall Tor traffic based on your relay's overall performance and the total client demand on the Tor network.
For slow nodes who've limited their overall Rate to avoid hitting bandwidth caps, you might consider using AccountingMax to cap the usage to a safe level, and increase your speeds; you may find it more rewarding to relay significant traffic for 6 hours per day and then hibernate for 18 than to stay on "inactive reserve" status all the time. From the overall viewpoint of the network, is it better to have 1000 new relays at good speeds up 1/4 of the time (effectively adding 250 new fast relays), or to have them at slow speeds all of the time, not being used much? I'm not really sure, but I've noticed that the AccountingMax hibernation feature is hardly used at all from what I see on TorStatus, and I wonder why.
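The trade-off is easy to put numbers on. The sketch below assumes a relay pushes traffic at its full configured rate whenever it's awake and goes to sleep once a fixed daily byte budget is spent. That's a simplification of my own, not how Tor actually schedules hibernation, and the function name and figures are made up for illustration.

```python
def active_hours_per_day(daily_cap_bytes, rate_bytes_per_sec):
    """Hours per day a relay can run flat out before using up its daily cap."""
    seconds = daily_cap_bytes / rate_bytes_per_sec
    return min(24.0, seconds / 3600)


if __name__ == "__main__":
    # Hypothetical budget: what a 25 KB/s relay would move in a full day.
    daily_cap = 25_000 * 86_400  # bytes

    for rate in (25_000, 50_000, 100_000):  # bytes per second
        hours = active_hours_per_day(daily_cap, rate)
        print(f"{rate // 1000} KB/s -> {hours:.1f} hours/day")
```

So the same daily budget that keeps a relay trickling along at 25 KB/s around the clock would instead let it run at 100 KB/s for about 6 hours, which is the scenario I described above.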
OK, enough already, this turned out way longer than I was expecting. Hope it helps.