[or-talk] where are the exit nodes gone?

Scott Bennett bennett at cs.niu.edu
Sun Apr 11 11:12:43 UTC 2010

Hi Olaf,
     On Sun, 11 Apr 2010 12:11:36 +0200 Olaf Selke <olaf.selke at blutmagie.de>
wrote:
>Scott Bennett wrote:
>>      Observed by what?  If it has anything to do with the numbers
>>  given in the consensus documents, then the only value such graphs
>> would have would be for the purpose of comparing those graphs with the
>> values reported by the relays themselves.  The values in the consensus
>> documents alone are, a priori, worthless.
>yes, the max and the burst bandwidth are not worth much for statistical

     You did say "observed", not "advertised".

>purposes. As I mentioned some days ago, "MaxAdvertisedBandwidth 2500 KB"
>config option leads to a real average bandwidth (measured by mrtg) of
>about 16000 KB on blutmagie exit. A higher MaxAdvertisedBandwidth value

     Remember, there are exactly two vantage points from which valid
observations can be made, no more and no less.  One is from inside your
system's networking stack (including packet filter software).  The other
is inside your tor relay's process.  Unless the value of "about 16000 KB"
(/s) comes from one of those two sources, then I simply don't believe the
so-called measurement, and neither should you.  Such a measurement means,
at best, only that "it's probably a relatively big number when compared
to the rest of such numbers in the consensus, and the real number is
almost certainly larger than this number".

>is killing the cpu with the number of new conns/s.
>Is it possible to use the average observed bandwidth reported by the
>relays? Knowing the number of exit relays doesn't help very much without

     No, not at the present time because that is not reported by the relays.
What a relay reports is the highest minimum number of bytes handled in any
one second in a ten-second sliding window within the past 24 hours.
That value is then devalued considerably by the fact that the 24-hour
periods are not normally consecutive, but rather are overlapped by roughly
six hours at each end, so that only the middle twelve of the 24 hours are
represented exclusively in a measurement reporting period.
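
     To make the "highest minimum" wording concrete, here is a minimal
sketch in Python.  It is a literal reading of the description above and
not tor's actual code; the function name and the list of per-second byte
counts are assumptions made up for illustration.

```python
def highest_window_minimum(per_second_bytes, window=10):
    """Return the largest per-window minimum over all sliding windows.

    per_second_bytes is a hypothetical list of bytes handled in each
    second. For every run of `window` consecutive seconds, take the
    smallest one-second count; the result is the largest such minimum.
    """
    if len(per_second_bytes) < window:
        return 0  # not enough samples to fill a single window
    last_start = len(per_second_bytes) - window
    return max(
        min(per_second_bytes[start:start + window])
        for start in range(last_start + 1)
    )


# A burst sustained for 12 seconds fills at least one whole window.
sustained = [100] * 5 + [900] * 12 + [50] * 3
# A burst of only 4 seconds never fills a whole window.
brief = [100] * 5 + [900] * 4 + [100] * 5

sustained_peak = highest_window_minimum(sustained)
brief_peak = highest_window_minimum(brief)
```

     Note that a value computed this way says nothing about the average
over the day: a relay needs only one sustained ten-second burst to report
a high number.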
     The whole reporting setup is wrong and needs to be revamped from
scratch in order to get a system that works properly.  As I've noted before,
the very first and most critical thing to be done is the design separation
of throughput capacity (which the clients need to know) from actual service
rendered (which only some humans want to know).  The rest cannot even be
begun until that much is done.

>knowing about the total provided bandwidth.
     Probably the best data (i.e., not as bad as any of the other values
reported) for that purpose would be found in the extra-info documents.
Divide each field by 900 s to get the average rates in B/s.  One good
thing about the numbers in the extra-info documents is that both "bytes
read" and "bytes written" are reported.
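
     As a rough sketch of that division, assuming a history line of the
form "write-history YYYY-MM-DD HH:MM:SS (900 s) n1,n2,..." as I understand
the extra-info format, something like the following would do.  The
function name and the sample byte counts are made up for illustration,
and the interval is read from the line itself rather than hard-coded.

```python
def average_rates(history_line):
    """Convert one read-history/write-history line to average B/s values.

    Assumed layout: keyword, date, time, "(<seconds> s)", then a
    comma-separated list of byte counts, one per interval.
    """
    keyword, _date, _time, rest = history_line.split(" ", 3)
    interval_part, _, values_part = rest.partition(") ")
    interval = int(interval_part.lstrip("(").split()[0])
    counts = [int(value) for value in values_part.split(",") if value]
    return keyword, interval, [count / interval for count in counts]


# Sample line with invented byte counts, not taken from any real relay.
sample = "write-history 2010-04-11 11:00:00 (900 s) 9000000,18000000,4500000"
keyword, interval, rates = average_rates(sample)
```

     Running the same function over the matching "read-history" line
gives the inbound side, so the two directions can be compared.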
     Sorry to disappoint you, Olaf, but that's just the way things are
for now. :-(
     FWIW, I still think it might be worth your time to take a spare
machine, if you have one, and install an OS that supports superpages
(e.g., FreeBSD 7.2 and later, Windows Vista and later, possibly Windows
Server 2008, but I don't know about that one), and then try it long
enough to see whether that relieves any of the CPU load.  Or, if you're
up for some coding and testing, you could try Linux's support for "huge"
pages, but that facility is neither automatic nor transparent to the
application, as I understand it, so it does require additional code.  At
present, it's very likely that 30% to 45% of your tor relay's CPU time
is being wasted in address translation due to TLB misses, even when the
needed data or instructions are *already in some level of cache*.  If
the CPU is stalled because the MMU has to walk a page table before it can
discover that what it needs is not only already in memory, but already
in a cache, the performance hit is a crying shame.
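
     For a rough sense of why page size matters, here is a
back-of-the-envelope sketch of "TLB reach", i.e., how much memory a fully
populated TLB can map without a miss.  The entry count of 64 and the page
sizes of 4 KiB and 2 MiB are assumptions chosen for illustration, not
measurements of any particular CPU.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of address space covered by `entries` TLB entries."""
    return entries * page_size


ENTRIES = 64  # hypothetical TLB entry count, for illustration only

small_page_reach = tlb_reach(ENTRIES, 4 * KIB)  # ordinary pages
superpage_reach = tlb_reach(ENTRIES, 2 * MIB)   # superpages / huge pages

print(small_page_reach // KIB, "KiB with 4 KiB pages")
print(superpage_reach // MIB, "MiB with 2 MiB pages")
print(superpage_reach // small_page_reach, "times the reach")
```

     With those assumed numbers the same TLB covers 512 times as much
memory, which is why a process with a large working set can spend much
less time walking page tables.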

