[tor-talk] Tentative results of analysis of data on metrics.torproject.org

Virgil Griffith i at virgil.gr
Thu Sep 4 19:02:27 UTC 2014


> So I guess the question is: what conclusions should we draw from Figure
> 3? If it levels off or goes down in the future, does that indicate a bad
> thing in any way?

Given that the average bandwidth across the world is increasing, it would
be quite concerning if the average relay bandwidth for the entire Tor
network plateaued or went down.


> and it better shows the growth in capacity of the network, in a way that
> wouldn't be dragged down by adding a whole bunch of fast relays (which
> people use) but also adding a whole bunch of slow ones (which people don't
> use).

Superb point.  I guess what you really want is a weighted average in which
the weight is the probability of that relay being selected.  What to use
for the probability of selection?  Consensus weight?  Suggestions here?
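
To make the question concrete, here is a minimal sketch of such a weighted
average in Python.  It assumes each relay's selection probability is
proportional to its consensus weight, which is exactly the open question
above rather than a settled choice, and it ignores any position-dependent
weighting.  The function name and the sample numbers are made up for
illustration.

```python
def weighted_mean_bandwidth(relays):
    """Mean relay bandwidth, weighted by selection probability.

    relays: iterable of (bandwidth, weight) pairs. The weight is ASSUMED
    to be proportional to the chance a client picks the relay (for
    example, its consensus weight).
    """
    relays = list(relays)
    total_weight = sum(weight for _, weight in relays)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(bw * weight for bw, weight in relays) / total_weight


# Invented example: one fast, heavily used relay and one slow, rarely used one.
sample = [(1000, 90), (10, 10)]
unweighted = sum(bw for bw, _ in sample) / len(sample)
weighted = weighted_mean_bandwidth(sample)
print(unweighted, weighted)
```

With these invented numbers the unweighted mean is 505 while the weighted
mean is 901, so the slow, rarely selected relay barely drags the figure
down.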



> From your transition text between Figure 3 and Figure 4 it looks
> like you're trying to use bandwidth-per-relay as a stand-in for
> expected-bandwidth-for-the-client, which I think it isn't because clients
> don't select relays uniformly.

My Figure 3 was just exploring "obvious stuff" to get a feel for the data.
My take on expected-bandwidth-for-the-client is closer to Figure 6a.
However, a weighted version of Figure 3 would be a definite improvement!
Tell me what to use for the weighting and I'll use it.


> Also, it would appear that this Nielsen's law is about bandwidth that
> home users get in one direction. I think that's probably quite different
> from bandwidth that hosted servers get in both directions.  Does Nielsen's
> law say anything about upstream bandwidth? Did you bring it up just because
> they both said 24 months, or am I missing the tie-in?

I am not aware of any formal upload analogue of Nielsen's Law.  And yes, I
mentioned Nielsen's Law just because they both said 24 months.  For what
it's worth, the OOKLA data does have a separate field for upload bandwidth,
which actually scales pretty close to Nielsen's Law.  How can this be
improved?  Is there a plot with upload data that you'd like to see?  Should
the references to Nielsen's Law be dropped?


> - When you say "Torperf speeds double" I get confused, because Torperf
> primarily measures in seconds-to-fulfill-the-request,

Yeah, that is confusing.  I always converted the
seconds-to-fulfill-the-request to KiB/s.  This made it easier to compare.  I
also adjusted the terminology by replacing several instances of "speed"
with "throughput".
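
For reference, a minimal sketch of that conversion, assuming the request
size is known in bytes and the completion time in seconds.  The function
name is hypothetical and the sample timings are invented, not taken from
the Torperf data.

```python
KIB = 1024  # bytes per KiB


def throughput_kib_per_s(size_bytes, seconds):
    """Convert a request size and its time-to-complete into KiB/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_bytes / KIB / seconds


# Invented timings for two of the Torperf file sizes.
print(throughput_kib_per_s(50 * KIB, 2.0))        # 50 KiB in 2 s
print(throughput_kib_per_s(1024 * KIB, 4.0))      # 1 MiB in 4 s
```

Because the divisor is the total time to fulfill the request, any fixed
round-trip latency is folded into the resulting figure.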


> (Several times you say 500 KiB when I think you mean 50 KiB?)

Correct.  My mistake.  Fixed!  Thanks for pointing that out.

> have to wait for another round-trip! So I think one conclusion is that
> our Torperf results for 1MB and 5MB are non-linear in quite subtle ways.)

Subtle non-linearities based on window sizes wouldn't surprise me.  All I
have that might be useful to you is that the 1MiB and 5MiB cases give
virtually identical results.  Figure 4 is representative of this.  The 1MiB
and 5MiB cases were so similar that I don't bother showing more data from
the 1MiB case.


> I wonder if there's a way, given the 50KB / 1MB / 5MB data, to separate
> out these two components?

I suspect there's something clever you could do to separate out these
components, but it requires more knowledge of the data-cell structure than
I currently possess.  If there's a spec clearly detailing the various
overheads, I can try my hand at computing them for the different file sizes.
That said, even though Tor performance is handicapped by round-trip
latency, what matters is the goodput.  Maybe I'm missing something, but
this still seems a reasonable comparison of goodput even if Tor is
handicapped.


> For your Figure 6a, there's a challenge here because the "number
> of clients" graphs are the number of Tor clients running, not
> the number of Tor clients clicking on something at once.

Sure.  The ordinate (y-axis) on Figure 6a is certainly skewed.  I hadn't
articulated it to myself before, but I make an implicit assumption that the
"active" number of clients, over the day, is some fraction of the total
number of clients, and this fraction is constant from day to day.  So
although the absolute numbers would be different, the shape of the data
would be the same.  Do you dispute this?
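
A minimal sketch of why a constant fraction leaves the shape untouched:
dividing each series by its own mean cancels any constant scale factor.
The 0.25 fraction and the client counts below are invented purely for
illustration; the real fraction is unknown.

```python
import math


def normalized_shape(series):
    """Divide a series by its own mean so only its shape remains."""
    mean = sum(series) / len(series)
    return [x / mean for x in series]


total_clients = [1000, 2000, 4000]        # invented daily totals
ACTIVE_FRACTION = 0.25                    # ASSUMED constant from day to day
active_clients = [ACTIVE_FRACTION * n for n in total_clients]

same_shape = all(
    math.isclose(a, b)
    for a, b in zip(normalized_shape(total_clients),
                    normalized_shape(active_clients))
)
print(same_shape)
```

If the active fraction drifted over time, the two normalized series would
no longer match, and the shape in Figure 6a would be distorted accordingly.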


-V

P.S. Completely unrelated:  I put down Tor Project as a candidate hosting
organization on an ICFP proposal to further develop tor2web.  I understand
Tor's desire to not host tor2web, but I thought if you could support this
project being done (even if you're not the hosting organization), I would
be much obliged.

