[tor-scaling] Analyzing the Predictive Capability of Tor Metrics

George Kadianakis desnacked at riseup.net
Thu Jul 4 14:08:41 UTC 2019


Mike Perry <mikeperry at torproject.org> writes:

> At Mozilla All Hands, we hoped to find a correlation between the amount
> of load on the Tor network and its historical performance.
>
> Unfortunately, while there did appear to be periods of time where this
> correlation held, we discovered a major historical discontinuity in this
> correlation. We have some guesses that we need to investigate:
> https://lists.torproject.org/pipermail/tor-scaling/2019-July/000053.html
>

You mean the "start of 2015" artifact, right? It would be nice to see
some more zoomed-in graphs. For example, did the change happen over a
single day? Is the R code for these graphs available online somewhere?

I'd like to add "changes to bw auth code, nodes or bandwidth weights" as
another possible guess. For example, I think that's when maatuska got
shut down, according to this graph: https://metrics.torproject.org/totalcw.html?start=2014-12-20&end=2015-03-10

I also tried to check the onion service traffic during that period, and
I noticed that we introduced those graphs at almost exactly the same
time. Could there have been some change in the metrics infrastructure
back then?
      https://metrics.torproject.org/hidserv-rend-relayed-cells.html?start=2014-04-05&end=2015-07-04

> So, how can we tell what factors actually really contribute to the
> performance of the Tor network? Let's use statistics.
>
> Let's start off by calling Tor performance our dependent variable.
>

By "Tor performance" here you mean "latency" and "throughput", which
does not take "reliability" into account. As a separate investigation,
it would be interesting to see how the "independent variables" below
impact timeout and failure graphs like this one:
https://metrics.torproject.org/torperf-failures.html?start=2012-04-05&end=2019-07-04&server=public&filesize=50kb
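To make the reliability angle concrete, here is a rough sketch of how one
could test whether a candidate variable tracks the torperf failure rate.
The data below is entirely made up for illustration; a real run would use
CSV exports from metrics.torproject.org instead.

```python
# Hypothetical sketch: correlate a candidate independent variable
# (here, exit capacity) with a torperf failure-rate series.
# All numbers are synthetic stand-ins, not real Tor metrics.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic daily series standing in for metrics.torproject.org exports.
exit_capacity = [80, 85, 90, 95, 100, 105]              # Gbit/s, made up
failure_rate = [0.06, 0.055, 0.05, 0.04, 0.035, 0.03]   # fraction of failed runs

r = pearson(exit_capacity, failure_rate)
print(round(r, 3))  # strongly negative here: more capacity, fewer failures
```

The same computation against the real failure/timeout series would at
least tell us which variables are worth a closer look.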

> Based on the brainstorming at Mozilla, and in the meeting on Friday, we
> have a few candidate independent variables that influence performance:
>   1. Total Utilization
>   2. Bottleneck Utilization (Exit or Guard, whichever is scarce)
>   3. Total Capacity
>   4. Exit Capacity
>   5. Load Balancing
>

I think capacity- and utilization-based metrics are a big part of the
equation here, but they assume that Tor is a perfect byte-pushing
network of pipes. Seeing how these pipes get chosen (load balancing/path
selection) and how well they get used (the scheduler and other
implementation details, including bugs) also seems important.

The first four variables here seem well defined, but what is "Load
balancing"? How do we define it in a way that is robust and rankable?

Perhaps one way would be to play with the utilization concept again,
but go per-relay this time, and see how well utilized individual relays
are over time. How do utilization levels differ between slow and fast
relays? What about different relay types?
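The per-relay idea above could look something like this. The relay
records and the fast/slow threshold are invented; a real version would
parse consensus and server descriptors (e.g. via stem or the CollecTor
archives).

```python
# Rough sketch of per-relay utilization, grouped into slow vs. fast
# buckets. All relay data here is made up for illustration.

relays = [
    # (nickname, advertised_capacity_MBps, observed_traffic_MBps)
    ("fast1", 100.0, 70.0),
    ("fast2", 80.0, 60.0),
    ("slow1", 1.0, 0.2),
    ("slow2", 0.5, 0.05),
]

def utilization(advertised, observed):
    """Fraction of advertised capacity actually carried."""
    return observed / advertised if advertised else 0.0

# Bucket relays by a made-up 10 MB/s threshold and compare the
# average utilization of each bucket.
buckets = {"fast": [], "slow": []}
for name, cap, used in relays:
    buckets["fast" if cap >= 10.0 else "slow"].append(utilization(cap, used))

for label, utils in buckets.items():
    print(label, round(sum(utils) / len(utils), 2))
```

Tracking these bucket averages over time (and per relay flag) might give
us a rankable "load balancing" number.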

---

Interesting stuff all around! We indeed have tons of data from our
network spanning more than a decade. We should put more of it to good
use.
