[tor-scaling] Analyzing the Predictive Capability of Tor Metrics

Mike Perry mikeperry at torproject.org
Tue Jul 2 22:35:00 UTC 2019


At Mozilla All Hands, we hoped to find a correlation between the amount
of load on the Tor network and its historical performance.

Unfortunately, while there did appear to be periods of time where this
correlation held, we discovered a major historical discontinuity in this
correlation. We have some guesses that we need to investigate:
https://lists.torproject.org/pipermail/tor-scaling/2019-July/000053.html

For purposes of the discussion below, let's set aside one-off causes of
the discontinuity (things like the siv torperf's ISP changing, torperf
upgrades, and consensus parameters), and instead focus on candidate
independent variables that influence the time periods outside of the
discontinuity (and may also influence the discontinuity itself).

So, how can we tell which factors actually contribute to the
performance of the Tor network? Let's use statistics.

Let's start by calling Tor performance our dependent variable.

Based on the brainstorming at Mozilla, and in the meeting on Friday, we
have a few candidate independent variables that influence performance:
  1. Total Utilization
  2. Bottleneck Utilization (Exit or Guard, whichever is scarce)
  3. Total Capacity
  4. Exit Capacity
  5. Load Balancing


Now, note that our performance metrics (the dependent variable) are all
rank-comparable. We might need a human in the loop to account for the
desired use cases/edge cases (i.e., lower latency is often more
important than insanely high throughput), but we can monotonically rank
our historical performance results from better to worse, at whatever
timescales we choose. In particular, we can look at a set of CDFs or
boxplots of latency and throughput results, and we can say which pairs
of latency and throughput are better for our users than others, using
the "good" and "bad" CDF heuristics from our metrics page:
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics#LatencyMetrics
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics#ThroughputMetrics
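For intuition, here is a minimal sketch of one way such a rank
comparison could work: one period's latency sample ranks "better" than
another's if its CDF is at least as good at every checked quantile.
The dominance check and the sample numbers below are my own
illustration, not the exact heuristics from the wiki pages:

```python
import numpy as np

def dominates(lat_a, lat_b):
    """True if sample A's latencies are at least as good as sample B's at
    every checked quantile (lower latency = better).  Period pairs where
    neither sample dominates the other can be treated as "ties"."""
    qs = np.linspace(0.05, 0.95, 19)
    return bool(np.all(np.quantile(lat_a, qs) <= np.quantile(lat_b, qs)))

# Hypothetical torperf download latencies (seconds) for two time periods:
period_a = [0.8, 1.1, 1.3, 1.9, 2.4]
period_b = [1.0, 1.4, 1.8, 2.5, 3.1]
print(dominates(period_a, period_b))  # -> True: period_a ranks "better"
```

A human in the loop would still adjudicate the cases where neither CDF
dominates, or where throughput and latency disagree.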

Moreover, our independent variables can *also* be ranked in monotonic
order. It is possible to rank plots of Utilization, Capacity, Relay
Spare Capacity, and Relay Stream Capacity from "better" to "worse":
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics#CapacityMetrics
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics#BalancingMetrics

Once we have a monotonic rank-ordering on our dependent variable data
points, and monotonic rank-orderings on our candidate independent
variables' data points, we can use a statistical correlation coefficient
to determine which network-level independent variables correlate best to
the rank-ordered performance data. There are two statistical methods for
determining correlation in monotonically rank-ordered data:
https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient

Note that both methods only require monotonic ordering. It is possible
for us to declare "ties" or simply "shrug" when trying to decide if one
latency/throughput combination is better than another.

Of the two, Kendall's Tau is less sensitive to distances in relative
ranking, and instead more directly measures the property that "if the
independent variable went up/down, did the dependent variable go up/down
at the same time?"
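Both coefficients are implemented in scipy.stats. A minimal sketch,
using made-up weekly ranks (1 = best) for performance and one candidate
independent variable; the numbers are illustrative only:

```python
from scipy import stats

# Hypothetical weekly ranks, 1 = best week, 8 = worst (illustrative data).
perf_rank = [1, 2, 3, 5, 4, 6, 8, 7]  # dependent: rank-ordered performance
util_rank = [2, 1, 3, 4, 5, 7, 6, 8]  # independent: bottleneck utilization

rho, rho_p = stats.spearmanr(perf_rank, util_rank)
tau, tau_p = stats.kendalltau(perf_rank, util_rank)  # tau-b: handles ties
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
print(f"Kendall tau  = {tau:.3f} (p = {tau_p:.3f})")
```

Since kendalltau computes the tau-b variant, tied ranks (the pairs we
"shrug" on) are handled without extra work.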

So, after we have done this ranking (probably manually), we compute
Kendall's Tau for the correlation of our five independent variables, and
see which of "Total Utilization", "Bottleneck Utilization", "Total
Capacity", "Exit Capacity", or "Load Balancing" best correlates to
overall Tor performance in general, and for specific periods of time
(such as across the discontinuity and during botnet and DoS attacks).
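Once each candidate series has been ranked, the selection step could be
sketched like this (all ranks below are made up for illustration; real
inputs would come from the metrics pages above):

```python
from scipy import stats

perf = [1, 2, 3, 4, 5, 6]  # hypothetical monthly performance ranks, 1 = best
candidates = {
    "Total Utilization":      [2, 1, 3, 5, 4, 6],
    "Bottleneck Utilization": [1, 2, 4, 3, 5, 6],
    "Total Capacity":         [3, 1, 2, 6, 4, 5],
    "Exit Capacity":          [1, 3, 2, 4, 6, 5],
    "Load Balancing":         [6, 4, 5, 2, 3, 1],
}

# Correlate each candidate's ranks against the performance ranks.
taus = {}
for name, ranks in candidates.items():
    tau, _ = stats.kendalltau(perf, ranks)
    taus[name] = tau

best = max(taus, key=lambda name: abs(taus[name]))
for name in sorted(taus, key=lambda n: -abs(taus[n])):
    print(f"{name:24s} tau = {taus[name]:+.3f}")
print("strongest correlate:", best)
```

Taking the absolute value of tau lets a strong negative correlation
(e.g., more capacity correlating with better performance ranks) count
as heavily as a positive one.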

We can also investigate specific time periods where this correlation
doesn't hold, and see what other variables are involved during those
periods.


Once this work is done, and we have a good idea what factors are most
strongly correlated with historical Tor performance, we can start
running Shadow models where we vary these factors, and determine the
smallest Shadow model that is still able to show this relationship and
its effects on performance metrics. This model will be our "smallest"
baseline simulator, which we can also use as we conduct further
performance tuning experiments.



-- 
Mike Perry
