[tor-scaling] Measuring the Accuracy of Tor Relays' Advertised Bandwidths

Rob Jansen rob.g.jansen at nrl.navy.mil
Thu Sep 12 20:04:36 UTC 2019


Hi Mike,

Sorry for the silence! There are only so many hours in a day, and I did not start looking at the data from this experiment until literally today...

> On Aug 13, 2019, at 6:33 PM, Mike Perry <mikeperry at torproject.org> wrote:
> 
> Rob: it would be useful if you could describe the analysis you plan to
> do on this data on this list, in particular: can you refer to the
> performance metrics at
> https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics
> and use those?
> 

Well, none of those metrics capture what I am primarily interested in: the amount of error present in the consensus weights, and how inaccurate observed bandwidths contribute to that error. (more below)

> FYI: The kanban entry for this experiment has notes about using metrics
> for "Capacity", "Load Balancing", and "Reliability". I see those as the
> first-order effects of this experiment. We're also going to want to
> compare the effects on the "Throughput" end-user metric, and maybe even
> "Latency" too, so maybe the answer is just "All" at this point...
> 

If my experiment caused no significant change in the consensus weights of the relays I measured, as checked a few days after the measurement (i.e., if the consensus weights did not change any more than they normally do), then second-order effects like throughput and latency are meaningless. So, looking at the weights is the first and most important step; a sketch of the comparison I have in mind follows.
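
For concreteness, here is a minimal Python sketch, with hypothetical inputs (dicts mapping relay fingerprint to raw consensus weight, taken from one consensus before the experiment and one a few days after):

    def normalize(weights):
        # Convert raw consensus weights to fractions that sum to 1.
        total = sum(weights.values())
        return {fp: w / total for fp, w in weights.items()}

    def weight_changes(before, after):
        # Per-relay absolute change in normalized consensus weight
        # between the pre-experiment and post-experiment consensuses.
        b, a = normalize(before), normalize(after)
        return {fp: abs(a[fp] - b[fp]) for fp in set(b) & set(a)}

The interesting question is then whether these per-relay changes exceed the relays' normal consensus-to-consensus variation.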

> However, if any notable metrics are missing, or if any existing ones
> need to be changed, we should discuss that and update that wiki page,
> which is meant to be canonical for all performance and scalability
> improvements going forward.

One is the total variation distance [0] between the normalized capacity C (for which, after my experiment, we have a better estimate in the form of the relays' observed bandwidths) and the normalized consensus weight W:

D = 1/2 sum_{r \in R} | W_r - C_r |

This tells us how far the consensus weights are off from an 'ideal' capacity weighting; it ignores things like CPU overload, socket limits, etc., but I think it is still useful.
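
As a sketch (again Python, with hypothetical inputs: dicts mapping relay fingerprint to consensus weight and to observed bandwidth), D could be computed as:

    def total_variation_distance(consensus_weights, observed_bandwidths):
        # D = 1/2 * sum_{r in R} |W_r - C_r|, where W and C are the
        # weight and capacity distributions normalized to sum to 1.
        relays = set(consensus_weights) & set(observed_bandwidths)
        w_total = sum(consensus_weights[fp] for fp in relays)
        c_total = sum(observed_bandwidths[fp] for fp in relays)
        return 0.5 * sum(abs(consensus_weights[fp] / w_total
                             - observed_bandwidths[fp] / c_total)
                         for fp in relays)

D is 0 when the consensus weights perfectly match the capacity estimates, and it approaches 1 as the two distributions diverge.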

> Furthermore, I want to remind everyone that we have established a set of
> controls and processes for doing performance experiments on the live
> network at
> https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceExperiments.
> 
> Rob: you did a good job of following the general idea here with what you
> sent to tor-relays (forwarded below), but that page also calls for doing
> experiments like this on an on/off time schedule, so that we can compare
> data from different time periods to control for/rule out unrelated
> trends. It sounds like you may be planning to do this after you go over
> your data and check for correct collection, etc., but I just wanted to
> verify that, and see if we can align the timing of some of your future
> tests with the sbws vs Torflow comparison, and get a two-for-one?
> 

Sorry, I had only planned to do the experiment once. Doing it multiple times should give us more confidence, but I don't think repetition is as crucial in this case as it is when we are trying to measure performance effects. The reason is that my experiment is just meant to push relays into reporting a better observed bandwidth (i.e., capacity estimate), and I don't really care whether they reported higher observed bandwidths because of my traffic or for some confounding reason.

PLP,
Rob

[0] https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures


