[tor-scaling] Fwd: Measuring the Accuracy of Tor Relays' Advertised Bandwidths

Tue Aug 13 22:33:00 UTC 2019

Rob just completed his 20 second "relay stress test" network experiment.
The TL;DR is that this experiment should change the load balancing
weights of relays, by way of boosting their "observed bandwidth" relay
descriptor fields to more realistic values. These fields are inputs to
the TorFlow/sbws load balancing system.

This experiment should have observable effects on load balancing, as
well as measurable effects on onionperf performance and user experience,
for the duration of time that the updated descriptor values are in use
by TorFlow and/or sbws.

Rob: it would be useful if you could describe the analysis you plan to
do on this data on this list, in particular: can you refer to the
performance metrics at
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics
and use those?

FYI: The kanban entry for this experiment has notes about using metrics
for "Capacity", "Load Balancing", and "Reliability". I see those as the
first-order effects of this experiment. We're also going to want to
compare the effects on the "Throughput" end-user metric, and maybe even
"Latency" too, so maybe the answer is just "All" at this point...

However, if any notable metrics are missing, or if any existing ones
need to be changed, we should discuss that and update that wiki page,
which is meant to be canonical for all performance and scalability
improvements going forward.

This is important because the sbws and TorFlow load balancing systems
will also need to be compared in October/November, as we switch to sbws
(sbws still has critical bugs that are being fixed now, hence the
delay). We may even want to run this "stress test" experiment again
then, since sbws is not ready for such a comparison at the moment
(though technically, we are already storing raw consensus votes with
sbws values in the metrics archive).

Furthermore, I want to remind everyone that we have established a set of
controls and processes for doing performance experiments on the live
network at
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceExperiments.

Rob: you did a good job of following the general idea here with what you
sent to tor-relays (forwarded below), but that page also calls for doing
experiments like this on an on/off time schedule, so that we can compare
data from different time periods to control for/rule out unrelated
trends. It sounds like you may be planning to do this after you go over
your data and check for correct collection, etc., but I just wanted to
verify that, and see if we can align the timing of some of your future
tests with the sbws vs TorFlow comparison, and get a two-for-one?

It is important that we all maintain institutional memory for stuff like
this. Technically, maintaining and enforcing this institutional memory
is my job, but I am currently full time on both wrapping up circuit
padding for Sponsor 2, and working on the Firefox ESR migration, so I
will be missing stuff like this routinely until October at the earliest.

This also means that we will not have any scalability meetings until
October. For now, let's say that we will meet again on Thursday Oct 31
at 16:00 UTC.

Until then, things will be smoother if we can all work together and use
this mailing list to help each other self-organize, especially since I
will be mostly too busy to make sure I catch things like this in the
near-term future.

-------- Forwarded Message --------
Subject: [tor-relays] Measuring the Accuracy of Tor Relays' Advertised
Bandwidths
Date: Fri, 26 Jul 2019 10:18:24 -0400
From: Rob Jansen <rob.g.jansen at nrl.navy.mil>
Reply-To: tor-relays at lists.torproject.org
To: tor-relays at lists.torproject.org

---
Measuring the Accuracy of Tor Relays' Advertised Bandwidths

Motivation
----------
The capacity of Tor relays (maximum available goodput) is an important
metric. Combined with mean goodput, it allows us to compute the
bandwidth utilization of individual relays as well as the entire network
in aggregate. Generally, capacity is used to help balance client load
across relays, and relay utilization rates help Tor make informed
decisions about how to allocate resources and prioritize performance and
scalability improvements.
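
As a rough illustration of the utilization computation described above,
here is a minimal Python sketch. The function names and the
(mean goodput, capacity) tuple layout are assumptions made for this
example only; they are not part of Tor or of the metrics tooling.

```python
def utilization(mean_goodput, capacity):
    """Fraction of capacity in use (both arguments in the same unit)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return mean_goodput / capacity


def network_utilization(relays):
    """Aggregate utilization over (mean_goodput, capacity) pairs.

    The aggregate is total goodput over total capacity, so fast relays
    weigh more than slow ones; it is not the mean of per-relay ratios.
    """
    total_goodput = sum(goodput for goodput, _ in relays)
    total_capacity = sum(capacity for _, capacity in relays)
    return utilization(total_goodput, total_capacity)
```

If capacity is underestimated, both figures come out too high, which is
why the accuracy of the capacity estimate matters here.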

Problem
-------
Currently, Tor uses a heuristic measure of unknown accuracy to estimate
Tor relay capacity. Each relay keeps track of the maximum goodput it has
achieved over any 10 second window in a 24 hour period. This is called
the "observed bandwidth". Relays take the minimum of their "observed
bandwidth" and their bandwidth rate-limiting configuration and report
the result as the "advertised bandwidth" in their server descriptors. We
do not know how well the advertised bandwidth estimates the true relay
capacity, but we do know that it represents a lower bound on capacity.
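
The heuristic described above can be modeled with a short Python sketch.
This is a simplified model, not Tor's actual implementation: it assumes
we have one goodput sample per second for the period of interest, and
the function names are made up for this example.

```python
def observed_bandwidth(bytes_per_second, window=10):
    """Highest mean goodput over any `window` consecutive samples.

    `bytes_per_second` is a list with one sample per second (an
    assumption of this sketch). Returns 0.0 if there are fewer samples
    than the window length.
    """
    if len(bytes_per_second) < window:
        return 0.0
    window_sum = sum(bytes_per_second[:window])
    best = window_sum
    # Slide the window one second at a time, updating the running sum.
    for i in range(window, len(bytes_per_second)):
        window_sum += bytes_per_second[i] - bytes_per_second[i - window]
        best = max(best, window_sum)
    return best / window


def advertised_bandwidth(observed, rate_limit):
    """Minimum of the observed bandwidth and the configured rate limit."""
    return min(observed, rate_limit)
```

In this model, a relay that only ever sees a 5 second burst at full rate
reports well below its capacity, while a 20 second burst is enough to
fill a whole 10 second window:

```python
short_burst = [100] * 30 + [1000] * 5 + [100] * 30
long_burst = [100] * 30 + [1000] * 20 + [100] * 30
print(observed_bandwidth(short_burst))  # 550.0
print(observed_bandwidth(long_burst))   # 1000.0
```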

Hypothesis
----------
The advertised bandwidth significantly underestimates the true capacity
of Tor relays. On average, relays with higher true capacities will have
their capacity underestimated more severely (because it is less likely
that fast relays will have sustained their full capacity over a 10
second period).

Experiment
----------
A relay reports its advertised bandwidth in its server descriptor. To
test how well these reported numbers represent the true capacity of a
relay, we can manually perform a speed test on the relay by initiating
the simultaneous download of several large data streams for a period
that exceeds 10 seconds. In the report following our test, the relay
will report its advertised bandwidth in its server descriptor and the
results will be collected and reported by metrics.torproject.org.

The experiment involves two steps: running the speed test on a relay
under our control, and running the speed test on all relays in the Tor
network.

We will first run the speed test on at least one relay that we control,
in order to test that the method is effective and that we can in fact
observe a change in the advertised bandwidth reported on
metrics.torproject.org. Once we have confidence that our speed test is
functioning correctly, and that the metrics pipeline will allow us to
gather the results, we will repeat it on all relays in the network.

We will conduct the speed tests while minimizing network overhead. We
will use a custom client that builds 2-relay circuits. The first relay
will be the target relay we are speed testing, and the second relay will
be a fast exit relay that we control. We will initiate data streams
between a speedtest client and server running on the same machine as our
exit relay.

The setup will look like:

speedtest-client <--> tor-client <--> target-relay <--> exit-relay <-->
speedtest-server

All components will run on the same machine that we control except for
the target-relay, which will rotate as we test different relays in the
network. For each target relay, we plan to run the speedtest for 20
seconds in order to increase the probability that the 10 second mean
goodput will reach the true capacity. We will measure each relay over a
few days to ensure that our speedtest effects are reported by every relay.
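
To illustrate why a test longer than 10 seconds helps, here is a small
Python sketch of the best 10 second mean that a single burst can
produce. It is a deliberately crude model: the `ramp_up` parameter and
the assumption that ramp-up seconds carry only background traffic are
ours, not measured properties of Tor or of the speed test.

```python
def peak_window_mean(burst_seconds, capacity, background,
                     ramp_up=0, window=10):
    """Best `window`-second mean goodput produced by one burst.

    The burst runs at `capacity` once it has ramped up; every other
    second in the window carries `background` traffic. Seconds spent
    ramping up are (conservatively) counted as background.
    """
    full_rate_seconds = max(0, burst_seconds - ramp_up)
    in_window = min(full_rate_seconds, window)
    total = in_window * capacity + (window - in_window) * background
    return total / window
```

With a hypothetical 3 second ramp-up, a 10 second test falls short of
the capacity while a 20 second test reaches it:

```python
print(peak_window_mean(10, 1000, 100, ramp_up=3))  # 730.0
print(peak_window_mean(20, 1000, 100, ramp_up=3))  # 1000.0
```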

Although we believe that the overhead of this speed test is in line with
regular usage, relay operators can opt-out of the speed test by replying
on this thread. Those that opt out will be removed from our list of
relays to scan.

Analysis
--------
Following our speedtest, we will analyze the data collected and reported
by Tor metrics. We will compare the advertised bandwidth that each
relay reports before our experiment to that reported during our
experiment. This will help us test our hypothesis that relays'
advertised bandwidth underestimates the true capacity of relays. We will
run a statistical correlation analysis on the data to test the strength
of the correlation between the previously reported (estimated) relay
capacity and relay capacity underestimation. We will report our results
to the Tor community.
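
One possible shape for the correlation analysis, as a minimal Python
sketch. The choice of Pearson's coefficient, the definition of
underestimation as the plain difference "during minus before", and all
names below are assumptions made for illustration; the message does not
say which statistic will be used.

```python
import math


def underestimation(before, during):
    """Per-relay increase in advertised bandwidth during the experiment."""
    return [d - b for b, d in zip(before, during)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Usage, with `before` and `during` holding each relay's advertised
bandwidth in the same relay order:

```python
r = pearson(before, underestimation(before, during))
```

A value of `r` close to 1 would indicate that relays reporting higher
capacity before the test are also underestimated by more.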

We expect that the results of our experiment will help Tor decide how to
allocate resources and will help them plan and prioritize performance
improvements. It will also provide insight into the operation of the
current load balancing system, which uses advertised bandwidth to
produce consensus weights.

_______________________________________________
tor-relays mailing list
tor-relays at lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
