[tor-scaling] Lessons Learned at Mozilla All-Hands

Mike Perry mikeperry at torproject.org
Thu Jun 27 18:51:00 UTC 2019


Gaba pointed out to me that the agenda for Friday lacks a recap of what
happened at Mozilla All-Hands. We can do that in the Friday meeting, but
for wider distribution and historical record, and to help focus that
discussion, I'll recap the technical details here. For organizational
details, Roger has another summary on one of our private mailing lists.

First, the week started with discussion of the questions that we needed
to answer in order to support experimentation on scaling, and
particularly to support experiments that add users to the Tor network.

We arrived at five questions that were important to answer:

  0. What performance metrics do we need, and what can we add easily?
  1. How many users does Tor have?
  2. How fast is Tor and how consistent is this performance?
  3. How many more users can we add?
  4. How fast will Tor be with more users and/or more capacity?

(Spoiler: #4 is by far the most important point of this mail).



0. What performance metrics do we need, and what can we add easily?

We spent a lot of time trying to determine if there were any metrics
that were missing from
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics.

We brainstormed various ways to categorize metrics, research topics, and
questions that we needed to answer. We compared those categories,
topics, and questions to the metrics we have. We found that we need to
start recording Guard-based onionperf runs, long-term RTT metrics,
browser metrics, better failure metrics, and derive realistic user models.

Matt Finkel did some experimentation using Selenium and Tor Browser, and
found that, without too much effort, we can add browser runs to our
metrics, if we can decide which websites to crawl:
https://people.torproject.org/~sysrqb/.6b01a9228e/all_plots/
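
For flavor, here is a minimal sketch of what such a run could look like
with plain Selenium and a Firefox-derived browser pointed at a local tor
SOCKS port. This is not necessarily Matt's harness; the port (9050) and
the site list are assumptions:

  import time
  from selenium import webdriver

  SITES = ["https://www.torproject.org/"]   # placeholder crawl list

  opts = webdriver.FirefoxOptions()
  opts.set_preference("network.proxy.type", 1)            # manual proxy
  opts.set_preference("network.proxy.socks", "127.0.0.1")
  opts.set_preference("network.proxy.socks_port", 9050)
  opts.set_preference("network.proxy.socks_remote_dns", True)

  driver = webdriver.Firefox(options=opts)
  try:
      for url in SITES:
          start = time.time()
          driver.get(url)   # blocks until the page's load event fires
          # Navigation Timing: load time as measured by the browser
          nav_ms = driver.execute_script(
              "var t = performance.timing;"
              "return t.loadEventEnd - t.navigationStart;")
          print(url, "wall: %.2fs" % (time.time() - start),
                "nav: %dms" % nav_ms)
  finally:
      driver.quit()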

Unfortunately, as those websites change over time, these performance
results will become incomparable. However, update pings and update
throughput inside Tor Browser *are* stable and comparable. We could add
telemetry just to measure these performance characteristics, and/or
collect them manually, onionperf-style.

We also learned that we can use historical onionperf data to emulate
some changes to an extent: we can figure out the effect of various Fast
and Guard flag cutoffs by removing data points containing those nodes
from onionperf data, and we can get a rough approximation of the
effects of using particular Guards or pairs of Guards through similar
means. Unfortunately, this kind of analysis can't show us the emergent
effects of network-wide adoption, or even capture single-client
emergent effects (like applying Circuit Build Timeouts (CBT) while
using more than one Guard).
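
To make the filtering approach concrete, a rough sketch in pandas (the
file formats and column names here are assumptions, not the actual
onionperf schema):

  import pandas as pd

  # Sketch: emulate a higher Fast-flag bandwidth cutoff by dropping
  # onionperf measurements whose circuits used a too-slow relay.
  runs = pd.read_csv("onionperf_runs.csv")    # one row per measurement
  relay_bw = pd.read_csv("relay_bw.csv")      # fingerprint -> bw

  cutoff = 2 * 1024 * 1024                    # hypothetical 2 MB/s cutoff
  slow = set(relay_bw.loc[relay_bw.bw < cutoff, "fingerprint"])

  def circuit_ok(path):
      # "path" is a comma-separated fingerprint list in this sketch
      return not any(fp in slow for fp in path.split(","))

  kept = runs[runs.path.apply(circuit_ok)]
  print("median ttlb: %.2fs -> %.2fs"
        % (runs.ttlb.median(), kept.ttlb.median()))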


1. How many Tor users are there?

For this question, we used the data from Tor Browser update pings and
from the metrics website. Both data sets mostly agree, but both have
anomalies as well, and each may undercount the number of actual users
in various ways.

We spent some time investigating various anomalies but did not arrive at
any concrete conclusions.


2. How fast is Tor and how consistent is this performance?

We have onionperf data going back to 2011, and we were able to
boxplot Tor throughput and circuit build latency to EU-based Torperf
servers from 2012 to the present:
https://people.torproject.org/~mikeperry/transient/Whistler2019/performance-boxplots-long.pdf
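
The shape of that analysis is roughly the following (a sketch; the
actual onionperf analysis files are JSON and richer than this flat
CSV, and the column names are assumptions):

  import pandas as pd
  import matplotlib.pyplot as plt

  # Sketch: yearly boxplots of onionperf time-to-last-byte.
  df = pd.read_csv("onionperf.csv", parse_dates=["date"])
  df["year"] = df.date.dt.year
  df.boxplot(column="ttlb", by="year", showfliers=False)
  plt.ylabel("time to last byte (s)")
  plt.savefig("performance-boxplots.pdf")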


3. How many more users can we add?

To help answer this question, we took the "Spare Capacity" metric
and turned it into a normalized "Utilization" metric. In other words, we
took the network-wide bandwidth history (avg_use) and divided it by the
observed bandwidth (peak_use). To get closer to the actual peak, we
then took 30-day maximums of peak_use per relay. This resulted in peaks
about 10-15% higher than simply using the descriptor values, and had
similar effects on the Utilization curve:
https://people.torproject.org/~mikeperry/transient/Whistler2019/utilization.pdf
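
In pseudo-pandas, that normalization looks roughly like this (assuming
one row per relay per day; the field names are illustrative):

  import pandas as pd

  # Sketch: Utilization = sum(avg_use) / sum(30-day max of peak_use),
  # computed per day across all relays.
  hist = pd.read_csv("relay_history.csv", parse_dates=["date"])
  hist = hist.sort_values("date")

  # 30-day rolling max of observed bandwidth, per relay
  hist["peak30"] = (hist.groupby("fingerprint")["peak_use"]
                        .transform(lambda s: s.rolling(30, min_periods=1)
                                              .max()))

  daily = hist.groupby("date")[["avg_use", "peak30"]].sum()
  daily["utilization"] = daily.avg_use / daily.peak30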

From these utilization graphs, we noticed that the Tor network's
all-time high utilization was in 2012, at about 65% before taking
30-day peaks, and 56% after taking 30-day peaks. We also noted that the
network is currently at only 41% utilization, has had an all-time low
of 32% utilization, and has frequently crossed 48% utilization
throughout its history.

We then computed the bandwidth used per user (avg_use/user_count), and
noticed that it is currently at an all-time high. Using this current
all-time high value, and a value from early 2018, we computed how many
such users we could add to the current Tor network to hit utilization
levels of 48% and 56%.

To hit 48% utilization, we can add 380k current-sized users, or 630k
2018-sized users. To hit 56% utilization, we can add 830k current-sized
users, or 1.35M 2018-sized users.
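
The arithmetic behind these estimates is just the utilization formula
solved for user count. As a sketch (the concrete inputs would be read
off the graphs above, so everything here is a placeholder):

  def addable_users(total_peak, current_avg, bw_per_user, target_util):
      # target_util * total_peak is the total usage we are willing to
      # reach; subtracting current usage gives the headroom in
      # bandwidth units, which we then divide by per-user bandwidth.
      headroom = target_util * total_peak - current_avg
      return headroom / bw_per_user

  # e.g. addable_users(peak, avg, per_user, 0.48), with peak, avg, and
  # per_user taken from the utilization and bandwidth-per-user graphs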

These estimates obviously assume that whatever users we add will use the
network in the same way that our current users do, on average. This may
or may not be true, depending on where those users come from.


4. How fast will Tor be with more users and/or more capacity?

Ok, buckle up. This is the most important thing we analyzed all week.

We took this normalized utilization curve, and the performance curve,
and examined our datapoints:

https://people.torproject.org/~mikeperry/transient/Whistler2019/4-utilization-context.png
https://people.torproject.org/~mikeperry/transient/Whistler2019/4-boxplots-context.png
https://people.torproject.org/~mikeperry/transient/Whistler2019/4-boxplots-compare.png

The hope was that we could use the utilization level to predict Tor
performance, all other things being equal.

But, while there is some correlation, there were obvious
discontinuities. In particular, January 2015 seems to be some kind of
magic turning point, before and after which Tor performance is
incomparable, for the same levels of network utilization.
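
One way to quantify that break: correlate the two series separately on
each side of January 2015 (a sketch; the input files and column names
are assumptions):

  import pandas as pd

  # Sketch: correlate utilization with daily median onionperf ttlb,
  # split at the apparent January 2015 turning point.
  util = pd.read_csv("utilization.csv", parse_dates=["date"])
  perf = pd.read_csv("onionperf_daily_median.csv", parse_dates=["date"])
  df = util.merge(perf, on="date")

  for pre2015, era in df.groupby(df.date < "2015-01-01"):
      print("pre-2015" if pre2015 else "2015-present",
            era.utilization.corr(era.ttlb))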

Why is this? Possible explanations:
  A. Feature changes caused this.
     - No obviously responsible Tor features coincided with this time.
  B. Network usage change.
     - There was a botnet in 2013, but it was mostly gone by 2014.
  C. Different sizes of the Tor network are completely incomparable.
     - Utilization inside 2012-2015 and 2015-present is comparable.
  D. Total utilization does not (solely) indicate performance.
     - What about Exit utilization? Guard utilization?

My money is on D, plus something extra that we're still missing. I hope
that if we graph Exit Utilization (i.e. avg_use/peak_use of just
Exit-flagged nodes), we will gain some insight there, and see better
correlation overall.
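
Computing Exit utilization would be a small change to the utilization
sketch in #3, reusing its hist and peak30 (the flags column is an
assumption; the flags actually come from consensus documents):

  # Sketch: restrict the utilization computation to Exit-flagged relays.
  exits = hist[hist.flags.str.contains("Exit")]
  daily_exit = exits.groupby("date")[["avg_use", "peak30"]].sum()
  daily_exit["exit_utilization"] = daily_exit.avg_use / daily_exit.peak30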

In any case, understanding when and why the live Tor network is
incomparable to itself is essential to our ability to simulate the Tor
network accurately.

If we can't correlate live Tor network performance over time based on
known network parameters and measurable characteristics, there is no way
we can hope to simulate a Tor network that has any ability to predict
reality.



-- 
Mike Perry
