[tor-scaling] notes & next meeting

Sun Apr 28 05:06:44 UTC 2019

On Fri, Apr 26, 2019 at 11:54:50AM -0700, Gaba wrote:
> Roger points out four things to watch out for during these experiments:

Here were my four things:

(A) A 72-hour (or 48-hour) cycle is not as good as a 7-day cycle: we will
be sad if we don't account for cyclic behavior on the real network. Like,
comparing a Tuesday to a Saturday will result in surprises. Better to
compare a Tuesday to a Tuesday. We learned this lesson before, in the
user count anomaly detector: see e.g. the weekly pattern in
https://metrics.torproject.org/userstats-relay-country.html?start=2017-01-27&end=2018-01-01&country=ae&events=on
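
To make the Tuesday-to-Tuesday point concrete, here is a minimal sketch
of a weekday-matched comparison. Everything in it is an assumption for
illustration: the function names, the 20% weekend dip, and the numbers
are made up, and the "experiment" deliberately has no effect at all.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def weekday_matched_diff(baseline, experiment):
    """Return mean(experiment) - mean(baseline) per weekday (0 = Monday).

    Both arguments map a datetime.date to a measured value. Only
    weekdays present in both periods appear in the result.
    """
    by_day = defaultdict(lambda: ([], []))
    for day, value in baseline.items():
        by_day[day.weekday()][0].append(value)
    for day, value in experiment.items():
        by_day[day.weekday()][1].append(value)
    return {
        wd: mean(exp) - mean(base)
        for wd, (base, exp) in sorted(by_day.items())
        if base and exp
    }


def synthetic_week(start):
    """Hypothetical data: 100.0 on weekdays, 80.0 on Saturday and Sunday."""
    days = [start + timedelta(days=i) for i in range(7)]
    return {d: 80.0 if d.weekday() >= 5 else 100.0 for d in days}


off_week = synthetic_week(date(2019, 4, 1))  # a Monday, feature off
on_week = synthetic_week(date(2019, 4, 8))   # next Monday, feature on

# Matched comparison: every weekday shows no change, as it should.
diffs = weekday_matched_diff(off_week, on_week)

# Naive comparison: a Saturday in the "on" week against a Tuesday in
# the "off" week shows a drop that is purely the weekly cycle.
naive = on_week[date(2019, 4, 13)] - off_week[date(2019, 4, 2)]
```

With this made-up data the matched differences are all zero, while the
naive Saturday-versus-Tuesday difference is -20.0 even though nothing
changed between the two weeks.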

(B) Even if we do 7-day cycles (turn on the feature for 7 days,
turn it off for 7 days, compare), some state will still bleed over
between cycles. In particular, clients will have already picked their
guards before the experiment. Or the inverse, they pick them during
the experiment, and still have them afterwards. That latter issue is a
great example of why we need to think through the anonymity impact of each
experiment -- the effect of an experiment can last for months after we
turn it off.

(C) There are other metrics to look at when assessing whether an experiment
is working, e.g. total network bandwidth used. Maybe that's just the dual
of the existing user-side performance metrics ("when user throughput
goes up, total network load will go up too because everybody will be
getting more of it"), or maybe they're different and need to be assessed
separately. More broadly, once we start one of these experiments,
and we're trying to look at everything we can to see if it's working,
we should watch what we do (what we look at) and fold that into the plan
for later experiments.

(D) Spare network capacity (advertised bandwidth minus load) is tricky
as a metric, because our advertised bandwidth is a function of relay
load. So it's easy to end up with circular analysis, where e.g. we add
more load onto a relay which causes it to "discover" that its capacity
is higher -- which counterintuitively means that increasing the load on
the Tor network could result in more spare capacity. So we need to be
really careful drawing conclusions about this spare bandwidth metric --
and we might be best served by finding some other metric, or finding a
novel way to measure what we think is the real capacity of a relay.
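
The circularity can be shown with a toy model. This is a sketch under
a loud assumption: it pretends a relay advertises the smaller of its
true capacity and the peak load it has recently carried. That is a
simplification for illustration, not the actual relay logic, and the
function names and numbers are hypothetical.

```python
def advertised_bandwidth(true_capacity, peak_load_seen):
    """Toy assumption: a relay cannot advertise more than it has
    recently observed itself carrying, nor more than it really has."""
    return min(true_capacity, peak_load_seen)


def spare_capacity(true_capacity, peak_load_seen, average_load):
    """The metric under discussion: advertised bandwidth minus load."""
    return advertised_bandwidth(true_capacity, peak_load_seen) - average_load


TRUE_CAPACITY = 100  # unknown to us on the live network

# Lightly loaded relay: it has only ever seen a peak of 40.
spare_before = spare_capacity(TRUE_CAPACITY, peak_load_seen=40, average_load=30)

# Same relay after we push more load onto it: it now "discovers" 80.
spare_after = spare_capacity(TRUE_CAPACITY, peak_load_seen=80, average_load=60)

# What we would actually like to know, if true capacity were measurable.
real_spare_before = TRUE_CAPACITY - 30
real_spare_after = TRUE_CAPACITY - 60
```

In this toy model the measured spare capacity rises from 10 to 20 when
load doubles, while the real headroom falls from 70 to 40. The metric
and the quantity we care about move in opposite directions.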

(This relay capacity thing has always been a thorn in the side of the
Shadow models too, because we need to know how big to make the relays
in Shadow, and we don't have accurate measurements about how big the
relays are in the live network.)

--Roger