teor:
I have an alternative proposal:
Let's deploy sbws to half the bandwidth authorities, wait 2 weeks, and see if exit bandwidths improve.
We should measure the impact of this change using the tor-scaling measurement criteria. (And we should make sure it doesn't conflict with any other tor-scaling changes.)
I like this plan. To tightly control for emergent effects of all-sbws vs all-torflow, ideally we'd switch back and forth between all-sbws and all-torflow on a synchronized schedule, but this requires running enough sbws and torflow measurement instances that each authority can choose either the sbws file or the torflow file on some schedule. This may be tricky to coordinate, but it would be the most rigorous way to do this.
We could do a version of this based on votes/bwfiles alone, without making dirauths toggle back and forth. However, this would not capture emergent effects (such as quicker bandwidth adjustments in sbws due to decisions to pair relays with faster ones during measurement). Still, even comparing just votes would be better than nothing.
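For concreteness, here's a minimal sketch of what the authority-side toggle could look like, assuming each bwauth runs both scanners side by side and points V3BandwidthsFile at a symlink. The file paths and the week-parity schedule are just illustrative assumptions, not an agreed plan:

    #!/usr/bin/env python3
    # Hypothetical toggle for a bwauth running both scanners side by side.
    # Assumes torrc has:  V3BandwidthsFile /var/lib/tor/active_bwfile
    # and that the output paths below are where sbws/torflow write their
    # latest results (the paths are made up for this sketch).
    import datetime
    import os

    SBWS_FILE = "/var/lib/sbws/v3bw/latest.v3bw"
    TORFLOW_FILE = "/var/lib/torflow/bwscan.V3BandwidthsFile"
    ACTIVE_LINK = "/var/lib/tor/active_bwfile"

    def choose_source(today=None):
        # Alternate on ISO week parity so every authority flips at the same time.
        today = today or datetime.date.today()
        return SBWS_FILE if today.isocalendar()[1] % 2 == 0 else TORFLOW_FILE

    def switch():
        target = choose_source()
        tmp = ACTIVE_LINK + ".tmp"
        if os.path.lexists(tmp):
            os.remove(tmp)
        os.symlink(target, tmp)
        os.replace(tmp, ACTIVE_LINK)   # atomic swap; tor never sees a missing file
        print("active bandwidth file ->", target)

    if __name__ == "__main__":
        switch()

Each authority would run something like this from cron shortly before voting, and archive both scanners' raw files the whole time so we can still do the offline vote/bwfile comparison.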
For this experiment, my metric of choice would be "Per-Relay Spare Network Capacity CDF" (see https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/Performan...), for both the overall consensus, and every authority's vote. It would also be useful to generate separate flag breakdowns of this CDF (ie produce separate CDFs for Guard-only, Middle-only, Exit-only, and Guard+Exit-only relays).
In this way, we have graphs of how the votes and the consensus distribute the difference between self-reported and measured values across the network. With these metrics, we should be able to pinpoint any major disagreements between how relays are measured and what they self-report. (In the past, karsten produced very similar sets of CDFs of just the measured values per vote when we were updating bwauths, and we compared the shape of the measured CDF, but I think graphing the difference is more comprehensive).
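To make the metric concrete, here's a rough sketch of how the per-flag-group CDFs could be generated from one vote plus the matching server descriptors, using stem. The file paths, the unit conversion, and the plain "self-reported minus measured" difference are my assumptions for illustration; the authoritative definition of the spare-capacity metric is the one on the wiki page above.

    #!/usr/bin/env python3
    # Sketch: per-relay "self-reported minus measured" CDFs, split by flag group.
    # Paths are placeholders for a locally saved vote (or consensus) and the
    # matching server descriptors.
    from stem.descriptor import parse_file

    VOTE_PATH = "vote.txt"
    DESCRIPTORS_PATH = "cached-descriptors"

    def flag_group(flags):
        guard, exit_ = "Guard" in flags, "Exit" in flags
        if guard and exit_:
            return "Guard+Exit"
        if guard:
            return "Guard-only"
        if exit_:
            return "Exit-only"
        return "Middle-only"

    # Self-reported observed bandwidth (bytes/s), keyed by fingerprint.
    self_reported = {}
    for desc in parse_file(DESCRIPTORS_PATH, "server-descriptor 1.0"):
        self_reported[desc.fingerprint] = desc.observed_bandwidth

    # Difference between self-reported and measured values, per flag group.
    # Vote "w Measured=" values are nominally kB/s, so scale the descriptor
    # value down to match (the unit handling here is an assumption to double-check).
    diffs = {}
    for entry in parse_file(VOTE_PATH, "network-status-vote-3 1.0"):
        reported = self_reported.get(entry.fingerprint)
        if entry.measured is None or reported is None:
            continue
        diff = reported // 1000 - entry.measured
        diffs.setdefault(flag_group(entry.flags), []).append(diff)

    # Dump each group's CDF as (difference, fraction of relays at or below it).
    for group, values in sorted(diffs.items()):
        values.sort()
        for i, v in enumerate(values, start=1):
            print(f"{group}\t{v}\t{i / len(values):.3f}")

Running the same thing over every authority's vote and over the consensus would give the per-vote and overall graphs described above.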
We should also keep an eye on CDF-DL and the failure rainbow metrics, as they may be indirectly affected by improvements/regressions in load balancing, but I think the distribution of "spare capacity" is the first-order metric we want.
Do you like these metrics? Do you think we should be using different ones? Should we try a few different metrics and see what makes sense based on the results?
If we do decide to change AuthDirMaxServersPerAddr, let's work out how many new relays would be added to the consensus straight away. There shouldn't be too many, but let's double-check.
Hrmm.. This may be hard to determine, and it would only make an immediate difference if many relay operators already have more than 2 relay instances actively trying to run on a single IP, such that the additional ones are still running but currently being rejected constantly. I'm guessing this is not common, and relay operators will have to manually decide to start more instances.
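One quick number we could pull today is how many addresses are already at the current cap of 2 relays in the consensus; relays that are being rejected obviously won't show up there, so this only bounds how many operators could plausibly spin up more instances right away. A sketch, assuming a recent consensus saved locally:

    #!/usr/bin/env python3
    # Count how many addresses already have 2+ relays in the consensus.
    from collections import Counter
    from stem.descriptor import parse_file

    CONSENSUS_PATH = "cached-consensus"   # assumed: a recent consensus on disk

    relays_per_address = Counter()
    for entry in parse_file(CONSENSUS_PATH, "network-status-consensus-3 1.0"):
        relays_per_address[entry.address] += 1

    at_cap = sum(1 for count in relays_per_address.values() if count >= 2)
    print(len(relays_per_address), "distinct addresses;", at_cap, "with 2+ relays")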
I also don't think that these approaches need to be either/or. I think there are many independent reasons to allow more relays per IP (tor is single-threaded and caps out somewhere between 100 and 300 Mbit/s per instance depending on CPU and AES acceleration, so many fast relay operators do the multi-instance thing already, if they have the spare IPs).
I also think that if I'm right about most relay operators needing to make this decision manually, the effect of allowing 4 nodes per IP will mostly blend in with normal network churn over time.
So, as long as we tightly control switching sbws vs torflow and have result files from each for the duration of the experiment, I think that we can do both of these things at once. There's going to be capacity and load churn like this over time naturally, anyway. This switching-back-and-forth methodology is meant to control for that.