On 12/11/20 08:04, David Goulet wrote:
we are talking a good 2-4 years minimum once the feature is stable, so we have to get this out soon if we hope it will be useful in the foreseeable future.
Right - the slow feedback cycle between deploying new logging and trying to use it is all the more reason to plan ahead and try to ensure the data will actually be suitable for the intended use :). Granted, we can presumably at least *start* prototyping usage of the data sooner than 2-4 years, but it'll probably still be some months before any useful data starts arriving, right?
On to your next question about the hour problem. So yes, you are correct that in the timeframe between informing the world that I'm not overloaded anymore and the world noticing, you might end up under-utilized, but you might also end up "just utilized enough".
All in all, we are stuck with a network that "morphs" every hour (new consensus), but bandwidth scanners actually take much longer to scan the entire network (on the order of days), so the under-utilization can actually last much more than an hour :S.
So there will always be that gap where we back off from a relay - possibly too much - until the scanner notices and gives it a bit more. But over time, as the backoff happens and the bandwidth scanner makes corrections, they should reach an equilibrium where the scanner finds the value that is just enough for you to stop advertising overload, or in other words, it finds the sweet spot.
That is likely to require time, and the relay to be maximally stable, as in 99% uptime and without large CPU/network fluctuations.
But also, as we back off from overloaded relays, we will send traffic to potentially under-utilized relays. So yes, it will be a bumpy road at first, but after some days/weeks the network should stabilize, and we should actually see very few "overload-reached" events after that point (except for operators running 1000 other things on the relay machine, eating resources at random :).
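The back-and-forth described above can be sketched as a toy simulation. This is illustrative only - not Tor's actual scanner or client behavior - and every parameter value (the backoff fraction, the raise factor, the 72-hour scan interval) is a made-up assumption:

```python
# Toy model of the backoff/scanner equilibrium discussed above.
# Not Tor's actual algorithm; all constants are illustrative assumptions.

def simulate(capacity=100.0, weight=160.0, scan_every=72, hours=1000,
             backoff=0.75, raise_factor=1.05):
    """Return the relay's weight after `hours` hourly consensus rounds.

    capacity      -- traffic level above which the relay reports overload
    scan_every    -- hours between scanner measurements of this relay
    backoff       -- multiplicative cut applied while the relay is overloaded
    raise_factor  -- gentle increase applied by the scanner when not overloaded
    """
    for hour in range(hours):
        if weight > capacity:
            weight *= backoff          # clients shift traffic away
        elif hour % scan_every == 0:
            weight *= raise_factor     # scanner nudges the weight back up
    return weight

# The weight overshoots and oscillates at first, then settles near the
# capacity "sweet spot", mostly just below it.
final = simulate()
```

The interesting property is the asymmetry: the backoff reacts every hour, while the scanner only corrects every few days, so the weight spends most of its time slightly under capacity rather than over it, which matches the "under-utilized until the scanner notices" gap described above.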
Thanks for the explanation! IIUC, the new consensus computed every hour includes weights based on the latest data from the bandwidth scanners, but an individual relay is only scanned once every x days?
In the proposal, maybe it'd be enough to briefly explain the choice of parameters and any relevant tradeoffs - one hour for granularity, 72 hours for history (any others?). It might also be helpful to have a strawman example of how the data could be used in the congestion control algorithm, with some discussion like the above ^, though I could also see that potentially getting too far from the core of the proposal.
Btw, maybe it's worth explicitly explaining how the data *won't* be useful for attackers? I'd assumed (possibly incorrectly) that the history wasn't being kept at a finer granularity to avoid being able to correlate events across relays, and from there perhaps be able to infer something about individual circuit paths. If that sort of attack is worth worrying about, should relays also suppress reporting events for the current partial hour to avoid an attacker being able to probe the metrics port to find out if an overload just happened?
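To make the suppression idea concrete, here's a hypothetical sketch of what such coarsened reporting could look like: truncate overload events to hour granularity, keep only 72 hours of history, and drop the current partial hour so probing the metrics port can't reveal that an overload just happened. All names here are illustrative, not Tor's actual implementation:

```python
# Hypothetical sketch of coarsened overload reporting, per the suggestion
# above. Constants and names are illustrative, not from Tor's codebase.

HOUR = 3600
HISTORY_HOURS = 72

def reportable_overloads(event_timestamps, now):
    """Return sorted hour-truncated timestamps safe to publish at `now`.

    Events in the current (partial) hour are suppressed, and anything
    older than the 72-hour history window is discarded.
    """
    current_hour = (now // HOUR) * HOUR
    oldest = current_hour - HISTORY_HOURS * HOUR
    hours = {
        (t // HOUR) * HOUR                # truncate to hour granularity
        for t in event_timestamps
        if oldest <= t < current_hour     # drop partial hour and stale data
    }
    return sorted(hours)

# An event in the current hour (997500 at now=1000000) is withheld;
# one from a completed hour is reported at hour granularity.
published = reportable_overloads([997500, 996000], now=1_000_000)
```

Under this sketch, a probe of the metrics port only ever learns about completed hours, so the freshest thing an attacker can confirm is at least one hour stale.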
Hope this answers your question!
Very helpful, thanks!