On Mon, Nov 19, 2018 at 7:36 AM teor <teor@riseup.net> wrote:
Hi,
We have deployed sbws on one bandwidth authority (longclaw).
Here's a request for additional feedback, and a progress update:
Request for Feedback: Relay Bandwidth Self-Tests
Torflow and sbws use relays' self-reported observed bandwidths for load balancing. But relays can report really low observed bandwidths, either because they're new or because random path selection has sent them little traffic.
In torflow, relays can get stuck in a low-bandwidth partition. sbws doesn't have partitions. But in both systems, low bandwidths can cause inaccurate or unstable load balancing.
Since torflow and sbws need accurate self-reported relay bandwidths, some component of the Tor network needs to send enough bandwidth through every relay.
Here are our current choices:
Tor relays can do a regular bandwidth self-test, so that their first descriptor has an accurate bandwidth (up to some minimum). But the current self-test is too small, and buggy.
sbws already sends bandwidth to all relays to measure them. sbws gets accurate bandwidths for most relays within 2 weeks, but the fastest relays can take a month to ramp up. (sbws starts measuring at the median relay bandwidth, and can double every 5 days.)
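To make the ramp-up arithmetic above concrete, here is a rough sketch. It assumes a deliberately simplified model that is not taken from the sbws code: measurements start at the median relay bandwidth and double exactly once every 5 days. The function name `days_to_reach` and its parameters are hypothetical, and the ratios are illustrative only.

```python
def days_to_reach(target_bw, start_bw, doubling_days=5):
    """Days until a value that starts at start_bw and doubles every
    doubling_days reaches at least target_bw (same units for both).

    Simplified model, assumed for illustration: exactly one doubling
    per period and no other limits.
    """
    if target_bw <= 0 or start_bw <= 0:
        raise ValueError("bandwidths must be positive")
    days = 0
    bw = start_bw
    # Double once per period until the target is covered.
    while bw < target_bw:
        bw *= 2
        days += doubling_days
    return days


if __name__ == "__main__":
    # Express each relay's bandwidth as a multiple of the median.
    for ratio in (1, 2, 8, 64):
        print(f"{ratio}x median: {days_to_reach(ratio, 1)} days")
```

Under these assumptions, a relay at 8 times the median needs 15 days (about 2 weeks), and a relay at 64 times the median needs 30 days (about a month).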
Should we improve relay bandwidth self-tests? (#22453) Or should we rely on sbws to create the bandwidths it needs? What about test networks?
Hi! I don't think I have the answers here, but maybe I can think aloud in a useful way.
From my point of view, either of these is a fine idea, if it works.
We could decide based on a lot of factors, like:
* Which one is easier to do?
* Which creates the greater maintenance burden, moving forward?
* Which is more robust if something breaks in the future?
* Which consumes the most relay bandwidth?
* Which requires SBWS to use the most bandwidth?
Maybe if we had those figured out, we'd have a better time deciding.
Should we make bandwidths grow faster in sbws? Or is a ramp-up period of 2-5 weeks fast enough?
I think that's fast enough, though I'm not sure. How does it compare with the current average torflow ramp-up time?
(We won't modify and re-deploy torflow.)