Hi,
On 3 Jun 2019, at 17:48, Roger Dingledine <arma@torproject.org> wrote:
On Sun, Jun 02, 2019 at 10:43:14PM +1000, teor wrote:
Let's deploy sbws to half the bandwidth authorities, wait 2 weeks, and see if exit bandwidths improve.
We should measure the impact of this change using the tor-scaling measurement criteria. (And we should make sure it doesn't conflict with any other tor-scaling changes.)
Rolling out more sbws measurers sounds good to me.
But, maybe I haven't been following, but isn't the first plan for sbws to replace torflow but have identical behavior? And then we can work on changing it to have better behavior?
No, we fixed some obvious torflow bugs and design flaws.
Here are some details:
Let's talk engineering tradeoffs.
sbws had a few conflicting goals:
* create a modern bandwidth scanner implementation
* produce results that are similar to torflow
* be ready to deploy in 2019
Here's how we resolved those tradeoffs:
* use modern designs, libraries, and protocols when building sbws
* compare sbws results against torflow, and identify any issues:
  * when torflow is obviously wrong, do something better in sbws
  * when sbws is obviously wrong, log a bug against sbws, and triage it
  * when the results differ by a small amount, accept that difference
See these tickets for more details:
https://trac.torproject.org/projects/tor/ticket/27339
https://trac.torproject.org/projects/tor/ticket/27107
Here are some network health checks we are doing as we deploy sbws:
https://sbws.readthedocs.io/en/latest/monitoring_bandwidth.html
Here are some FAQs about the design, and the bandwidth file spec:
https://sbws.readthedocs.io/en/latest/faq.html
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
It would be great to have more design documentation, but keeping that documentation up to date is a lot of work. And we needed to deliver working code, too.
I ask because in that case switching to more sbws measurers should not cause the exit bandwidths to improve, until we then change the measurers to measure better.
One of the design flaws that we fixed was torflow's "scanner partitions".
Relays can get stuck in a slow torflow scanner partition, and never improve their measurements.
But in sbws, each relay is measured against a random faster relay. sbws tries to choose relays that are at least 2x faster than the target.
So some stuck relay bandwidths should improve under sbws, as long as enough of the bandwidth authorities run sbws (about half of them, I think).
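To make the "random faster relay" rule concrete, here is a minimal sketch of the idea. It is not the actual sbws code: the function name, the relay names, the bandwidth numbers, and the "return None when nothing qualifies" behaviour are all assumptions made up for illustration. Only the 2x ratio comes from the description above.

```python
import random


def pick_faster_relay(target, bandwidths, min_ratio=2.0, rng=None):
    """Pick a random relay at least `min_ratio` times faster than `target`.

    `bandwidths` maps a (hypothetical) relay name to its bandwidth.
    Returns None when no relay is fast enough; what a real scanner
    does in that case is outside the scope of this sketch.
    """
    rng = rng or random.Random()
    threshold = bandwidths[target] * min_ratio
    # Every relay other than the target that meets the speed threshold.
    candidates = sorted(
        name
        for name, bw in bandwidths.items()
        if name != target and bw >= threshold
    )
    if not candidates:
        return None
    # Uniform random choice, so a relay is not tied to a fixed group of peers.
    return rng.choice(candidates)


# Illustrative numbers only, not real measurements.
bandwidths = {"relayA": 100, "relayB": 250, "relayC": 150, "relayD": 900}
helper = pick_faster_relay("relayA", bandwidths, rng=random.Random(0))
```

Because the second relay is re-drawn at random from all relays that are fast enough, a slow measurement is not locked in by a fixed group of peers, which is what happens with torflow's scanner partitions.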
That said, there are still some bugs in sbws. Some of those bugs were copied from torflow. Others are new bugs. sbws has detailed diagnostics that will help us chase down and fix these bugs.
And we can also make design changes. But let's stabilise sbws first, and fix any high-impact bugs.
T