Hi all,
Wanted to provide an update (even if the news isn't as good as I'd hoped) because I know this is a very frustrating issue for everyone.
At a high level, the bwauth scripts split the network into four segments ranked by relay speed, and measure each segment. The segments are 0-10, 10-40[0], 30-60, and 60-100 (so the first covers the top 10% of relays by speed, and so on). After 26 hours[1], the first scanner we added is currently at:
Segment 1: Completed, looped a couple times
Segment 2: 34.2 of the 10-40
Segment 3: 46.5 of the 30-60
Segment 4: 77.4 of the 60-100
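For anyone who wants to play with the numbers, here is a rough sketch of how the segments carve up a speed-ranked relay list, and how far along each scanner is. This is not torflow's code: the function names and the 200-relay list are made up, and it assumes the figures above are percentile positions inside each segment.

```python
# Segment boundaries as percentiles of relays ranked fastest first.
# 10-40 is what the scanner currently uses; its overlap with 30-60 is the bug in [0].
SEGMENTS = [(0, 10), (10, 40), (30, 60), (60, 100)]


def slice_by_percentile(relays_fastest_first, lo, hi):
    """Return the relays whose rank falls in the [lo, hi) percentile range."""
    n = len(relays_fastest_first)
    return relays_fastest_first[n * lo // 100 : n * hi // 100]


def segment_progress(position, lo, hi):
    """Fraction of a segment covered, ASSUMING `position` is a percentile in [lo, hi]."""
    return (position - lo) / (hi - lo)


if __name__ == "__main__":
    # Made-up relay names standing in for a consensus sorted by speed.
    relays = ["relay%03d" % i for i in range(200)]
    for lo, hi in SEGMENTS:
        print("%d-%d: %d relays" % (lo, hi, len(slice_by_percentile(relays, lo, hi))))

    # Positions reported above for segments 2, 3 and 4.
    for position, (lo, hi) in [(34.2, (10, 40)), (46.5, (30, 60)), (77.4, (60, 100))]:
        print("%d-%d: %.1f%% done" % (lo, hi, 100 * segment_progress(position, lo, hi)))
```

Under that assumption, segment 2 is roughly four fifths done, segment 3 a bit over half, and segment 4 under half.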
This isn't very encouraging - it's slow going. Two of the things that were just brainstormed in #tor-dev were:
1) Increasing the number of scanners tended to overwhelm the tor instance that supported these scanners, so we want to try doubling the scanners *and* the tors. Hopefully this will let us work our way through the list of relays faster.
2) We want to look at the possibility of relays moving around in the percentiles in each consensus, getting unlucky, and not being measured; potentially fixable by pinning a relay to a percentile, and then when we get all the way through a segment, unpin it, get that segment from the latest consensus, and restart. This may result in a relay being pinned into scanner#2, scanner#1 completes, measures the relay, then scanner#2 measures it... that wouldn't be so bad, double measuring is better than not measuring. When torflow was written over four years ago, there may have been a good reason it didn't work this way, and we need to see if we can re-reason out what that was.
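To make idea 2 concrete, here is a minimal sketch of the pinning behaviour. It is not how torflow works today, and `PinnedSegment` and `next_relay` are hypothetical names. A consensus is modelled as a plain list of relay ids, fastest first.

```python
class PinnedSegment:
    """Scan a fixed snapshot of one percentile segment to completion."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.pending = []  # relays pinned to this segment, not yet measured

    def next_relay(self, latest_consensus):
        """Return the next relay id to measure, or None if the segment is empty.

        `latest_consensus` is only consulted once every pinned relay has been
        measured; until then, relays changing percentile are ignored.
        """
        if not self.pending:
            # Unpin: take a fresh snapshot of this segment and restart.
            n = len(latest_consensus)
            start, stop = n * self.lo // 100, n * self.hi // 100
            self.pending = list(latest_consensus[start:stop])
        if not self.pending:
            return None
        return self.pending.pop(0)


if __name__ == "__main__":
    scanner = PinnedSegment(50, 100)        # slower half of a toy network
    consensus_a = ["A", "B", "C", "D"]
    consensus_b = ["A", "C", "D", "B"]      # C sped up, B slowed down

    order = [scanner.next_relay(consensus_a)]
    # A new consensus arrives mid-scan; the pinned snapshot is still used.
    order += [scanner.next_relay(consensus_b) for _ in range(3)]
    print(order)
```

In this toy run D is measured twice, once from each snapshot, and B is picked up as soon as the segment is re-pinned. That is the "double measuring is better than not measuring" trade-off.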
So we are working on identifying the subtler issues and seeing if we can fix them, in addition to just adding more of the same.
-tom [2]
[0] The 10-40 overlap is a bug we just found.
[1] Well, more like 50 hours, but the first 24 were lost because of a breaking change in a Python point release.
[2] (Most of this is aagbsn's knowledge, I'm just transcribing it.)