[tor-dev] stale entries in bwscan.20151029-1145

Tom Ritter tom at ritter.vg
Fri Nov 6 02:56:04 UTC 2015

On 5 November 2015 at 16:37,  <starlight.2015q3 at binnacle.cx> wrote:
> At 11:47 11/5/2015 -0600, Tom Ritter wrote:
>> . . .
>>So them falling between the slices would be my
>>best guess. . .
> Immediately comes to mind that dealing
> with the changing consensus while
> scanning might be handled in a different
> but nonetheless straightforward manner.
> Why not create a snapshot of the consensus
> at the time scanning commences then--
> without disturbing the scanners--
> pull each new update with an asynchronous
> thread or process.  The consensus thread
> would diff against the previous state
> snapshot and produce an updated snapshot
> plus deltas for each scanner and/or slice
> as the implementation requires.  Then
> briefly lock the working list for each
> active scanner and apply the delta to it.
> By having a single thread handle
> consensus retrieval and sub-division,
> issues of "lost" relays should
> go away entirely.  No need to hold
> locks for extended periods.
> The consensus allocation thread would
> run continuously, so individual slices
> and scanners can complete and restart
> asynchronous to each other without
> glitches or delays.
> Consensus allocation worker could also
> manage migrating relays from one scanner
> to another, again preventing lost
> relays.

So I'm coming around to this idea, after spending an hour trying to
explain why it was bad. I thought "No no, let's do this other
thing..." and then I basically designed what you said.  So the main
problem as I see it is that it's easy to move relays between slices
that haven't happened yet - but how do you do this when some slices
are completed and some aren't?
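
To make the diff step concrete, here's a rough sketch of what the
consensus thread might compute.  This is illustration only: the
function name and the "fingerprint -> (scanner, slice)" mapping are
made up by me, not anything in the current bwauth code.

```python
def diff_consensus(old, new):
    """Compare two snapshots mapping relay fingerprint -> (scanner, slice).

    Returns three dicts: relays that appeared, relays that vanished,
    and relays whose assignment changed (old assignment, new assignment).
    """
    added = {fp: new[fp] for fp in new.keys() - old.keys()}
    removed = {fp: old[fp] for fp in old.keys() - new.keys()}
    moved = {
        fp: (old[fp], new[fp])
        for fp in old.keys() & new.keys()
        if old[fp] != new[fp]
    }
    return added, removed, moved


# Hypothetical snapshots; assignments are (scanner, slice) tuples.
old = {"Relay1": (2, 2), "Relay2": (1, 14), "Relay3": (1, 3)}
new = {"Relay1": (1, 14), "Relay2": (2, 2), "Relay4": (3, 1)}

added, removed, moved = diff_consensus(old, new)
print("added:", added)
print("removed:", removed)
print("moved:", moved)
```

The 'added' and 'removed' sets are the easy part.  The 'moved' set is
where the trouble is.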

Relay1 is currently in scanner2 slice2, but the new consensus came in
and it should be in scanner1 slice14.  Except scanner2 slice2 was
already measured and scanner1 slice14 has not been.  What do you do?

Or the inverse.  Relay2 is currently in scanner1 slice14, but the new
consensus says it should be in scanner2 slice2.  But scanner2 slice2
was already measured, and scanner1 slice14 has not been.  You can only
move a relay between two slices that have yet to be measured.  But
everything is 'yet to be measured' unless you're going to halt the
whole bwauth after one entire cycle and then start over again.
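
The rule itself is tiny if you write it down.  A sketch, assuming
slices are identified by (scanner, slice) tuples and that something
tracks a set of slices already measured this cycle (both of those are
my assumptions for the example):

```python
def can_move(src, dst, measured):
    """A relay may move only if neither its old nor new slice was measured."""
    return src not in measured and dst not in measured


# Hypothetical state: only scanner2 slice2 has been measured so far.
measured = {(2, 2)}

relay1_ok = can_move((2, 2), (1, 14), measured)   # leaving a measured slice
relay2_ok = can_move((1, 14), (2, 2), measured)   # entering a measured slice
other_ok = can_move((1, 14), (1, 15), measured)   # both still pending
print(relay1_ok, relay2_ok, other_ok)
```

Both the Relay1 and Relay2 cases get rejected; only the move between
two pending slices goes through.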

Which... if you used a work queue instead of a scanner, might actually work...?

We could make a shared work queue of slices, and do away with the idea
of 'separate scanners for different-speed relays'... When there's
no more work, we would get a new consensus and make a new work queue
off of that.  We would assign work items in a scanner-like pattern,
and as we get new consensuses with new relays that weren't in any
slices, just insert them into existing queued work items. (Could also
go through and remove missing relays too.)
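
Something like the following is what I have in mind for the queue
part.  It's a minimal sketch: the Slice class, the 'speed' field, and
the helper names are all hypothetical, and it ignores locking
entirely.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Slice:
    speed: int                       # representative speed of the slice (made-up unit)
    relays: set = field(default_factory=set)


def insert_new_relay(queue, fingerprint, speed):
    """Add a relay to the still-queued slice whose speed is closest.

    Returns the chosen slice, or None if no work is queued.
    """
    if not queue:
        return None
    target = min(queue, key=lambda s: abs(s.speed - speed))
    target.relays.add(fingerprint)
    return target


def remove_missing(queue, current_fingerprints):
    """Drop relays that are no longer in the consensus from queued slices."""
    for s in queue:
        s.relays &= current_fingerprints


queue = deque([Slice(100, {"A"}), Slice(1000, {"B"}), Slice(10000, {"C"})])
in_progress = queue.popleft()        # a worker takes the first work item

# A new consensus arrives: relay D is new, relay B has disappeared.
chosen = insert_new_relay(queue, "D", 800)
remove_missing(queue, {"A", "C", "D"})
print(chosen, list(queue))
```

Note that the slice already handed out (in_progress) is never
touched; only work items still sitting in the queue get edited.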

Moving new relays into the closest-matching slice isn't hard, and
swapping relays between yet-to-be-retrieved slices isn't that hard
either.  The pattern to select work items is now the main source of
complexity - it needs to estimate how long it takes a work item to
complete, and give out work items such that it always keeps some gaps
around to insert new relays into that aren't _too_ far away from that
relay's speed.  (Which is basically what the scanner separation was
set up for.)  It could also fly off the rails by going "Man these fast
relays take forever to measure, let's just give out 7 work items of
those" - although I'm not sure how bad that would be. Needs a
simulator maybe.

FWIW, looking at
https://bwauth.ritter.vg/bwauth/AA_scanner_loop_times.txt , it seems
like (for whatever weird reason) scanner1 took way longer than the
others. (Scanner 9 is very different, so ignore that one.)

Scanner 1
     5 days, 11:07:27
Scanner 2
     3 days, 19:00:03
Scanner 3
     2 days, 19:48:15
     2 days, 9:36:13
Scanner 4
     2 days, 18:42:21
     2 days, 19:41:16
Scanner 5
     2 days, 13:21:20
     2 days, 11:20:53
Scanner 6
     2 days, 20:19:48
     2 days, 13:46:30
Scanner 7
     2 days, 9:04:49
     2 days, 12:50:34
Scanner 8
     2 days, 14:31:50
     2 days, 15:05:28
Scanner 9
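
If anyone wants to compare these numerically, a quick way to turn the
strings above into averages per scanner is sketched below.  The
parsing helper is my own, and it assumes the "N days, H:MM:SS" format
shown here; Scanner 9 is left out since it has no completed loops
listed.

```python
import re
from datetime import timedelta


def parse_loop_time(text):
    """Parse a string like '2 days, 9:36:13' into a timedelta."""
    m = re.fullmatch(r"(\d+) days?, (\d+):(\d+):(\d+)", text.strip())
    if m is None:
        raise ValueError(f"unrecognised loop time: {text!r}")
    days, hours, minutes, seconds = map(int, m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def mean_loop_time(entries):
    """Average a non-empty list of loop-time strings."""
    parsed = [parse_loop_time(e) for e in entries]
    return sum(parsed, timedelta()) / len(parsed)


# Loop times copied from the listing above.
loop_times = {
    1: ["5 days, 11:07:27"],
    2: ["3 days, 19:00:03"],
    3: ["2 days, 19:48:15", "2 days, 9:36:13"],
    4: ["2 days, 18:42:21", "2 days, 19:41:16"],
    5: ["2 days, 13:21:20", "2 days, 11:20:53"],
    6: ["2 days, 20:19:48", "2 days, 13:46:30"],
    7: ["2 days, 9:04:49", "2 days, 12:50:34"],
    8: ["2 days, 14:31:50", "2 days, 15:05:28"],
}

for scanner, entries in loop_times.items():
    print(f"Scanner {scanner}: {mean_loop_time(entries)}")
```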
