At 11:47 11/5/2015 -0600, Tom Ritter wrote:
. . . So them falling between the slices would be my best guess. . .
It immediately comes to mind that the changing consensus could be dealt with during scanning in a different but nonetheless straightforward manner.
Why not create a snapshot of the consensus at the time scanning commences and then, without disturbing the scanners, pull each new update with an asynchronous thread or process? The consensus thread would diff each update against the previous snapshot and produce an updated snapshot plus deltas for each scanner and/or slice, as the implementation requires. It would then briefly lock the working list of each active scanner and apply the delta to it.
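As a rough illustration of the snapshot/diff/apply cycle described above, here is a minimal Python sketch. Everything in it is hypothetical: the names `diff_snapshots`, `Scanner` and `ConsensusWorker` are invented for the example, and a consensus is reduced to a plain `{fingerprint: bandwidth}` dict rather than anything the real scanner code uses. Only a single scanner is shown.

```python
import threading


def diff_snapshots(old, new):
    """Compare two {fingerprint: bandwidth} maps (hypothetical consensus model)."""
    added = {fp: bw for fp, bw in new.items() if fp not in old}
    removed = set(old) - set(new)
    changed = {fp: bw for fp, bw in new.items() if fp in old and old[fp] != bw}
    return added, removed, changed


class Scanner:
    """Stand-in for one scanner; only its working list is modelled."""

    def __init__(self):
        self.lock = threading.Lock()
        self.working = {}

    def apply_delta(self, added, removed, changed):
        # The lock is held only for the in-memory update, never for
        # consensus retrieval or diffing.
        with self.lock:
            for fp in removed:
                self.working.pop(fp, None)
            self.working.update(added)
            self.working.update(changed)


class ConsensusWorker:
    """Owns the snapshot; meant to be driven from one dedicated thread."""

    def __init__(self, scanner, initial):
        self.scanner = scanner
        self.snapshot = dict(initial)
        scanner.apply_delta(self.snapshot, set(), {})

    def on_new_consensus(self, consensus):
        delta = diff_snapshots(self.snapshot, consensus)
        self.snapshot = dict(consensus)
        self.scanner.apply_delta(*delta)
        return delta
```

In this sketch the scanner would take the same `lock` whenever it reads `working`, so it sees either the old list or the fully updated one, never a half-applied delta.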
By having a single thread handle consensus retrieval and sub-division, issues of "lost" relays should go away entirely. No need to hold locks for extended periods.
The consensus allocation thread would run continuously, so individual slices and scanners can complete and restart asynchronously with respect to each other without glitches or delays.
The consensus allocation worker could also manage migrating relays from one scanner to another, again preventing lost relays.
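Migration could be sketched along the following lines. This is only an illustration under stated assumptions: relays are assigned to scanners by splitting a bandwidth-ranked list into contiguous groups, and `partition` and `plan_moves` are hypothetical names, not existing functions. The point is that one worker computes both the old and the new ownership, so every relay that leaves one scanner is handed to another in the same step.

```python
def partition(snapshot, n_scanners):
    """Map each fingerprint to a scanner index by descending-bandwidth rank.

    Assumes (hypothetically) that each scanner covers one contiguous
    range of the ranked relay list.
    """
    ranked = sorted(snapshot, key=lambda fp: (-snapshot[fp], fp))
    size = -(-len(ranked) // n_scanners) if ranked else 1  # ceiling division
    return {fp: i // size for i, fp in enumerate(ranked)}


def plan_moves(old_owner, new_owner):
    """Return {scanner: (to_add, to_drop)} from two ownership maps."""
    scanners = set(old_owner.values()) | set(new_owner.values())
    plan = {s: (set(), set()) for s in scanners}
    for fp, s in new_owner.items():
        if old_owner.get(fp) != s:
            plan[s][0].add(fp)   # new relay, or one migrating in
    for fp, s in old_owner.items():
        if new_owner.get(fp) != s:
            plan[s][1].add(fp)   # departed relay, or one migrating out
    return plan


old = partition({"A": 300, "B": 200, "C": 100, "D": 50}, 2)
new = partition({"A": 300, "B": 40, "C": 100, "D": 50}, 2)
plan = plan_moves(old, new)
```

Here relay B drops in bandwidth and moves from scanner 0 to scanner 1, while C moves the other way; both appear in one scanner's drop set and the other's add set, so neither falls between the slices.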