[tor-dev] stale entries in bwscan.20151029-1145
tom at ritter.vg
Thu Nov 5 17:47:27 UTC 2015
Talked with Mike on IRC:
12:12 < tjr:#tor-dev> mikeperry: If you have a moment today, we'd
appreciate it if you could peek at the tor-dev thread 'stale entries
in bwscan.20151029-1145'
12:14 < mikeperry:#tor-dev> that seems to be one of the mails I lost
12:14 < mikeperry:#tor-dev> (had a mail failure a couple weeks back)
12:14 < mikeperry:#tor-dev> oh, nm, it just hasn't arrived yet
12:16 < mikeperry:#tor-dev> tjr: torflow does indeed fetch a new
consensus for each slice now. they could be falling in between them :/
12:16 < mikeperry:#tor-dev> but the unmeasured scanner didn't pick them up even?
12:17 < tjr:#tor-dev> They are measured
12:17 < tjr:#tor-dev> they're big fast relays
12:18 < tjr:#tor-dev> Hm. Conceptually, do you see a problem with
locking a single consensus at the startup of a scanner?
12:24 < mikeperry:#tor-dev> tjr: it made them finish much faster,
since they didn't have to keep churning and spending CPU and making
additional measurements as relays come in and
out, but I was wondering if it would make
the gap problem worse
12:26 < mikeperry:#tor-dev> it seems odd that three relays would be
missed by all scanners though. I wonder what is special about them
that is causing them to fall through the
cracks for everyone for so long
12:26 < tjr:#tor-dev> Wait, I'm confused. When you say "it" you mean
fetching a new consensus every slice, right? Why would fetching a new
consensus every slice use _less_ CPU and
do less churning? It seems that _would_ cause
new relays to come in and out and make the gap problem worse
12:27 < mikeperry:#tor-dev> tjr: because what the code used to do was
listen for new consensus events, and dynamically update the slice and
the relays as the consensus came in
12:27 < tjr:#tor-dev> So these 3 should be covered by scanner1. They
were skipped, and my theory is that they fell through gaps in the
slices inside scanner1
12:27 < mikeperry:#tor-dev> that would mean that every new consensus
period, the scanning machine would crawl to a stop, and also that
relays would shift around in the slice during the scan
12:28 < tjr:#tor-dev> Okay, yeah, dynamically updating the slice in the
middle of the slice definitely sounds bad.
12:28 < tjr:#tor-dev> I'm proposing pushing it back even further -
instead of a new consensus each slice, lock the consensus at the
beginning of a scanner for all slices
12:28 < mikeperry:#tor-dev> that is harder architecturally because of
the process model
12:29 < mikeperry:#tor-dev> though maybe we could have the
subprocesses continue on for multiple slices
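To make the two approaches concrete, here is a rough sketch (not
torflow's actual code; fetch_consensus, scan_slice, and the relay
objects with a .bandwidth attribute are all hypothetical stand-ins,
and the contiguous-chunk slicing rule is my assumption):

def slice_relays(relays, n, num_slices):
    # Partition relays, ordered by consensus bandwidth, into
    # contiguous slices; slice n gets the n-th chunk.
    ordered = sorted(relays, key=lambda r: r.bandwidth, reverse=True)
    size = max(1, len(ordered) // num_slices)
    return ordered[n * size:(n + 1) * size]

def scan_per_slice_consensus(fetch_consensus, scan_slice, num_slices):
    # Current behavior: a fresh consensus per slice. A relay whose
    # bandwidth rank shifts between fetches can keep landing in
    # ranges that were already scanned, so it is never measured.
    for n in range(num_slices):
        scan_slice(slice_relays(fetch_consensus(), n, num_slices))

def scan_locked_consensus(fetch_consensus, scan_slice, num_slices):
    # Proposed behavior: lock one consensus at scanner startup, so
    # every relay falls into exactly one slice for the whole pass.
    relays = fetch_consensus()
    for n in range(num_slices):
        scan_slice(slice_relays(relays, n, num_slices))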
So them falling between the slices would be my best guess. The
tedious way to confirm it would be to look at the consensus at the
times each slice began (in bws-data), match up the slice ordering, and
confirm that (for all N) when slicenum=N began, Onyx was expected to be
in a slice other than N.
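In code, that check would look something like this (a sketch only;
it assumes we can recover the consensus relay list in effect when
each slice began, and the slicing rule is the same assumption as
above, which may not match torflow exactly):

def assigned_slice(relays, nickname, num_slices):
    # Which slice would this relay land in under this consensus?
    ordered = sorted(relays, key=lambda r: r.bandwidth, reverse=True)
    size = max(1, len(ordered) // num_slices)
    for i, relay in enumerate(ordered):
        if relay.nickname == nickname:
            return min(i // size, num_slices - 1)
    return None  # relay absent from this consensus

def fell_through_gaps(slice_starts, num_slices, nickname="Onyx"):
    # slice_starts: [(slicenum, relays_at_start), ...] in the order
    # the slices actually ran. True means the relay was never in
    # the slice being scanned at the moment it was scanned.
    for slicenum, relays in slice_starts:
        if assigned_slice(relays, nickname, num_slices) == slicenum:
            return False  # it should have been measured here
    return True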
On 5 November 2015 at 11:11, Tom Ritter <tom at ritter.vg> wrote:
> So... weird. I dug into Onyx primarily. No, in scanner.1/scan-data I
> cannot find any evidence of Onyx being present. I'm not super
> familiar with the files torflow produces, but I believe the bws- files
> list what slice each relay is assigned to. I've put those files
> (concatted) here: https://bwauth.ritter.vg/bwauth/bws-data
> Those relays are indeed missing.
> Mike: is it possible that relays are falling in between _slices_ as
> well as _scanners_? I thought the 'stop listening for consensus'
> commit would mean that a single scanner would use the same
> consensus for all the slices in the scanner...
>  https://gitweb.torproject.org/torflow.git/commit/NetworkScanners/BwAuthority?id=af5fa45ca82d29011676aa97703d77b403e6cf77
> On 5 November 2015 at 10:48, <starlight.2015q4 at binnacle.cx> wrote:
>> Hi Tom,
>> Scanner 1 finally finished the first pass.
>> Of the list of big relays not checked
>> below, three are still not checked:
>> *Onyx 10/14
>> atomicbox1 10/21
>> *naiveTorer 10/15
>> Most interesting, ZERO evidence of
>> any attempt to use the two starred
>> entries appears in the scanner log.
>> 'atomicbox1' was used to test
>> other relays but was not tested itself.
>> Can you look in the database files
>> to see if any obvious reason for
>> this exists? These relays are
>> very fast, Stable-flagged relays
>> that rank near the top of the
>> Blutmagie list.
>>>Date: Thu, 29 Oct 2015 19:57:52 -0500
>>>To: Tom Ritter <tom at ritter.vg>
>>>From: starlight.2015q4 at binnacle.cx
>>>Subject: Re: stale entries in bwscan.20151029-1145
>>>Looked even more closely.
>>>I filtered out all relays that are
>>>not currently active, ending up with
>>>a list of 6303 live relays.
>>>1065 or 17% of them have not been
>>>updated for five or more days,
>>>292 or 4% have not been updated
>>>for ten days, and 102 or 1%
>>>have not been updated for 15 days.
>>>In particular I know of a very fast
>>>high quality relay in a CDN-grade
>>>network that has not been measured
>>>in 13 days. My relay Binnacle
>>>is a well-run relay in the
>>>high-quality Verizon FiOS network
>>>and has not been measured for 10 days.
>>>This does not seem correct.
>>>P.S. Here is a quick list of some
>>>top-30 relays that have not been checked:
>>>>At 13:35 10/29/2015 -0400, you wrote:
>>>>>The system is definitely active. . . . the most recent file has ten-day-old entries?
>>>>Just looked more closely. About 2500
>>>>of 8144 lines (30%) have "updated_at=" timestamps
>>>>more than five days old, i.e., before 2015/10/24 00:00 UTC.
>>>>Seems like something that should have
>>>>an alarm check/monitor.
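Agreed. A first cut at such an alarm check might look like this
(a sketch; it assumes updated_at= holds a Unix timestamp, one entry
per line, as in the bwscan files discussed above, and the 10%
threshold is an arbitrary choice):

import re
import sys
import time

STALE_AFTER = 5 * 24 * 3600  # five days, in seconds

def stale_fraction(path, now=None):
    # Count entries whose updated_at= is older than the threshold.
    now = now if now is not None else time.time()
    total = stale = 0
    with open(path) as f:
        for line in f:
            m = re.search(r"updated_at=(\d+)", line)
            if not m:
                continue
            total += 1
            if now - int(m.group(1)) > STALE_AFTER:
                stale += 1
    return stale, total

if __name__ == "__main__":
    stale, total = stale_fraction(sys.argv[1])
    print("%d of %d entries (%.0f%%) stale" % (
        stale, total, 100.0 * stale / max(total, 1)))
    # Non-zero exit when more than 10% of entries are stale.
    sys.exit(1 if total and stale / float(total) > 0.10 else 0)

Run from cron against the latest bwscan file, the non-zero exit could
feed whatever alerting the bwauth operators already use.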