[tor-bugs] #24628 [Metrics/Consensus Health]: bwauth= bug in consensus health

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Jan 26 03:38:30 UTC 2018


#24628: bwauth= bug in consensus health
--------------------------------------+---------------------
 Reporter:  tom                       |          Owner:  tom
     Type:  defect                    |         Status:  new
 Priority:  Medium                    |      Milestone:
Component:  Metrics/Consensus Health  |        Version:
 Severity:  Normal                    |     Resolution:
 Keywords:                            |  Actual Points:
Parent ID:                            |         Points:
 Reviewer:                            |        Sponsor:
--------------------------------------+---------------------

Comment (by tom):

 Started digging into this using the example in #24877.  Really weird.

 {{{
         bastet  maatuska        moria   gabelmoo        fara
 Pg      4800    6080            5100    5270            4160
 21      4770    6080            5110    5240            4160
 20      4760    6070            5070    5230            4160
 19      4750    6060            5060    5190            4160
 18      4750    6050            5050    5190            4150
 }}}

 That's the page value (for 2018-01-11-20-00) and the vote values (from
 collector) for surrounding hours. The consensus document says 5070, the
 page says 5100. So the page is wrong.

 I parsed the moria vote with stem, and it gave me 5070.

 I searched the January votes made by moria for any with a Measured
 value of 5100. I got the following:

 * ./11/2018-01-11-23-00-00-vote-
 D586D18309DED4CD6D57C18FDB97EFA96D330566-A15ABFB2A6F993F16E8645C9C3AF16E13EA7934A-r
 ForEdSnowden AaHRX5/GftBfopZv+IMGftUCzwg F+hQq2QhOd7rM7N7z+K+cBw/HCA
 2018-01-11 14:52:14 51.15.133.16 9001 0
 * ./24/2018-01-24-17-00-00-vote-
 D586D18309DED4CD6D57C18FDB97EFA96D330566-5391D5738EF960227FDBA4776D914150BFEA1EDF-r
 ForEdSnowden AaHRX5/GftBfopZv+IMGftUCzwg lmIBHy2xp+nAsZ9PuZMBPDLm660
 2018-01-24 14:58:17 51.15.133.16 9001 0
 * ./24/2018-01-24-16-00-00-vote-
 D586D18309DED4CD6D57C18FDB97EFA96D330566-50F9CBB890A5BF2E4919DDB8FB5577FE48C34517-r
 ForEdSnowden AaHRX5/GftBfopZv+IMGftUCzwg lmIBHy2xp+nAsZ9PuZMBPDLm660
 2018-01-24 14:58:17 51.15.133.16 9001 0
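
 A search like the one above can be sketched in plain Python. This is only
 an illustration, not the command that was actually run: the function name
 find_measured and the directory layout are assumptions, and the sketch
 relies on the vote format putting a relay's "w Bandwidth=... Measured=..."
 line after its "r" line.

```python
import os
import re

# Matches the Measured= keyword on a vote's "w" line.
MEASURED_RE = re.compile(r"\bMeasured=(\d+)\b")


def find_measured(vote_dir, authority_id, nickname, measured):
    """Return (path, r_line) pairs for votes whose filename contains
    authority_id and in which the relay `nickname` has the given Measured
    value. Hypothetical helper for illustration only."""
    hits = []
    for root, dirs, files in os.walk(vote_dir):
        dirs.sort()  # walk subdirectories in a stable order
        for name in sorted(files):
            if authority_id not in name:
                continue  # vote made by some other authority
            path = os.path.join(root, name)
            current_r = None  # "r" line of the relay we care about, if any
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    if line.startswith("r "):
                        fields = line.split()
                        match = len(fields) > 1 and fields[1] == nickname
                        current_r = line.rstrip("\n") if match else None
                    elif line.startswith("w ") and current_r is not None:
                        m = MEASURED_RE.search(line)
                        if m and int(m.group(1)) == measured:
                            hits.append((path, current_r))
    return hits
```

 Called with moria's identity from the filenames above, the nickname
 ForEdSnowden and the value 5100, it would list each matching vote file
 together with the relay's "r" line.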

 Okay, so 23:00 is nearby.

 If I expand the table:

 {{{
         bastet  maatuska        moria   gabelmoo        fara
 Pg      4800    6080            5100    5270            4160
 0       4800    6110            5130    5270            4160
 23      4800    6110            5100    5270            4160
 22      4770    6110            5110    5250            4160
 21      4770    6080            5110    5240            4160
 20      4760    6070            5070    5230            4160
 19      4750    6060            5060    5190            4160
 18      4750    6050            5050    5190            4150
 }}}
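
 To make the comparison in the expanded table explicit, here is a small
 sketch that checks, for each bwauth, which hours' vote values equal the
 page value. The numbers are copied from the table; the names AUTHORITIES,
 PAGE, VOTES and matching_hours are made up for this illustration.

```python
AUTHORITIES = ["bastet", "maatuska", "moria", "gabelmoo", "fara"]

# "Pg" row: what the consensus-health page shows for 2018-01-11-20-00.
PAGE = [4800, 6080, 5100, 5270, 4160]

# Vote values per hour, in the same column order as AUTHORITIES.
VOTES = {
    0: [4800, 6110, 5130, 5270, 4160],
    23: [4800, 6110, 5100, 5270, 4160],
    22: [4770, 6110, 5110, 5250, 4160],
    21: [4770, 6080, 5110, 5240, 4160],
    20: [4760, 6070, 5070, 5230, 4160],
    19: [4750, 6060, 5060, 5190, 4160],
    18: [4750, 6050, 5050, 5190, 4150],
}


def matching_hours(page, votes):
    """Map each authority to the hours whose vote value equals the page value."""
    return {
        name: [hour for hour, row in votes.items() if row[i] == page[i]]
        for i, name in enumerate(AUTHORITIES)
    }


if __name__ == "__main__":
    for name, hours in matching_hours(PAGE, VOTES).items():
        print(name, hours)
```

 With these numbers, moria's page value only matches hour 23, and hour 20
 matches for fara alone (whose value barely changes), which fits the idea
 that the page row was assembled from votes of later hours.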

 I checked henryi's timezone, and it's in UTC. The filename is written out
 based on the consensus's time in the file.

 Then I ran ps and found 3 processes running: one that had been running
 for 30 minutes, one for 2.5 hours, and one for 3.5 hours.

 Things are starting to come together. Maybe.

 I already know the script sometimes dies due to out of memory errors. Now
 I think I see why. I call subprocess.call at the end as a convenience.
 This invokes fork, which can briefly require as much memory again as the
 process already uses. (And it's a lot.)

 I'm going to replace those calls and hopefully it will resolve ALL of the
 weird-ass errors we've been seeing with consensus-health.
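
 The comment doesn't say what the subprocess.call invocations do. Purely as
 an assumption, if they are simple file operations such as copying a
 generated file into place, a fork-free replacement could look like the
 sketch below. The helper name publish_file and its arguments are
 hypothetical, not taken from the consensus-health code.

```python
import os
import shutil


def publish_file(src, dst):
    """Copy src to dst inside the current process (hypothetical helper).

    Nothing here spawns a child process, so the large parent process is
    never forked just to move a file around.
    """
    tmp = dst + ".tmp"
    shutil.copyfile(src, tmp)  # in-process copy of the file contents
    os.replace(tmp, dst)       # rename into place; atomic on one filesystem
    return dst
```

 Writing to a temporary name first means a reader of dst never sees a
 half-written file, which also helps if several runs overlap.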

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24628#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

