probable FallbackDir relay determination bug

Dug into the situation Aeris reported, where his kitten1 relay was nixed off the fallback directory list due to a single consensus in which the DirPort was published as zero.

Quickly found three additional very fast, very stable relays where the same thing happened:

BF0FB582E37F738CD33C3651125F2772705BB8E8 12-28:17 quadhead
C43FA6474A9F071E9120DF63ED6EB8FDBA105234 12-25:03 ArachnideFR5
5665A3904C89E22E971305EE8C1997BCA4123C69 11-15:18 GermanCraft

In each case the V2Dir flag and DirPort are removed for a single consensus interval and then restored in the next consensus. It appears that a daemon restart is the trigger; a version update is not required.

There are several possible causes, but the one that comes to mind is that the daemon could be briefly pushing a descriptor with V2Dir inactive while it completes the reachability test for that port.

This deserves to be researched further and mitigated. A possible solution is to have the script tolerate a single consensus with V2Dir removed and a reset uptime. Alternatively, the daemon could hold off pushing a non-V2Dir descriptor during restart and wait for the reachability determination.

I don't know enough about it to be sure the problem isn't something else.

At 19:20 1/12/2016 +0100, Aeris wrote:
Are you *absolutely* certain that the config was not fiddled with at the time of this event?
After grepping some logs, seems 13/12 was the day of a Tor upgrade:
2015-12-13 10:47:31 upgrade tor:amd64 0.2.7.5-1~d80.jessie+1 0.2.7.6-1~d80.jessie+1
2015-12-13 10:48:39 configure tor:amd64 0.2.7.6-1~d80.jessie+1
The timing matches the 10:48:46 of the consensus!
But I don't remember a config change after that, perhaps only in /usr/share/tor/tor-service-defaults-torrc or a change to a default config parameter?
And perhaps the Tor restart caused the DirPort to be temporarily disabled (it does not look human, only 2s duration)?
Regards,
--
Aeris

Improved list:

BF0FB582E37F738CD33C3651125F2772705BB8E8 12-28:17 quadhead
86E78DD3720C78DA8673182EF96C54B162CD660C 12-13:11 kitten1
6DE61A6F72C1E5418A66BFED80DFB63E4C77668F 12-19:11 eriador
39F096961ED2576975C866D450373A9913AFDC92 12-28:06 metaether
92CFD9565B24646CAC2D172D3DB503D69E777B8A 12-16:14 bakunin
6E7CB6E783C1B67B79D0EBBE7D48BC09BD441201 12-14:07 Paris
9E0CBB6958CE61DC631E8660963EB644BE92B256 12-12:04 freshman
D74ABE34845190E242EC74BA28B8C89B0A480D4B 12-14:07 Helen
14E3111C54BB532FB39AA3C1367D0970DA2937D5 12-18:17 idfTor2
78BBAC4B66067E8D25AA78852257C08D2E7EB357 12-18:09 grothendieck
EC639EDAA5121B47DBDF3D6B01A22E48A8CB6CC7 12-24:08 rainbowdash

The above are all relays that were actually on the 12/11 list and did not appear on the 01/12 list. Two of the earlier examples were on neither list for some other reason, but updateFallbackDirs.py encountered the evidently spurious port change first in the exclusion logic. Valid examples nevertheless.
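
For what it's worth, here is a rough Python sketch of the tolerance suggested earlier for the script. It is not the actual logic of updateFallbackDirs.py, which I have not reproduced here. The Observation record, the stable_dir_port function, the max_gap parameter and the port number 9030 are all made up for illustration, and the sketch only covers the DirPort/V2Dir part, not the uptime reset.

```python
from typing import List, NamedTuple, Optional


class Observation(NamedTuple):
    """One consensus entry for a relay (hypothetical structure)."""
    valid_after: str   # consensus timestamp, illustrative only
    dir_port: int      # 0 means no DirPort was published
    has_v2dir: bool    # True if the V2Dir flag was present


def has_dir(obs: Observation) -> bool:
    """True if the relay advertised directory service in this consensus."""
    return obs.dir_port != 0 and obs.has_v2dir


def stable_dir_port(history: List[Observation], max_gap: int = 1) -> bool:
    """Return True if the relay's directory service looks stable.

    Runs of up to max_gap consecutive consensuses without DirPort/V2Dir
    are tolerated, provided the same DirPort shows up again afterwards.
    A change to a different non-zero port is still treated as unstable.
    """
    last_port: Optional[int] = None
    gap = 0
    for obs in history:  # history is assumed to be in time order
        if has_dir(obs):
            if last_port is not None and obs.dir_port != last_port:
                return False  # genuine port change
            last_port = obs.dir_port
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                return False  # outage too long to be a restart blip
    # Require that the relay ends the window with directory service restored.
    return last_port is not None and gap == 0


# Made-up history: one consensus with DirPort 0, restored in the next one.
history = [
    Observation("hour 1", 9030, True),
    Observation("hour 2", 0, False),
    Observation("hour 3", 9030, True),
]
print(stable_dir_port(history))  # True
```

With max_gap=0 the same history is rejected, which is roughly the behaviour that knocked the relays above off the list. Whether one consensus is the right tolerance is a judgment call for whoever maintains the script.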
participants (1)
- starlight.2016q1@binnacle.cx