[tor-dev] Using Stem's descriptor fetching module to replace the Java consensus-health checker

Damian Johnson atagar at torproject.org
Tue Aug 20 02:41:14 UTC 2013

Thanks, Karsten, for the code review! I decided to take a long weekend so
was able to address all of it...

> Yes, I agree that we should have your script start sending results to
> the mailing list.  Want to set that up?  Not sure if your mails will
> bounce until somebody approves your sender address, but I guess we'll
> find out.

Nope, not getting through. Asking for my addresses to be whitelisted....


> 1. Do the authorities voting on BadExit agree about relays they assign
> this flag to?  The warning might contain the diff of relay fingerprints
> they voted or didn't vote BadExit on.  We'll probably want to rate-limit
> this warning to every 6 or 12 hours.



Btw, it looks like they presently *are* out of sync...

NOTICE: Authorities disagree about the BadExit flag for
7E6A3AA70A156167E7AE543E50EF54321EC80AF0 (with flag: Faravahar, without
flag: tor26, moria1)
NOTICE: Authorities disagree about the BadExit flag for
ADF62D3A1305F0B5404D41EEDADA68ECD294FC60 (with flag: Faravahar, without
flag: tor26, moria1)

> 2. Do the bandwidth authorities report roughly the same number of
> Measured lines, or do these numbers diverge beyond a given threshold
> (say, 20%)?  This warning should probably be rate-limited to every 24
> hours, because new bandwidth authorities take some time to measure the
> network.

Also done...


> How about we start a new thread on this list discussing only the part
> where we're planning to kill the website and move everything to the
> status emails?

Personally I don't see value in forking this thread, but I don't care
strongly either.

> ... I still disagree with you about Pepper Jack
> being tasty cheese and that I can highly recommend Muenster cheese.


> - Can you add a license to your script?

Done, opting for 3-clause BSD like DocTor...


> - Your rate-limiting logic seems to work only with full hours (and
> days).  This may lead to unexpected results if two script executions
> don't start at the exact minute and second of an hour.  In one case a
> message might be suppressed, in another case it might not.  That's why I
> defined all rate-limiting intervals as X:30 hours.  Maybe simply add or
> subtract 30 minutes from all intervals just to be sure, or add a minutes
> parameter with a default of 30.

Good catch. Fixed...


> - DocTor produces two output files containing warnings, one with new
> warnings and one with all warnings (which is empty if there are zero new
> warnings).  New warnings are sent to the IRC bot and all warnings are
> sent to the mailing list.  I agree that this approach is somewhat
> complicated, so maybe it's sufficient to just send the new warnings to
> both the IRC bot and the mailing list.  Unless you didn't realize there
> was such a distinction and think it's worth adding to your script.  Up
> to you.

I've ignored the IRC bot notifications since I'm both unfamiliar with it
and presently lack a mechanism to provide it with notifications.

I take it as if the bot periodically reads files from disk then dumps the
contents into an IRC channel? Who maintains the bot and might we switch it
to another notification mechanism? Maybe it should read
tor-consensus-health@ instead?

Alternatively I can change my send() function to do whatever the bot
maintainer wants, though dumping files to disk seems a bit odd to me.

> - Are old warnings ever removed from your last_notified.cfg?  Not sure
> if it matters though.

Nope. When the suppression expires and the issue goes into alarm again it
replaces the value.

> - Is the %is in `log.info("Suppressing %s, time remaining is %is"`
> supposed to be a %s?

Nope. It's logging an integer value with an extra 's' for seconds (ex,
"45s"). Logging seconds here is pretty less-than-useful so swapped it to

> - Your rule about not downloading a vote from an authority that didn't
> provide a consensus before seems quite strict.  In theory, we could ask
> another authority for that first authority's vote.  Maybe the first
> authority just didn't want to talk to us but was happy to talk to all
> other authorities.  We should probably learn about problems with that
> authority's vote (or the absence of problems) even if it doesn't talk to
> us.  I'm aware that this will make the download logic somewhat more

Actually, this makes it a little cleaner. The 'only download vote if we got
a consensus' thing was to avoid redundant notices when an authority goes
down (ie. both "unable to download consensus from X" and "unable to
download vote from X"). With this fallback logic I don't really need to do


... on a side note think I'll move the authority fingerprint and v3ident
into stem later. Other scripts will likely want them too.

> - Typo in `unknown_consensus_parameteres`.
> - Typo in `incompatable_authorities`.

Nice catches. Fixed.

> - In `certificate_expiration`, is your check that a vote has exactly
> one authority really necessary?  Wouldn't stem complain if this were not
> the case?

Nope, when reading the dir-spec it didn't cross my mind to assert that
about votes. Now that you mention it though I agree that this belongs in
stem - done...


> - Speaking of stem complaining about invalid votes or consensuses: what
> does your script do in such a case?

Invalid documents are treated in a similar fashion to failing to be
download, except that they state the validation issue.

> - I wonder if your check in `has_expected_fingerprints` would complain
> if somebody set up a fake "moria1".  It probably shouldn't.  That's why
> I added IP address and port to checkAuthorityRelayIdentityKeys.

Ahhh, I was wondering why DocTor did that. Good point, addressing this by
also looking for the Named flag.

> - Maybe I missed them while reading your code, but did you add checks
> for checkContainedVotes and checkConsensusSignatures?  Both checks are
> quite important, because even if an authority claims to add a vote, it
> may not have become part of the consensus, or the authority may not have
> signed the consensus that it contributed a vote to before.

Huh, wonder how I missed those. Added...


Cheers! -Damian

PS. A couple other things that need to be addressed at some point are:

* Running this on a TPO host rather than my desktop, and maybe also use a
TPO address to send the emails too.
* Moving these scripts to a real repository rather than my 'tor-utils' user
repo. Not sure though if we should keep the name 'DocTor' or opt for
something more descriptive like 'DescriptorMonitors'.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20130819/5b838374/attachment-0001.html>

More information about the tor-dev mailing list