[tor-dev] Using Stem's descriptor fetching module to replace the Java consensus-health checker

Karsten Loesing karsten at torproject.org
Thu Aug 8 13:43:56 UTC 2013


Hi Damian,

we briefly discussed Stem's new descriptor fetching module and how we
could extend the existing simple monitors [0] towards a replacement of
the Java consensus-health checker [1].

Moving this discussion to this list with your permission.

So, you asked what exactly the consensus-health checker a.k.a. DocTor
looks for.  Let me try to give you a quick overview of the different
parts [2].  Going through the Java source files in an order that
hopefully explains best how everything works together:

- Warning.java is an enum of all different warnings that DocTor can
emit.  Each warning contains a little documentation string saying what
it means.  If these are ambiguous, let me know, and I can probably
explain them better.

- Checker.java contains the various checks that are performed on
previously downloaded consensuses and votes.  For example,
checkMissingConsensuses goes through the (hard-coded) list of known
directory authorities and emits a ConsensusDownloadTimeout warning if the
consensus download failed for one or more of them.  As you see,
there are plenty more check* methods.

- StatusFileReport.java uses the results from Checker by putting all
warnings in two output files, one of them containing all warnings, the
other only containing new warnings.  Each warning has a severity, which
can be ERROR, WARNING, or NOTICE.  Also, each warning defines a time
after which we consider the exact same warning string new even though
the warning hasn't changed.  The latter is useful to rate-limit
warnings.  For example, the fact that a certificate is going to expire
two months from now doesn't have to be repeated every hour.

- MetricsWebsiteReport.java is the second output of DocTor.  It's the
website available at [3].  The idea is that the website gives more
information about warnings received on IRC or via email.  It's actually
a hack that this website is presented on metrics.  In a rewrite,
PyDoctor would have its own little webserver to present consensus-health
details.  Once it's in place and we shut down DocTor, I'm going to
replace the website on metrics with a static page linking to PyDoctor.

- DownloadStatistics.java keeps statistics about consensus download
times which are displayed on the website.

- Downloader.java is a wrapper for metrics-lib's descriptor downloader.

- Main.java puts everything together.  It first downloads everything,
then writes the status files containing warnings, and then generates the
website output.
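
To make the Checker and StatusFileReport parts more concrete, here is a
minimal Python sketch of a missing-consensus check combined with the
"new warnings only" rate limiting.  It is not taken from DocTor's source
and does not use Stem.  The names HealthWarning, check_missing_consensuses,
and split_new_warnings, the ERROR severity, and the one-hour repeat
interval are all assumptions made up for illustration.

```python
import enum
from dataclasses import dataclass


class Severity(enum.Enum):
    ERROR = "ERROR"
    WARNING = "WARNING"
    NOTICE = "NOTICE"


@dataclass(frozen=True)
class HealthWarning:
    """Hypothetical warning record; not DocTor's actual data model."""
    message: str
    severity: Severity
    repeat_after: int  # seconds until the same message counts as new again


def check_missing_consensuses(authorities, downloaded):
    """Warn about known authorities that we got no consensus from.

    authorities: iterable of authority nicknames (hard-coded in DocTor)
    downloaded:  mapping of nickname -> downloaded consensus
    """
    missing = sorted(set(authorities) - set(downloaded))
    if not missing:
        return []
    return [HealthWarning(
        "Could not download the consensus from: " + ", ".join(missing),
        Severity.ERROR,  # assumed severity, chosen for the example
        3600,            # assumed repeat interval of one hour
    )]


def split_new_warnings(warnings, last_reported, now):
    """Return (all_warnings, new_warnings).

    A warning is new if its message was never reported, or if it was last
    reported at least repeat_after seconds ago.  last_reported maps
    message -> timestamp and is updated in place.
    """
    new = []
    for warning in warnings:
        previous = last_reported.get(warning.message)
        if previous is None or now - previous >= warning.repeat_after:
            new.append(warning)
            last_reported[warning.message] = now
    return list(warnings), new
```

The first list would go into the status file with all warnings, the
second into the one with new warnings only.  A real implementation would
also have to persist last_reported between runs.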

So, that's what DocTor does right now.  Here are two more things that
would be great to have in DocTor or PyDoctor:

- Warn if directory authorities assign flags to unusually few or many
relays [4].  This enhancement has the potential of generating lots of
warnings, because the directory authorities currently vote *very*
differently on certain flags.  The result will be a lot of directory
authority operator nagging.  Just saying, you should be prepared for
that when deploying this!

- Ignore certain known warnings [5].  This will reduce a lot of noise on
the consensus-health mailing list.  The less noise there is, the more
people will pay attention to actually valid warnings.  In theory.
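
Both enhancements could be sketched in Python roughly as follows.  This
is only an illustration under stated assumptions: the input format (flag
-> authority -> number of relays), the comparison against the median, the
50% tolerance, and the regular-expression ignore list are all invented
here and are not what the tickets specify.

```python
import re
from statistics import median


def check_flag_counts(flag_counts, tolerance=0.5):
    """Report authorities whose count for a flag is far from the median.

    flag_counts: {flag: {authority: number of relays voted with that flag}}
    tolerance:   allowed relative deviation from the median (assumed value)
    """
    messages = []
    for flag, per_authority in sorted(flag_counts.items()):
        typical = median(per_authority.values())
        for authority, count in sorted(per_authority.items()):
            if typical and abs(count - typical) > tolerance * typical:
                messages.append(
                    f"{authority} assigns the {flag} flag to {count} relays "
                    f"(median across authorities: {typical:g})")
    return messages


def drop_ignored(messages, ignore_patterns):
    """Remove messages matching any known-and-accepted pattern."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    return [m for m in messages
            if not any(c.search(m) for c in compiled)]
```

With a tight tolerance the first function would produce exactly the kind
of nagging mentioned above, so the threshold would need tuning against
real votes before deployment.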

Hope that makes sense.  Happy to provide more input or review code.
Just let me know!

All the best,
Karsten


[0] https://lists.torproject.org/pipermail/tor-dev/2013-July/005209.html

[1] https://www.torproject.org/getinvolved/volunteer#metrics-pyDoctor

[2] https://gitweb.torproject.org/doctor.git/tree/HEAD:/src/org/torproject/doctor

[3] https://metrics.torproject.org/consensus-health.html

[4] https://trac.torproject.org/projects/tor/ticket/9103

[5] https://trac.torproject.org/projects/tor/ticket/8797

