[tor-bugs] #18798 [Metrics/CollecTor]: analysis of descriptor completeness

Fri Apr 22 15:26:23 UTC 2016

#18798: analysis of descriptor completeness
-------------------------------+-----------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  task               |         Status:  needs_information
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------------

Comment (by karsten):

 Thanks for the update!  As I mentioned briefly yesterday, and to put it on
 the record here: the disk ran full on April 1st, which is what caused those
 problems.

 I agree that there aren't many referenced descriptors missing.  That's a
 good outcome of this analysis, and it's a sign that the current logic to
 fetch missing descriptors is working okay.

 To be honest, I don't have a good explanation for the many missing
 microdescriptors.  The week-long pattern there is probably the result of
 most microdescriptors being replaced after a week.  I don't yet understand
 why CollecTor wouldn't be able to fetch missing microdescriptors during
 that week.  We might be looking at a bug there, either in the collection
 or in the logging.  But I'd say give that low priority for now,
 because microdescriptors are the least important descriptors we're
 collecting.  In theory (!), we would be able to generate our own
 microdescriptors from server descriptors, so missing some of them is not a
 big deal.

 But here's something that we're still missing in the analysis and that just
 crossed my mind!  We're only looking at missing ''referenced''
 descriptors, but we're totally ignoring missing ''referencing''
 descriptors, namely consensuses, microdescriptor consensuses (less
 important), and votes.  There are no log lines for those descriptors, but
 there should be 1 new consensus, 1 new microdescriptor consensus, and 9
 new votes every hour.  The part that makes it so important to get these
 descriptors is that they become unavailable after that hour.  That's
 different from referenced descriptors, which is why we seem to be
 recovering well from those disk-full problems.  That might look different
 with consensuses, microdescriptor consensuses, and votes.
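
 To make the expected numbers concrete, here is a minimal sketch that turns
 the per-hour figures above into totals for a time range.  The names
 EXPECTED_PER_HOUR and expected_totals are made up for illustration, and the
 sketch assumes that all 9 votes are published every hour.

```python
from datetime import datetime

# Per-hour figures as stated in this comment: 1 consensus,
# 1 microdescriptor consensus, and 9 votes.
EXPECTED_PER_HOUR = {"consensus": 1, "microdesc consensus": 1, "vote": 9}


def expected_totals(start, end):
    """Expected descriptor counts for the whole hours in [start, end)."""
    hours = int((end - start).total_seconds() // 3600)
    return {kind: per_hour * hours
            for kind, per_hour in EXPECTED_PER_HOUR.items()}


# Example: all of April 2016, which is 30 days or 720 hours.
april = expected_totals(datetime(2016, 4, 1), datetime(2016, 5, 1))
```

 Any shortfall against these totals would point at hours for which a
 referencing descriptor was never collected.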

 In theory, finding out whether any of those are missing should be a matter
 of fetching descriptor tarballs and counting files.  Would you want to
 make graphs for those counts and put them next to the missing
 ''referenced'' descriptors graphs?
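
 The counting step could be sketched roughly as follows.  This is only an
 illustration under assumptions: it assumes that archived file names start
 with a UTC timestamp like 2016-04-01-13-00-00-consensus, and the helper
 names member_names, count_per_hour, and hours_below are hypothetical, not
 part of CollecTor.

```python
import re
import tarfile
from collections import Counter
from datetime import datetime, timedelta

# Assumption: file names begin with YYYY-MM-DD-HH-MM-SS-, for example
# 2016-04-01-13-00-00-consensus.  Adjust the pattern if the layout differs.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{2})-\d{2}-\d{2}-")


def member_names(tarball_path):
    """Return the names of all regular files in a (compressed) tarball."""
    with tarfile.open(tarball_path) as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def count_per_hour(names):
    """Count files per hour, keyed by 'YYYY-MM-DD-HH'."""
    counts = Counter()
    for name in names:
        # Only look at the file name, not the directory part.
        match = STAMP.match(name.rsplit("/", 1)[-1])
        if match:
            counts[match.group(1)] += 1
    return counts


def hours_below(counts, first, last, expected):
    """List (hour, count) for hours in [first, last] below `expected`."""
    short = []
    hour = first
    while hour <= last:
        key = hour.strftime("%Y-%m-%d-%H")
        if counts.get(key, 0) < expected:
            short.append((key, counts.get(key, 0)))
        hour += timedelta(hours=1)
    return short
```

 For consensuses one would call hours_below with expected=1, for votes with
 expected=9; the resulting per-hour counts are what would go into the
 graphs.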

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18798#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online