[tor-bugs] #18798 [Metrics/CollecTor]: analysis of descriptor completeness

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue May 24 19:19:07 UTC 2016

#18798: analysis of descriptor completeness
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  task               |         Status:  needs_information
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:

Comment (by karsten):

 Here are some random ideas why there might be missing referenced
 descriptors on your CollecTor instance:

  1. My CollecTor instance does not download descriptors from [...],
 because in May 2014 that directory authority left connections open
 without writing any bytes, and we're not handling timeouts very well
 yet.  I don't expect that to still be the case, but if you're seeing log
 lines with that authority indicating problems, maybe take that address
 out of `DownloadFromDirectoryAuthorities`.
  1. I'm not setting `DownloadAllServerDescriptors` and
 `DownloadAllExtraInfoDescriptors` in my CollecTor instance.  It might be
 that these settings are why your counts of missing descriptors go down once
 per day.  Maybe the logs tell you more, or maybe you'll need to add logs
 for whenever your instance downloads all descriptors.  By the way, here's
 what I noted down when I disabled those settings: "By downloading "all"
 descriptors, we only learn the most recent descriptors for all known
 servers, not all known descriptors.  That’s not exactly what we’d expect.
 There’s also a potential problem with the result: the authority’s own
 descriptor ends with a double newline which might confuse metrics-lib;
 unless we split up concatenated descriptors differently in metrics-db.
 Found out both things on May 8, 2014 when looking more into #11648."
  1. I'm also not setting `CompressRelayDescriptorDownloads`.  Here's where
 I noted down why: "2014-04-29: changed to 0, because of
 "java.io.EOFException: Unexpected end of ZLIB input stream"".  I also
 noted down this: "The reason for broken compressed downloads might have
 been #11648, which should be fixed by August/September.  The current
 default in metrics-db for compressing downloads is 0.  That's bad.
 Consider fixing this once all directory authorities have upgraded.  Have

 And here are some suggestions for finding out more about the missing
 descriptors:

  - For serverdesc missing (referenced by votes), can you plot how many of
 those are missing by how many votes?  I wouldn't worry so much about
 missing descriptors being referenced from a single vote but more about
 missing descriptors being referenced from (almost) all votes.
  - Regarding the extrainfo missing, have these been published by relays
 that only published a single server descriptor or relays that have been
 around for a longer time?  Again, I'm more worried about the latter case.
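
 Both checks could be sketched roughly as follows.  This assumes you can
 already extract three things from your instance: the server descriptor
 digests referenced by each vote, the digests that are actually archived,
 and, for each missing extra-info descriptor, the fingerprint of the
 publishing relay along with how many server descriptors that relay has
 published.  All function and variable names are made up for illustration;
 none of this is CollecTor or metrics-lib code.

```python
from collections import Counter


def missing_by_vote_count(votes, available):
    """Return {number of referencing votes: number of missing descriptors}.

    votes: dict mapping a vote identifier to the set of server descriptor
           digests that vote references (hypothetical input format).
    available: set of digests that are present in the archive.
    """
    refs = Counter()
    for digests in votes.values():
        for digest in set(digests):
            if digest not in available:
                refs[digest] += 1
    # Turn "digest -> votes referencing it" into a histogram over vote counts.
    return dict(Counter(refs.values()))


def split_by_publisher_history(missing_extrainfo, descs_per_relay):
    """Split missing extra-info digests by how long the relay has been around.

    missing_extrainfo: dict mapping a missing extra-info digest to the
                       fingerprint of the relay that published it.
    descs_per_relay: dict mapping a fingerprint to the number of server
                     descriptors archived for that relay.
    Returns (single, longer): digests from relays with at most one server
    descriptor, and digests from relays with more than one.
    """
    single, longer = [], []
    for digest, fingerprint in missing_extrainfo.items():
        if descs_per_relay.get(fingerprint, 0) <= 1:
            single.append(digest)
        else:
            longer.append(digest)
    return sorted(single), sorted(longer)


if __name__ == "__main__":
    # Toy data only; real input would come from your own analysis.
    votes = {
        "auth1": {"a", "b", "c"},
        "auth2": {"a", "b"},
        "auth3": {"a", "d"},
    }
    hist = missing_by_vote_count(votes, available={"b"})
    for n_votes in sorted(hist):
        print(n_votes, "vote(s):", hist[n_votes], "missing descriptor(s)")
```

 A large bucket at (almost) the full number of votes would be the worrying
 case; a bucket at one vote is probably less so.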

 You'll notice that I'm mostly guessing here, because I don't know what
 could be going wrong.  But I think you're in a good position to spot a bug
 or three here.  Thanks for looking into this!

Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18798#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
