[tor-bugs] #18798 [CollecTor]: analysis of descriptor completeness

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Apr 13 09:42:22 UTC 2016


#18798: analysis of descriptor completeness
-----------------------+-----------------------------------
 Reporter:  iwakeh     |          Owner:  iwakeh
     Type:  task       |         Status:  needs_information
 Priority:  Medium     |      Milestone:
Component:  CollecTor  |        Version:
 Severity:  Normal     |     Resolution:
 Keywords:  ctip       |  Actual Points:
Parent ID:             |         Points:
 Reviewer:             |        Sponsor:
-----------------------+-----------------------------------

Comment (by karsten):

 Thanks for starting this analysis!

 Let me first speculate about the cause for those daily patterns that you
 found there.  And let me start by giving an example from recent logs to
 explain the format of `M-` lines:

 {{{
 M-2016-04-11T22:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0279 -> 0.0279)
 M-2016-04-11T23:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0279 -> 0.0279)
 M-2016-04-11T23:00:00Z -> D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95 (0.0279 -> 0.0558)
 M-2016-04-12T00:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0280 -> 0.0558)
 M-2016-04-12T00:00:00Z -> D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95 (0.0280 -> 0.0558)
 }}}

  1. The first line means that there's a microdescriptor with digest
 `38F2..` missing from the microdescriptor consensus with valid-after time
 `2016-04-11 22:00:00`.  That missing microdescriptor adds a value of
 `0.0279` to the total missing descriptor count which is then `0.0279`.
 The idea is to only warn if that total value passes `1.0`.
  1. The second line says that the same missing microdescriptor is also
 referenced from the microdescriptor consensus with valid-after time
 `2016-04-11 23:00:00`.  Given that we shouldn't double-count that missing
 descriptor, we're not increasing the total count there.
  1. The third line mentions another microdescriptor with digest `597C..`
 that is missing, and in this case it's referenced from the microdescriptor
 consensus with valid-after time `2016-04-11 23:00:00`.  That one raises
 the total count by another `0.0279` to then `0.0558`.
  1. I guess the remaining two lines are self-explanatory at this point.
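
 To make the counting rule above concrete, here's a minimal Python sketch
 that replays the five sample lines and adds each digest's value to the
 total only the first time that digest shows up.  The regular expression
 and the names `M_LINE`, `SAMPLE` and `running_totals` are my own
 assumptions based on the sample, not anything taken from CollecTor's code:

```python
import re

# Assumed layout of one (unwrapped) M- line, based on the sample above:
#   M-<valid-after> -> D-<digest> (<value> -> <running total>)
M_LINE = re.compile(
    r"^M-(?P<valid_after>\S+) -> D-(?P<digest>[0-9A-F]+) "
    r"\((?P<value>[0-9.]+) -> (?P<total>[0-9.]+)\)$"
)

D1 = "38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D"
D2 = "597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95"

# The five sample lines from the log excerpt, one entry per line.
SAMPLE = [
    f"M-2016-04-11T22:00:00Z -> D-{D1} (0.0279 -> 0.0279)",
    f"M-2016-04-11T23:00:00Z -> D-{D1} (0.0279 -> 0.0279)",
    f"M-2016-04-11T23:00:00Z -> D-{D2} (0.0279 -> 0.0558)",
    f"M-2016-04-12T00:00:00Z -> D-{D1} (0.0280 -> 0.0558)",
    f"M-2016-04-12T00:00:00Z -> D-{D2} (0.0280 -> 0.0558)",
]


def running_totals(lines):
    """Yield (valid_after, digest, total), counting each digest only once."""
    seen = set()
    total = 0.0
    for line in lines:
        match = M_LINE.match(line.strip())
        if match is None:
            continue  # not an M- line in the assumed layout
        digest = match["digest"]
        if digest not in seen:
            # First reference to this missing microdescriptor: add its value.
            seen.add(digest)
            total += float(match["value"])
        yield match["valid_after"], digest, round(total, 4)


for valid_after, digest, total in running_totals(SAMPLE):
    print(valid_after, digest[:4] + "..", total)
```

 The totals this yields can be compared against the second number in
 parentheses on each log line.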

 Now, what could be the reason for the daily pattern you found there?
 First of all this has to do with the Tor network growing and shrinking
 over the day (surprise!).  My guess is that quite a few of the relays of
 which we're missing microdescriptors leave the network during some part of
 the day and rejoin at a later time.  So, when your numbers go up again,
 those are microdescriptors that we're still missing at that point, not
 newly missing microdescriptors.  At least that's my guess; I didn't
 confirm it with real data.

 Another reason for the high increase could be that you're double-counting
 missing descriptors by counting a descriptor that's missing from n
 consensuses n times.  Again, I didn't check whether that's how you're
 counting things, I'm just guessing.
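
 To illustrate the difference, here's a small Python sketch that contrasts
 the two ways of counting.  The `REFERENCES` pairs are taken from the
 sample lines above with digests shortened; the function name
 `count_missing` is just something I made up for illustration:

```python
from collections import Counter

# (valid-after time, shortened digest) pairs from the five sample M- lines.
REFERENCES = [
    ("2016-04-11T22:00:00Z", "38F2"),
    ("2016-04-11T23:00:00Z", "38F2"),
    ("2016-04-11T23:00:00Z", "597C"),
    ("2016-04-12T00:00:00Z", "38F2"),
    ("2016-04-12T00:00:00Z", "597C"),
]


def count_missing(references):
    """Return (references counted n times, distinct digests, per-digest counts)."""
    per_digest = Counter(digest for _, digest in references)
    naive = sum(per_digest.values())  # a digest missing from n consensuses counts n times
    distinct = len(per_digest)        # every missing digest counts once
    return naive, distinct, per_digest


naive, distinct, per_digest = count_missing(REFERENCES)
print(f"references: {naive}, distinct missing microdescriptors: {distinct}")
for digest, n in per_digest.items():
    print(f"  {digest}.. referenced from {n} consensuses")
```

 If the numbers in your analysis behave like the first value rather than
 the second, that would point to double-counting.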

 Regarding your other question of which lines to look at, the `M-` lines
 are only a small part of what we're interested in.  In theory, everything
 after `Missing referenced descriptors:` is relevant for this analysis.
 Each of those lines lists a descriptor that references another descriptor
 that we're missing, which includes lines starting with:

  - `S-`: a server descriptor references an extra-info descriptor that is
 missing,
  - `V-`: a vote references a server descriptor that we're missing,
  - `C-`: a consensus references a server descriptor that we're missing,
 and
  - `M-`: a microdescriptor consensus references a microdescriptor that is
 missing (see above).
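
 A rough Python sketch for tallying those four line types could look like
 the following.  It only assumes that the relevant lines come after the
 `Missing referenced descriptors:` line and start with one of the four
 prefixes; the `EXAMPLE_LOG` content is made-up placeholder text, not real
 log output, and `tally_by_type` is a hypothetical helper name:

```python
from collections import Counter

HEADER = "Missing referenced descriptors:"

# Prefix -> what kind of reference the line reports, per the list above.
KINDS = {
    "S-": "server descriptor references missing extra-info descriptor",
    "V-": "vote references missing server descriptor",
    "C-": "consensus references missing server descriptor",
    "M-": "microdescriptor consensus references missing microdescriptor",
}


def tally_by_type(log_lines):
    """Count lines per prefix, looking only at lines after the header."""
    counts = Counter()
    in_section = False
    for line in log_lines:
        line = line.strip()
        if not in_section:
            in_section = HEADER in line
            continue
        prefix = line[:2]
        if prefix in KINDS:
            counts[prefix] += 1
    return counts


# Placeholder input; everything after the prefix is invented.
EXAMPLE_LOG = [
    "M-placeholder-before-header",
    "Missing referenced descriptors:",
    "S-placeholder-1",
    "V-placeholder-2",
    "C-placeholder-3",
    "M-placeholder-4",
    "M-placeholder-5",
    "some unrelated line",
]

counts = tally_by_type(EXAMPLE_LOG)
for prefix, description in KINDS.items():
    print(f"{prefix} {counts[prefix]:3d}  {description}")
```

 Note that this counts lines, not distinct missing descriptors, so the
 double-counting caveat from above applies here as well.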

 I guess it would be interesting to have statistics on all four types of
 missing descriptors (or three if we count a server descriptor referenced
 from a vote or a consensus as the same).  Did I only give you three days
 of logs?  If so, I should give you at least a month of logs.  With so few
 days, the disk-full problem in particular would skew the results a bit.

 Thanks!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18798#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

