[metrics-bugs] #23367 [Metrics/Statistics]: Onion address counts ignore descriptor upload overlap

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Jan 14 12:06:57 UTC 2020

#23367: Onion address counts ignore descriptor upload overlap
 Reporter:  teor                |          Owner:  metrics-team
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:  #23126              |         Points:
 Reviewer:                      |        Sponsor:
Changes (by karsten):

 * status:  new => needs_review
 * keywords:  metrics-2018 =>


 Finally, I got it. (I didn't spend the whole 2 years thinking about this,
 but when I started looking at this ticket again this morning it took me a
 while to understand the bug...)

 The situation is slightly different from your description, because
 statistics are not collected from 00:00 UTC but from whenever a relay
 starts collecting them. Your general statement that we account for
 descriptor upload overlap incorrectly is right, though.

 My current thought is to document this inaccuracy rather than change the
 code. It's a known inaccuracy of roughly 1/24 = 4.2% of absolute numbers,
 but it doesn't affect relative changes over time. I don't think that
 changing the code and reprocessing the statistics is worth the effort,
 especially since we would also have to explain why the numbers changed.

 Here's how we could document this on the
 [https://metrics.torproject.org/reproducible-metrics.html#onion-services Reproducible Metrics]
 page:

 ''As an approximation, we assume that an onion service publishes its
 descriptor to twelve directories over a 24-hour period: the service stores
 two replicas per descriptor using different descriptor identifiers, both
 descriptor replicas get stored to three different onion-service
 directories each, and the service changes descriptor identifiers once
 every 24 hours which leads to two different descriptor identifiers per
 replica.''

 ''To be clear, this approximation is not entirely accurate. For example,
 '''the descriptors of roughly 1/24 of services are seen by 3 rather than 2
 sets of onion-service directories, when a service changes descriptor
 identifiers once at the beginning of a relay's statistics interval and
 once again towards the end. In some cases,''' the two replicas or the
 descriptors with changed descriptor identifiers could have been stored to
 the same directory. As another example, onion-service directories might
 have joined or left the network, and other directories might have become
 responsible for storing a descriptor and therefore also include that
 .onion address in their statistics. However, for the subsequent analysis,
 we assume that none of these cases affects results substantially.''
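
 The arithmetic behind this approximation can be sketched as follows. This
 is only an illustration, not code from the metrics code base: all names
 are made up for this example, and the 1/24 share is simply taken from the
 text above rather than derived from relay data.

```python
from fractions import Fraction

# Figures from the proposed documentation text.
REPLICAS = 2                # replicas per descriptor
DIRS_PER_REPLICA = 3        # onion-service directories per replica
ASSUMED_ID_SETS = 2         # descriptor identifiers assumed per 24 hours

# Share of services seen by 3 rather than 2 sets of directories
# (taken from the text above, not measured).
THREE_SET_SHARE = Fraction(1, 24)


def directories_per_service(id_sets):
    """Directories that see one service, given the number of identifier sets."""
    return REPLICAS * DIRS_PER_REPLICA * id_sets


# The approximation used in the documentation: 2 * 3 * 2 = 12.
approx_directories = directories_per_service(ASSUMED_ID_SETS)

# Average when 1/24 of services are seen by a third set of directories.
expected_directories = (
    (1 - THREE_SET_SHARE) * directories_per_service(2)
    + THREE_SET_SHARE * directories_per_service(3)
)

print(approx_directories)            # 12
print(float(expected_directories))   # 12.25
```

 Under these assumptions the average comes out slightly above twelve. How
 that maps onto the published absolute numbers depends on the
 extrapolation step, which this sketch does not model.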

 What do you think about this change?

 I also agree that we should keep this in mind when we work on v3 stats. We
 should keep this ticket open, turn it into an enhancement, and update the
 summary a bit to make it clear that the remaining work is just for v3.

Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23367#comment:7>