[metrics-bugs] #28305 [Metrics/Statistics]: Include client numbers even if we think we got reports from more than 100% of all relays

Sun Nov 4 23:58:53 UTC 2018

#28305: Include client numbers even if we think we got reports from more than 100%
of all relays
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  defect              |         Status:  new
 Priority:  High                |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  SponsorV-can
--------------------------------+------------------------------
Changes (by teor):

 * sponsor:   => SponsorV-can

Comment:

 Replying to [comment:2 karsten]:
 > You'll find a description/specification of how frac is calculated here:
 https://metrics.torproject.org/reproducible-metrics.html#relay-users
 >
 > Maybe rounding error was not the right term. In fact, I believe it might
 be a situation like the one you're describing. I can extract the variable
 values going into the frac formula; maybe one of them is responsible for
 getting us above the 100%.

 I wonder if changing the bandwidth interval to 24 hours revealed this
 issue?

 For relays which report 24-hour intervals, I think that:
 {{{
 h(R^H) is usually equal to h(H)
 n(H) is usually 24
 n(R\H) is usually 0
 n(N) can be slightly less than 24, if a relay was unreachable or
 misconfigured, but didn't go down
 Therefore, frac can be slightly more than 1.
 }}}
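
 A minimal sketch of that arithmetic, assuming the frac formula is the one
 given in the linked specification:
 frac = (h(R^H) * n(H) + h(H) * n(R\H)) / (h(H) * n(N)).
 The function name, parameter names, and byte counts below are made up for
 illustration; they are not taken from the metrics code.

```python
def frac(h_rh, h_h, n_h, n_r_minus_h, n_n):
    """Estimated fraction of relays reporting, per the assumed formula.

    h_rh        -- h(R^H): bytes written while both statistics were reported
    h_h         -- h(H):   bytes written while byte histories were reported
    n_h         -- n(H):   hours covered by byte histories
    n_r_minus_h -- n(R\\H): hours with request statistics but no byte history
    n_n         -- n(N):   hours the relay was listed as running in consensuses
    """
    return (h_rh * n_h + h_h * n_r_minus_h) / (h_h * n_n)


# Hypothetical relay reporting one 24-hour interval: h(R^H) == h(H),
# n(H) == 24 and n(R\H) == 0.
bytes_written = 1_000_000_000  # illustrative value; it cancels out

# Listed as running in every hourly consensus: frac is exactly 1.
frac_full_day = frac(bytes_written, bytes_written, 24, 0, 24)

# Dropped from one hourly consensus (unreachable, but never went down):
# n(N) == 23, so frac rises to 24/23, slightly more than 1.
frac_missed_hour = frac(bytes_written, bytes_written, 24, 0, 23)

print(frac_full_day, frac_missed_hour)
```

 With these inputs the byte counts cancel, so the result depends only on
 n(H) / n(N): every hour missing from the consensuses pushes frac further
 above 1.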

 > However, we should carefully consider whether we want to change that
 formula or rather not touch it until we have PrivCount as replacement. If
 we think the frac value isn't going to grow much beyond 100%, we could
 just accept that inaccuracy and live with it. If we think it's going to
 grow towards, say, 150%, I agree that we'll have to do something.

 I think a similar analysis applies to PrivCount: if a relay is up for the
 whole day, then it will report statistics using PrivCount. But if that
 relay is dropped from some consensuses due to reachability problems, then our idea
 of the average number of running relays will be too low.

 We won't see this bug until almost all relays are running PrivCount. But
 let's avoid re-implementing this bug in PrivCount if we can.

 What can PrivCount do to avoid introducing a similar bug?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28305#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online