[metrics-bugs] #26015 [Metrics/Statistics]: Remove inconsistency between bandwidth history graphs

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu May 3 14:11:26 UTC 2018


#26015: Remove inconsistency between bandwidth history graphs
------------------------------------+----------------------
     Reporter:  karsten             |      Owner:  karsten
         Type:  enhancement         |     Status:  assigned
     Priority:  Medium              |  Milestone:
    Component:  Metrics/Statistics  |    Version:
     Severity:  Normal              |   Keywords:
Actual Points:                      |  Parent ID:
       Points:                      |   Reviewer:
      Sponsor:                      |
------------------------------------+----------------------
 Today I found an inconsistency between our various bandwidth history
 graphs:

  - The [https://metrics.torproject.org/bandwidth.html Total relay
 bandwidth] graph shows the sum of all bandwidth histories that we can find
 for a given day, whereas
  - the [https://metrics.torproject.org/bandwidth-flags.html Advertised and
 consumed bandwidth by relay flag] and
 [https://metrics.torproject.org/bwhist-flags.html Consumed bandwidth by
 Exit/Guard flag combination] graphs only show bandwidth histories of relays
 that we found in at least one consensus on a day.

 The reason is that we match consensuses with extra-info descriptors only
 for the second and third graphs, not for the first. That matching step is
 required in order to break down totals by guard/exit flags.
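 The matching step can be illustrated with a minimal Python sketch. This is
 not the metrics code: the function name, the tuple layouts, and the rule
 that a relay keeps the last flag combination seen on a day are all
 assumptions made for illustration.

```python
from collections import defaultdict


def sum_histories_by_flags(consensus_entries, histories):
    """Sum bandwidth histories per (date, isexit, isguard).

    consensus_entries: iterable of (date, fingerprint, isexit, isguard),
        one per relay found in at least one consensus on that date
        (hypothetical layout).
    histories: iterable of (date, fingerprint, bwread, bwwrite) taken
        from extra-info descriptors (hypothetical layout).
    Returns (matched, unmatched): matched maps (date, isexit, isguard)
    to (bwread, bwwrite); unmatched is the (bwread, bwwrite) sum of
    histories from relays that were not in any consensus that day.
    """
    # Assumption: if a relay appears more than once per day, the last
    # flag combination wins.
    flags = {}
    for date, fingerprint, isexit, isguard in consensus_entries:
        flags[(date, fingerprint)] = (isexit, isguard)

    matched = defaultdict(lambda: [0, 0])
    unmatched = [0, 0]
    for date, fingerprint, bwread, bwwrite in histories:
        relay_flags = flags.get((date, fingerprint))
        if relay_flags is None:
            # History without a consensus entry: counted by an unmatched
            # total, but not by any per-flag sum.
            unmatched[0] += bwread
            unmatched[1] += bwwrite
            continue
        bucket = matched[(date,) + relay_flags]
        bucket[0] += bwread
        bucket[1] += bwwrite

    return {k: tuple(v) for k, v in matched.items()}, tuple(unmatched)
```

 Histories that land in `unmatched` are the ones that only a total computed
 without matching would include.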

 While it may seem simpler to just skip that matching step in the first
 graph, it leads to inconsistent data. Consider the following data taken
 from `bandwidth.csv`:

 || date || isexit || isguard || advbw || bwread || bwwrite || dirread || dirwrite ||
 || 2018-03-14 || f || f || 6757900851 || 2454444601 || 2493893288 ||  ||  ||
 || 2018-03-14 || f || t || 15218678985 || 7024742679 || 7191640536 ||  ||  ||
 || 2018-03-14 || t || f || 1592294787 || 562694042 || 558274048 ||  ||  ||
 || 2018-03-14 || t || t || 6189896122 || 3322794675 || 3356316394 ||  ||  ||
 || 2018-03-14 ||  ||  || 29758770745 || 13367602689 || 13603416291 || 6877369 || 187770410 ||

 In theory, the sum of the first four rows should match the fifth row,
 modulo rounding errors.

 This works for advertised bandwidth (which is based on server descriptor
 data, not extra-info descriptors). But it does not work for bandwidth
 histories:

 {{{
 6757900851 + 15218678985 + 1592294787 + 6189896122 - 29758770745 = 0
 2454444601 + 7024742679 + 562694042 + 3322794675 - 13367602689 = -2926692
 2493893288 + 7191640536 + 558274048 + 3356316394 - 13603416291 = -3292025
 }}}

 The totals row is larger because it also counts relays that reported
 bandwidth histories but that the directory authorities did not list as
 running.
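 The same check can be reproduced with a short Python sketch that uses only
 the numbers from the table. The variable names are illustrative and not
 part of any metrics code.

```python
# Values copied from the bandwidth.csv excerpt for 2018-03-14.
# Key: (isexit, isguard); value: (advbw, bwread, bwwrite).
rows_by_flags = {
    ("f", "f"): (6757900851, 2454444601, 2493893288),
    ("f", "t"): (15218678985, 7024742679, 7191640536),
    ("t", "f"): (1592294787, 562694042, 558274048),
    ("t", "t"): (6189896122, 3322794675, 3356316394),
}
# The row with empty isexit/isguard columns.
totals_row = (29758770745, 13367602689, 13603416291)

# Column-wise sum of the four flag rows minus the totals row.
diffs = tuple(
    sum(column) - total
    for column, total in zip(zip(*rows_by_flags.values()), totals_row)
)
print(diffs)  # (0, -2926692, -3292025)
```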

 Suggestion: we simply omit the advertised and consumed bandwidth totals
 from the totals row on days where we have values by exit/guard flags:

 || date || isexit || isguard || advbw || bwread || bwwrite || dirread || dirwrite ||
 || 2018-03-14 || f || f || 6757900851 || 2454444601 || 2493893288 ||  ||  ||
 || 2018-03-14 || f || t || 15218678985 || 7024742679 || 7191640536 ||  ||  ||
 || 2018-03-14 || t || f || 1592294787 || 562694042 || 558274048 ||  ||  ||
 || 2018-03-14 || t || t || 6189896122 || 3322794675 || 3356316394 ||  ||  ||
 || 2018-03-14 ||  ||  || ~~29758770745~~ || ~~13367602689~~ || ~~13603416291~~ || 6877369 || 187770410 ||

 We'd remove an inconsistency by doing so, and we'd remove some code. The
 graphing code would have to do one more step to aggregate data from four
 rows, but that's not critical.
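 The extra aggregation step could look roughly like the following Python
 sketch. It assumes that `bandwidth.csv` is comma-separated with the column
 names shown in the table header and that omitted values are empty fields;
 the function name is hypothetical, and this is not the actual graphing
 code.

```python
import csv
import io
from collections import defaultdict

# Sample in the proposed format: the totals row keeps only dirread/dirwrite.
SAMPLE = """date,isexit,isguard,advbw,bwread,bwwrite,dirread,dirwrite
2018-03-14,f,f,6757900851,2454444601,2493893288,,
2018-03-14,f,t,15218678985,7024742679,7191640536,,
2018-03-14,t,f,1592294787,562694042,558274048,,
2018-03-14,t,t,6189896122,3322794675,3356316394,,
2018-03-14,,,,,,6877369,187770410
"""

COLUMNS = ("advbw", "bwread", "bwwrite")


def totals_by_date(csv_text):
    """Sum advbw/bwread/bwwrite over the four exit/guard rows per date."""
    totals = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip the row without flag columns; it no longer carries totals.
        if row["isexit"] == "" or row["isguard"] == "":
            continue
        for column in COLUMNS:
            totals[row["date"]][column] += int(row[column])
    return dict(totals)


print(totals_by_date(SAMPLE)["2018-03-14"])
```

 With this aggregation, the total for a day is by construction the sum of
 the four flag rows, so the two kinds of graphs cannot disagree.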

 If this sounds reasonable to others, I'll prepare a patch. Setting to
 needs_review for the idea, not for code.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26015>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

