[metrics-bugs] #28116 [Metrics/Statistics]: Split up legacy module into more maintainable parts

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Oct 19 08:14:45 UTC 2018

#28116: Split up legacy module into more maintainable parts
     Reporter:  karsten             |      Owner:  karsten
         Type:  enhancement         |     Status:  assigned
     Priority:  Medium              |  Milestone:
    Component:  Metrics/Statistics  |    Version:
     Severity:  Normal              |   Keywords:
Actual Points:                      |  Parent ID:
       Points:                      |   Reviewer:
      Sponsor:                      |
 Our legacy module is a mess. That code dates back to a time when we tried
 to use a single database for all our statistics and for a service called
 relay search, which was not the same service as today's relay search.
 While I'm not ruling out that we can make a single-database approach work
 for everything we want to do with our data, it's not going to be ''this''
 database.

 It's time to move away from this legacy database and take an approach
 similar to the one we're taking for the other modules, where we store only
 the parts that we need for our graphs.

 As of now, the legacy module provides data for the following graphs:

  - In the Servers category:
   1. Relays and bridges
   2. Relays by relay flag
   3. Relays by tor version
   4. Relays by platform
  - In the Traffic category:
   5. Total relay bandwidth
   6. Advertised and consumed bandwidth by relay flag
   7. Consumed bandwidth by Exit/Guard flag combination
   8. Bandwidth spent on answering directory requests

 Viewed from a different perspective, these 8 graphs show 3 different
 kinds of data:
  - Relay or bridge counts in graphs 1 to 4
  - Advertised bandwidths in graphs 5 and 6
  - Bandwidth histories in graphs 5 to 8

 I could imagine that we make the following changes to split up the legacy
 module into more maintainable parts:
  1. Use existing data from the ipv6servers module for graph 1 and for the
 advertised bandwidth portions in graphs 5 and 6. This data already exists,
 with only trivial differences in how we treat missing data. We could just
 switch.
  2. Extend the ipv6servers module to also provide data for graphs 2 to 4.
 This extension would require us to reimport the entire archive, so it's
 more of a rewrite. But the ipv6servers module code is much cleaner and
 easier to extend than the legacy module code. And when we extend that
 module, we can relatively easily add bridge statistics and other relay
 metrics like consensus weight or path selection probabilities that we can
 use in new graphs later on. All in all not a trivial amount of work, but
 probably worth it.
  3. Keep the remaining parts of the legacy module for the bandwidth
 history parts in graphs 5 to 8. Bandwidth histories are going to be
 replaced by PrivCount data in the medium term anyway. We could keep the
 legacy module around for another year or two without planning to change
 much during that time. And when we shut it down, we can keep a copy of the
 aggregate data around, just like we're going to keep a static summary of
 the Tor Messenger statistics (#26047).
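
 To make the split easier to check, here is a minimal sketch that maps each
 graph and data kind to the module that would provide it under the three
 changes above. All identifiers (DATA_KINDS, proposed_source, plan) and the
 label strings are made up for illustration; none of them come from the
 actual metrics code base.

```python
# Illustrative only: encodes the plan from this ticket, not real module code.

# Graph numbers per kind of data, as listed in the ticket description.
DATA_KINDS = {
    "counts": {1, 2, 3, 4},
    "advertised_bandwidth": {5, 6},
    "bandwidth_history": {5, 6, 7, 8},
}


def proposed_source(graph, kind):
    """Return the (hypothetical) label of the module serving this data."""
    if kind == "counts":
        # Change 1 covers graph 1; change 2 covers graphs 2 to 4.
        if graph == 1:
            return "ipv6servers (existing data)"
        return "ipv6servers (extended)"
    if kind == "advertised_bandwidth":
        # Change 1: advertised bandwidth portions of graphs 5 and 6.
        return "ipv6servers (existing data)"
    if kind == "bandwidth_history":
        # Change 3: bandwidth histories stay in the legacy module for now.
        return "legacy"
    raise ValueError(f"unknown kind of data: {kind}")


def plan():
    """Build {graph: {kind: source}} for all eight graphs."""
    result = {}
    for kind, graphs in DATA_KINDS.items():
        for graph in sorted(graphs):
            result.setdefault(graph, {})[kind] = proposed_source(graph, kind)
    return result


def graphs_still_on_legacy():
    """Graphs that would keep depending on the legacy module."""
    return {g for g, kinds in plan().items() if "legacy" in kinds.values()}
```

 Under these assumptions, only graphs 5 to 8 would still depend on the
 legacy module, and only for their bandwidth history parts.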

 I'll start working on the second suggested change above. The two other
 changes depend on whether that second change can be made successfully.

Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28116>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online