[tor-bugs] #24218 [Metrics/Statistics]: Implement new metrics-web module for IPv6 relay statistics

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Dec 6 11:14:19 UTC 2017


#24218: Implement new metrics-web module for IPv6 relay statistics
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  enhancement         |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------
Changes (by karsten):

 * status:  new => needs_review


Comment:

 So, I rewrote the earlier prototype into a metrics-web module that uses a
 PostgreSQL database. Please review
 [https://gitweb.torproject.org/karsten/metrics-web.git/log/?h=task-24218
 my task-24218 branch].

 Here are some first (meta) statistics on how it performs:
  - Processed five weeks of descriptors, from 2017-11-01 to 2017-12-04
 (roughly 500M in XZ-compressed form), plus recent descriptors from the
 past three days.
  - Processing took ~12 minutes on my laptop.
  - The resulting database has a size of ~1G before vacuuming and ~150M
 afterwards.
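 For a rough sense of scale, these figures work out to about 42M of
 compressed descriptors per minute, and vacuuming reclaimed about 85% of
 the database size. Below is a minimal Java sketch of that arithmetic. It
 assumes 1G = 1000M. The class and method names are hypothetical and are
 not part of the task-24218 branch:

```java
// Back-of-the-envelope figures derived from the numbers reported above.
// Sizes are approximate and treat 1G as 1000M.
public class ImportStats {

  // Average amount of input processed per minute, in the unit of inputSize.
  static double throughputPerMinute(double inputSize, double minutes) {
    return inputSize / minutes;
  }

  // Share of the database size reclaimed by vacuuming, as a percentage.
  static double vacuumReductionPercent(double sizeBefore, double sizeAfter) {
    return 100.0 * (sizeBefore - sizeAfter) / sizeBefore;
  }

  public static void main(String[] args) {
    // ~500M of XZ-compressed descriptors processed in ~12 minutes.
    System.out.printf("throughput: ~%.0fM/min%n", throughputPerMinute(500, 12));
    // ~1000M before vacuuming, ~150M afterwards.
    System.out.printf("vacuum reclaimed: ~%.0f%%%n",
        vacuumReductionPercent(1000, 150));
  }
}
```

 The throughput figure slightly understates the real rate. The ~12
 minutes also covered the recent descriptors from the past three days,
 and those are not counted in the ~500M.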

 Remaining tasks:
  - Add a specification of the CSV file and three new graph pages to Tor
 Metrics. I'll take care of this.
  - Import the descriptor archive since 2008 somewhere, though not
 necessarily on the production system. I can take care of this, but only
 after the first review round, once it's clear whether the database schema
 can stay.
  - Find a way to test the `Database` class. I briefly tried testing it
 with an in-memory HSQLDB database and got it working to some extent. But
 we're using a few PostgreSQL-specific features that we'd have to replace
 in these tests, so we'd end up testing something similar to, but not
 quite the same as, the PostgreSQL database. Also, the code in `Database`
 looks simple enough that it's unlikely to contain major bugs. I'd prefer
 to test the whole code with real descriptors as input and a real test
 PostgreSQL database doing the aggregation. Let's try to find a testing
 approach that we can later apply to other modules. (This shouldn't block
 either review or deployment.)
  - Write a specification of the new CSV file, as we said we would for
 [https://trac.torproject.org/projects/tor/wiki/org/sponsors/Sponsor13
 Sponsor 13].

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24218#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

