Mon Feb 20 15:00:27 UTC 2017

#21515: Add auxiliary data on Tor relays and bridges to CollecTor
     Reporter:  karsten            |      Owner:  metrics-team
         Type:  enhancement        |     Status:  new
     Priority:  Medium             |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
 This ticket is the result of a local TODO list review and combines a few
 related ideas.  Some of the ideas here are new, but some are really old
 and have been sitting on my list forever.

 The general idea here is that CollecTor could provide auxiliary data on
 Tor relays and bridges.  The main goal would be that other applications
 such as Onionoo and Metrics, but also Nyx, could use this data to provide
 richer information on relays and bridges to their users.  A secondary
 goal would be that CollecTor would serve as an archive of this data for
 future applications that don't exist yet.

 Auxiliary data might include:

  1. GeoIP country database: This is the same data as the Tor daemon uses
 internally to resolve relay IP addresses to country codes.  We would be
 able to produce historical data by extracting `src/config/geoip` files
 from the Tor daemon Git repository.  This data could be used by Metrics to
 bring back the relays by country graph.

  2. GeoIP city database: This data would be the same as Onionoo uses to
 resolve relay IP addresses to city names.  The main advantage of having
 this file in CollecTor would be that Onionoo could automatically pull this
 data instead of relying on the operator to update GeoIP files.

  3. GeoIP ASN database: This is similar to 2 but for ASN information.

  4. Bridge GeoIP country database: Here's an idea to provide country
 information for bridges despite replacing IP addresses by hashes.
 CollecTor could keep a list of all bridge IP addresses in a given month
 and use the GeoIP country database from 1 to produce a custom database for
 resolving bridge IP addresses to country codes.  Basically, that database
 would contain hashed fingerprints, 10.x.y.z IP addresses, and country
 codes.  CollecTor would add a new line to this file whenever it observes a
 new bridge IP address, which could happen as often as once per hour, in
 particular at the beginning of a month.  This file would change once per
 month when
 hashes for 10.x.y.z addresses change.  However, this means that we'd have
 to reprocess the entire bridge tarball archive to generate older database
 files, because we have long deleted the inputs for generating those old
 10.x.y.z IP addresses.  Consumers of this data would be Onionoo but also
 Metrics for a new bridge country graph.

  5. Relay reverse DNS entries: Right now, Onionoo runs its own rDNS
 resolver.  But we could just as well run that as part of CollecTor and
 provide the output data in a new data format to everyone who needs it.
 There would also be other consumers of this data, including the relay
 controller Nyx, which would be able to display rDNS entries without
 risking leaking who is fetching that information.
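
 As a rough illustration of ideas 1 and 4, here is a minimal Python sketch.
 It assumes the GeoIP country file consists of `START,END,CC` lines with
 integer IPv4 range bounds and `#` comment lines.  The helpers
 `sanitize_address` and `hash_fingerprint`, the monthly secret, and the
 in-memory dict are hypothetical stand-ins for illustration only; they are
 not CollecTor's actual sanitizing code or a proposed file format.

```python
import bisect
import hashlib
import ipaddress


def parse_geoip(text):
    """Parse 'START,END,CC' lines into a sorted list of (start, end, cc)."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        start, end, country = line.split(",")
        ranges.append((int(start), int(end), country.lower()))
    ranges.sort()
    return ranges


def lookup_country(ranges, address):
    """Return the country code for an IPv4 address, or '??' if unknown."""
    value = int(ipaddress.IPv4Address(address))
    # Find the last range whose start is <= value.  The infinite second
    # element makes the probe sort after any range starting at `value`.
    index = bisect.bisect_right(ranges, (value, float("inf"))) - 1
    if index >= 0:
        start, end, country = ranges[index]
        if start <= value <= end:
            return country
    return "??"


def hash_fingerprint(fingerprint_hex):
    """Hypothetical: SHA-1 over the binary fingerprint, as upper-case hex."""
    return hashlib.sha1(bytes.fromhex(fingerprint_hex)).hexdigest().upper()


def sanitize_address(address, fingerprint_hex, monthly_secret):
    """Hypothetical: map a real bridge address to a 10.x.y.z address."""
    packed = ipaddress.IPv4Address(address).packed
    digest = hashlib.sha256(
        packed + bytes.fromhex(fingerprint_hex) + monthly_secret
    ).digest()
    return "10.%d.%d.%d" % (digest[0], digest[1], digest[2])


def add_bridge(database, ranges, address, fingerprint_hex, monthly_secret):
    """Add one entry if this (hashed fingerprint, 10.x.y.z) pair is new.

    Returns True if a new entry was added, False if it was already known.
    """
    key = (
        hash_fingerprint(fingerprint_hex),
        sanitize_address(address, fingerprint_hex, monthly_secret),
    )
    if key in database:
        return False
    database[key] = lookup_country(ranges, address)
    return True
```

 With the two sample lines `16777216,16777471,AU` and
 `16777472,16778239,CN` as input, `lookup_country(ranges, "1.0.0.1")`
 returns `au`.  Calling `add_bridge` twice with the same address,
 fingerprint, and secret adds only one entry, which mirrors the "new line
 whenever it observes a new bridge IP address" behavior described in 4.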

 This is a lot, but maybe there's even more.  It's probably useful to
 discuss these different new data sets together.  Once we decide we want to
 provide some or even all of them we should switch to child tickets.  And
 just to set expectations right, it's probably going to take months to find
 enough time to implement these new data sets, if we think it's a good

Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21515>
