[tor-bugs] #6471 [Metrics Utilities]: Design file format and Python/Java library for multiple GeoIP or AS databases

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Nov 6 15:37:02 UTC 2012


#6471: Design file format and Python/Java library for multiple GeoIP or AS
databases
-------------------------------+--------------------------------------------
 Reporter:  karsten            |          Owner:              
     Type:  enhancement        |         Status:  needs_review
 Priority:  normal             |      Milestone:              
Component:  Metrics Utilities  |        Version:              
 Keywords:                     |         Parent:              
   Points:                     |   Actualpoints:              
-------------------------------+--------------------------------------------
Changes (by karsten):

 * cc: atagar (added)
 * status:  assigned => needs_review


Comment:

 Okay, I think I got this into a needs_review state.  gsathya, atagar, if
 you don't mind, please review the Python part and help me turn this code
 into a simple metrics utility library.

 Here's what the code does:

 The [https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-6471/java
 Java code] combines GeoIP or ASN
 databases and produces a single combined database file.  That file has
 lines like the following:

 {{{
 223.255.254.0,223.255.254.255,AS55415,20110501,20121101
 223.255.247.0,223.255.247.255,AS45954,20101101,20121101
 }}}

 The first line means that addresses `223.255.254.0` to `223.255.254.255`
 were assigned to `AS55415` in the databases published from `20110501` to
 `20121101`.
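
 To make the line format concrete, here is a minimal Python sketch that
 parses one such line.  It is not taken from the task-6471 code; the
 names `Range` and `parse_line` are made up for illustration, and it
 assumes the five comma-separated fields shown above with IPv4 addresses
 and `YYYYMMDD` dates.

```python
import ipaddress
from datetime import date, datetime
from typing import NamedTuple


class Range(NamedTuple):
    """One line of a combined database file (hypothetical structure)."""
    start: int      # first address of the range, as an integer
    end: int        # last address of the range, inclusive
    value: str      # assigned AS number (or country code)
    first_db: date  # date of the first database containing this assignment
    last_db: date   # date of the last database containing this assignment


def parse_line(line: str) -> Range:
    """Parse 'start,end,value,first_db,last_db' into a Range."""
    start, end, value, first_db, last_db = line.strip().split(",")
    return Range(
        int(ipaddress.IPv4Address(start)),
        int(ipaddress.IPv4Address(end)),
        value,
        datetime.strptime(first_db, "%Y%m%d").date(),
        datetime.strptime(last_db, "%Y%m%d").date(),
    )


example = parse_line("223.255.254.0,223.255.254.255,AS55415,20110501,20121101")
```

 Storing addresses as integers keeps range comparisons cheap, which
 matters once the file has many lines.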

 The [https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-6471/python
 Python code] reads a combined database file and answers the question of
 which country or AS a given IP address was assigned to on a given date.
 It uses Python lists and the bisect module to get at least somewhere
 close to Java's performance.
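
 As a rough illustration of a list-plus-bisect lookup, here is a
 simplified sketch.  It is not the code from the repository; the class
 name `CombinedDatabase` and its `lookup` method are hypothetical, and it
 assumes IPv4 ranges in the five-field format shown above, with dates
 compared as `YYYYMMDD` strings.

```python
import bisect
import ipaddress


class CombinedDatabase:
    """Hypothetical in-memory index over combined database lines."""

    def __init__(self, lines):
        entries = []
        for line in lines:
            start, end, value, first_db, last_db = line.strip().split(",")
            entries.append((
                int(ipaddress.IPv4Address(start)),
                int(ipaddress.IPv4Address(end)),
                first_db, last_db, value))
        # Sort by start address so that bisect can be used on self.starts.
        entries.sort()
        self.entries = entries
        self.starts = [entry[0] for entry in entries]
        # Largest (end - start) of any range; bounds the candidate window.
        self.max_span = max((e[1] - e[0] for e in entries), default=0)

    def lookup(self, address, date_string):
        """Return the value assigned to address on date_string, or None."""
        addr = int(ipaddress.IPv4Address(address))
        # A range containing addr must start in [addr - max_span, addr].
        lo = bisect.bisect_left(self.starts, addr - self.max_span)
        hi = bisect.bisect_right(self.starts, addr)
        for start, end, first_db, last_db, value in reversed(self.entries[lo:hi]):
            # YYYYMMDD strings sort in the same order as the dates they encode.
            if addr <= end and first_db <= date_string <= last_db:
                return value
        return None


db = CombinedDatabase([
    "223.255.254.0,223.255.254.255,AS55415,20110501,20121101",
    "223.255.247.0,223.255.247.255,AS45954,20101101,20121101",
])
print(db.lookup("223.255.254.17", "20120101"))
```

 The two bisect calls narrow the search to ranges that could contain the
 address; the remaining candidates are checked linearly, so the window
 stays small only if no single range is very large.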

 I created two example combined database files for MaxMind's
 [https://people.torproject.org/~karsten/volatile/city-2009-06-2012-10.csv.bz2
 GeoLiteCity] and
 [https://people.torproject.org/~karsten/volatile/asn-2005-09-2012-11.csv.bz2
 ASN databases].

 Having date-based ip-to-country or ip-to-asn lookups is relevant for
 #6232, among other things.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6471#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

