[tor-bugs] #6471 [Metrics Utilities]: Design file format and Python/Java library for multiple GeoIP or AS databases

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Sun Sep 9 03:19:27 UTC 2012


#6471: Design file format and Python/Java library for multiple GeoIP or AS
databases
-------------------------------+--------------------------------------------
 Reporter:  karsten            |          Owner:  karsten     
     Type:  enhancement        |         Status:  needs_review
 Priority:  normal             |      Milestone:              
Component:  Metrics Utilities  |        Version:              
 Keywords:                     |         Parent:              
   Points:                     |   Actualpoints:              
-------------------------------+--------------------------------------------

Comment(by rransom):

 Replying to [comment:9 karsten]:
 > Replying to [comment:8 rransom]:
 > > #2506 has a link to possibly-useful Python code.
 >
 > Maybe, yes.  Interesting discussion there!  The focus is somewhat
 > different though.  I'd sacrifice compact representation of the data
 > structure holding multiple GeoIP databases for better lookup speed.  We
 > might store dozens of GeoIP databases in a single structure for metrics,
 > and we'll do bulk lookups.  If that takes tens of MiB on disk and a
 > multiple of that in RAM, but therefore finishes within a few minutes, not
 > hours, I'm okay with that.

 If you have enough RAM to burn on using your programming language's
 general-purpose data structures, and your disks are fast enough to load
 the data from text files, go for it.  (General-purpose compression of the
 files on disk would be helpful if you're bottlenecked by disk IO.)  But do
 keep the data structures in #2506 in mind in case you need to reduce the
 load on whatever hardware you're using.

 In particular, a ‘crit-bit tree’ (see [ticket:2506#comment:20] and
 [ticket:2506#comment:19]) should provide very fast in-memory lookup, if
 you can implement it in a sufficiently fast language (i.e. ''not''
 Python).
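
 To show the shape of the structure, here is a crit-bit tree sketched in Python only because it is short to read; as noted above, a version meant to be fast belongs in a faster language. This is a minimal sketch, not djb's byte-string design or the code linked from #2506: keys are fixed-width 32-bit IPv4 integers, and both the `floor` query (largest key <= x) and the convention of storing each range start with a hypothetical `(end, country)` value are assumptions about how GeoIP ranges might be mapped onto the tree. `CritBitTree`, `floor` and `lookup` are illustrative names.

```python
import ipaddress

WIDTH = 32  # keys are IPv4 addresses as unsigned 32-bit integers


def _bit(key, index):
    """Bit `index` of `key`, counting from the most significant bit (0)."""
    return (key >> (WIDTH - 1 - index)) & 1


class _Leaf:
    __slots__ = ("key", "value")

    def __init__(self, key, value):
        self.key = key
        self.value = value


class _Node:
    __slots__ = ("bit", "child")

    def __init__(self, bit):
        self.bit = bit             # critical bit: first bit where subtrees differ
        self.child = [None, None]  # child[0] holds keys with a 0 at `bit`


def _max_leaf(node):
    """Rightmost (largest-key) leaf under `node`."""
    while isinstance(node, _Node):
        node = node.child[1]
    return node


class CritBitTree:
    def __init__(self):
        self.root = None

    def _best_match(self, key):
        # Follow key's bits at each critical bit down to a leaf; its first
        # bit of difference from `key` tells where `key` branches off.
        node = self.root
        while isinstance(node, _Node):
            node = node.child[_bit(key, node.bit)]
        return node

    def insert(self, key, value):
        leaf = _Leaf(key, value)
        if self.root is None:
            self.root = leaf
            return
        best = self._best_match(key)
        if best.key == key:
            best.value = value
            return
        crit = WIDTH - (best.key ^ key).bit_length()
        # Descend again, stopping where the new critical bit belongs.
        parent, side, node = None, 0, self.root
        while isinstance(node, _Node) and node.bit < crit:
            parent, side = node, _bit(key, node.bit)
            node = node.child[side]
        new = _Node(crit)
        direction = _bit(key, crit)
        new.child[direction] = leaf
        new.child[1 - direction] = node
        if parent is None:
            self.root = new
        else:
            parent.child[side] = new

    def floor(self, key):
        """Leaf with the largest key <= `key`, or None."""
        if self.root is None:
            return None
        best = self._best_match(key)
        if best.key == key:
            return best
        crit = WIDTH - (best.key ^ key).bit_length()
        last_left, node = None, self.root
        while isinstance(node, _Node) and node.bit < crit:
            d = _bit(key, node.bit)
            if d == 1:
                last_left = node.child[0]  # every key there is < key
            node = node.child[d]
        # All keys under `node` match key above `crit` and differ at `crit`.
        if _bit(key, crit) == 1:
            return _max_leaf(node)         # so they are all smaller than key
        return _max_leaf(last_left) if last_left is not None else None


def lookup(tree, address):
    """Hypothetical GeoIP use: key = range start, value = (range end, country)."""
    ip = int(ipaddress.IPv4Address(address))
    leaf = tree.floor(ip)
    if leaf is not None and ip <= leaf.value[0]:
        return leaf.value[1]
    return None
```

 Because critical-bit indices strictly increase along any root-to-leaf path, each descent takes at most 32 branch steps for IPv4 keys, no matter how many ranges are stored.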

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6471#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

