[tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Sun Jun 24 04:19:39 UTC 2012


#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
 Reporter:  rransom      |          Owner:  endian7000        
     Type:  enhancement  |         Status:  assigned          
 Priority:  normal       |      Milestone:  Tor: 0.2.4.x-final
Component:  Tor Relay    |        Version:                    
 Keywords:               |         Parent:                    
   Points:               |   Actualpoints:                    
-------------------------+--------------------------------------------------
Changes (by dcf):

 * cc: dcf@… (added)


Comment:

 I tried to implement geoip lookup using a
 [https://en.wikipedia.org/wiki/Binary_decision_diagram binary decision
 diagram] (BDD) rather than a binary search over ranges. With a BDD, you
 look at a bit of the address, and depending on whether it is a 0 or 1 you
 jump to another node in a dag to look at another bit, terminating when you
 get to a country code. In summary: it works, but doesn't save much space.
 (One should publish negative results, right?) Code is available from
 {{{git clone http://www.bamsoftware.com/git/geoip-bdd.git}}}.

 The {{{geoip}}} database of June 6 2012 has 168366 ranges and is 4.0 MB on
 disk. A reduced BDD computing the same mapping ({{{geoip-rdd.bdd}}}) has
 185170 nodes and is 3.6 MB on disk. A range could reasonably take 10 bytes
 (int32 ip_low, int32 ip_high, 2-byte country code) for a total of about
 1.6 MB in memory; and a BDD node could reasonably fit in 8 bytes (int32 lo
 pointer and int32 hi pointer, stealing the high 5 bits of a node index for
 a bit number) for a total of about 1.4 MB in memory.

 The size of a BDD depends on its variable ordering. I only tried looking
 at bits from most-to-least significant. My intuition says that should be
 close to the best ordering, because of how addresses are allocated. I
 didn't test any others, partly because ranges aren't contiguous using any
 other ordering, complicating the construction of the BDD.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:19>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list