[tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Sun Jun 24 04:42:20 UTC 2012


#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
 Reporter:  rransom      |          Owner:  endian7000        
     Type:  enhancement  |         Status:  assigned          
 Priority:  normal       |      Milestone:  Tor: 0.2.4.x-final
Component:  Tor Relay    |        Version:                    
 Keywords:               |         Parent:                    
   Points:               |   Actualpoints:                    
-------------------------+--------------------------------------------------

Comment(by rransom):

 Replying to [comment:19 dcf]:
 > I tried to implement geoip lookup using a
 [https://en.wikipedia.org/wiki/Binary_decision_diagram binary decision
 diagram] (BDD) rather than a binary search over ranges. With a BDD, you
 look at a bit of the address, and depending on whether it is a 0 or 1 you
 jump to another node in a dag to look at another bit, terminating when you
 get to a country code. In summary: it works, but doesn't save much space.
 (One should publish negative results, right?)

 Yes.

 > The {{{geoip}}} database of June 6 2012 has 168366 ranges and is 4.0 MB
 on disk. A reduced BDD computing the same mapping ({{{geoip-rdd.bdd}}})
 has 185170 nodes and is 3.6 MB on disk. A range could reasonably take 10
 bytes (int32 ip_low, int32 ip_high, 2-byte country code) for a total of
 about 1.6 MB in memory; and a BDD node could reasonably fit in 8 bytes
 (int32 lo pointer and int32 hi pointer, stealing the high 5 bits of a node
 index for a bit number) for a total of about 1.4 MB in memory.

 What if you use variable-size integers for the if-0 and if-1 offsets, and
 store them as offsets from (the end of) the current node?

 > The size of a BDD depends on its variable ordering. I only tried looking
 at bits from most-to-least significant. My intuition says that should be
 close to the best ordering, because of how addresses are allocated. I
 didn't test any others, partly because ranges aren't contiguous using any
 other ordering, complicating the construction of the BDD.

 If you use a non-fixed ordering, you will probably waste space by having
 to store a bit index in every node.

 (A BDD with fixed bit order is a ‘crit-bit tree’ or ‘PATRICIA tree’ (there
 are other names).  See also [http://cr.yp.to/critbit.html].)

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:20>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list