[tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Tue Jun 7 00:11:59 UTC 2011


#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
 Reporter:  rransom      |          Owner:  endian7000        
     Type:  enhancement  |         Status:  needs_review      
 Priority:  normal       |      Milestone:  Tor: 0.2.3.x-final
Component:  Tor Relay    |        Version:                    
 Keywords:               |         Parent:                    
   Points:               |   Actualpoints:                    
-------------------------+--------------------------------------------------

Comment(by nickm):

 Replying to [comment:9 rransom]:
 > Replying to [comment:7 nickm]:
 > > I'm not so sure that having this stuff in separate run-length and cc
 > > files will actually be needed; endianness issues will keep us from
 > > reading any portable file into an array-of-country verbatim, I think.
 >
 > The country codes are two-character ASCII strings, and are thus
 > endianness-independent.  The run lengths are integers, but could be
 > encoded in big-endian form everywhere.

 I thought that the whole point of endian7000's idea was that a lot of the
 savings came from variable-length run-length encoding. In the database I'm
 looking at, there are 4212 distinct run lengths. Lots of the win
 comes from encoding the more frequent run-lengths as a single byte and the
 less frequent ones as two bytes.
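
 A minimal sketch of the kind of frequency-ranked variable-length code
 described above. The exact layout here is an assumption for illustration,
 not the format proposed on this ticket: the 128 most frequent run lengths
 get one-byte codes (0x00-0x7F), and every other run length gets a two-byte
 code whose first byte has the high bit set. The names `build_code_table`,
 `encode_runs`, and `decode_runs` are hypothetical.

```python
from collections import Counter

# Assumed layout: codes 0x00-0x7F are one byte; a first byte with the high
# bit set starts a two-byte code. This leaves room for 128 + 128 * 256
# distinct run lengths, well above the 4212 mentioned in the comment.
ONE_BYTE_CODES = 128
MAX_CODES = ONE_BYTE_CODES + 128 * 256


def build_code_table(run_lengths):
    """Rank distinct run lengths, most frequent first (ties by value)."""
    counts = Counter(run_lengths)
    ranked = sorted(counts, key=lambda value: (-counts[value], value))
    if len(ranked) > MAX_CODES:
        raise ValueError("too many distinct run lengths for this sketch")
    return ranked  # position in this list is the code index


def encode_runs(run_lengths):
    """Return (table, encoded bytes) for a sequence of run lengths."""
    table = build_code_table(run_lengths)
    index_of = {value: index for index, value in enumerate(table)}
    out = bytearray()
    for run in run_lengths:
        index = index_of[run]
        if index < ONE_BYTE_CODES:
            out.append(index)  # frequent run length: one byte
        else:
            rest = index - ONE_BYTE_CODES
            out.append(0x80 | (rest >> 8))  # high bit marks a two-byte code
            out.append(rest & 0xFF)
    return table, bytes(out)


def decode_runs(table, data):
    """Invert encode_runs using the same table."""
    runs = []
    pos = 0
    while pos < len(data):
        first = data[pos]
        if first < 0x80:
            index = first
            pos += 1
        else:
            index = ONE_BYTE_CODES + (((first & 0x7F) << 8) | data[pos + 1])
            pos += 2
        runs.append(table[index])
    return runs
```

 Because every code is a byte string read one byte at a time, this layout
 has no host-endianness dependence; the cost is that the table mapping
 codes back to run lengths has to be shipped alongside the data.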

 To quantify: 136810 of the runs in my geoip file would have their lengths
 represented as one byte in the var-length encoding, whereas 11586 would
 take two bytes.  Using a fixed-width two-byte encoding for run lengths
 would cost one extra byte for each of those 136810 runs, adding another
 133K to the file size.
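
 The 133K figure can be checked with a few lines of arithmetic. The counts
 are the ones quoted in this comment; the variable names are only for
 illustration.

```python
# Counts taken from the comment above.
ONE_BYTE_RUNS = 136810  # runs whose length fits a one-byte code
TWO_BYTE_RUNS = 11586   # runs whose length needs a two-byte code

# Size of the run-length data under each scheme, in bytes.
variable_size = ONE_BYTE_RUNS * 1 + TWO_BYTE_RUNS * 2
fixed_size = (ONE_BYTE_RUNS + TWO_BYTE_RUNS) * 2

# Fixed-width costs one extra byte per run that would have fit in one byte.
extra_bytes = fixed_size - variable_size
extra_kib = extra_bytes / 1024

print(variable_size, fixed_size, extra_bytes, round(extra_kib, 1))
# prints: 159982 296792 136810 133.6
```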

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online