[tor-bugs] #6471 [Metrics Utilities]: Design file format and Python/Java library for multiple GeoIP or AS databases

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Nov 7 15:53:49 UTC 2012


#6471: Design file format and Python/Java library for multiple GeoIP or AS
databases
-------------------------------+--------------------------------------------
 Reporter:  karsten            |          Owner:              
     Type:  enhancement        |         Status:  needs_review
 Priority:  normal             |      Milestone:              
Component:  Metrics Utilities  |        Version:              
 Keywords:                     |         Parent:              
   Points:                     |   Actualpoints:              
-------------------------------+--------------------------------------------

Comment(by atagar):

 > octets = address_string.split('.')
 > return long(''.join(["%02X" % long(octet) for octet in octets]), 16)

 Nice. :)

 > Are you certain that the current code leads to reading the file into
 memory twice?

 Pretty sure. What makes you think that it doesn't?

 Calling 'file.readlines()' reads the whole file into a list of strings,
 one per line. At this point you've read the whole file into memory once,
 and then you iterate over the entries and append data for each entry.
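
 To make that concrete, here's a rough sketch of the two ways to loop over
 a file. The ticket's parsing code isn't quoted here, so the function names,
 the write_sample() helper, and the comma-separated sample lines are all
 made up for illustration. It runs unchanged on Python 2 and 3.

```python
import os
import tempfile


def count_entries_eager(path):
    # readlines() builds a list holding every line before the loop starts.
    with open(path) as handle:
        lines = handle.readlines()
    count = 0
    for line in lines:
        if line.strip():
            count += 1
    return count


def count_entries_lazy(path):
    # Iterating over the file object yields one line at a time, so only
    # the current line needs to be held in memory.
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.strip():
                count += 1
    return count


def write_sample(lines):
    # Hypothetical helper: writes sample data to a temporary file.
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path
```

 Both functions return the same count. The only difference is whether the
 full list of lines exists in memory while the loop runs.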

 This is a bit similar to the difference between python's range() and
 xrange() functions. Calling range() gives you a list (which is iterable)
 while xrange() gives you an iterator. Hence...

 {{{
 for i in range(1000000000):
   print i
 }}}

 ... means making a list of a billion ints in memory then printing each
 while...

 {{{
 for i in xrange(1000000000):
   print i
 }}}

 ... has constant memory usage because xrange() provides the sequence on
 demand.
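
 The same contrast can be shown without depending on which python version
 you run. This is only a sketch: count_up() is a made-up generator standing
 in for xrange(), and sys.getsizeof() measures just the container itself
 (not the ints it refers to), so treat the sizes as a rough indication.

```python
import sys


def count_up(limit):
    # Generator: produces one int at a time, much like xrange() does.
    i = 0
    while i < limit:
        yield i
        i += 1


eager = list(count_up(100000))  # every value is held in memory at once
lazy = count_up(100000)         # nothing is produced until it's consumed

# Size of the container objects only; the list grows with its length
# while the generator object stays small no matter how large the limit is.
eager_size = sys.getsizeof(eager)
lazy_size = sys.getsizeof(lazy)

# Consuming the generator still visits every value exactly once.
lazy_total = sum(lazy)
```

 The list's size scales with the number of elements, while the generator
 object's size is the same for a hundred values or a billion.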

 Personally I think that it's stupid that the python compiler isn't smart
 enough to say "the developer's using range() or readlines() in a loop,
 hence provide an iterator rather than a list". Python 3 does make range()
 lazy, but readlines() still returns a list there, so I wouldn't count on
 it.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6471#comment:20>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

