[tor-bugs] #6471 [Metrics Utilities]: Design file format and Python/Java library for multiple GeoIP or AS databases

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Nov 6 18:29:41 UTC 2012


#6471: Design file format and Python/Java library for multiple GeoIP or AS
databases
-------------------------------+--------------------------------------------
 Reporter:  karsten            |          Owner:                
     Type:  enhancement        |         Status:  needs_revision
 Priority:  normal             |      Milestone:                
Component:  Metrics Utilities  |        Version:                
 Keywords:                     |         Parent:                
   Points:                     |   Actualpoints:                
-------------------------------+--------------------------------------------
Changes (by atagar):

  * status:  needs_review => needs_revision


Comment:

 > def address_ston(address_string):
 >   try:
 >     address_struct = socket.inet_pton(socket.AF_INET, address_string)
 >   except socket.error:
 >     raise ValueError
 >   return struct.unpack('!I', address_struct)[0]

 I'm sure you don't want to add a dependency just for this sort of
 functionality, but FYI, I have a few IP utilities that you might find
 helpful...

 https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/util/connection.py

 I wrote them to support exit policies, in particular checking if a given
 endpoint falls under a particular IP/mask...

 https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/exit_policy.py
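
 As a rough illustration of that kind of IP/mask check (a standalone
 sketch, not stem's actual code; the function names and the prefix_bits
 parameter are made up for illustration), an IPv4 address can be tested
 against a network with plain integer arithmetic:

```python
import socket
import struct


def address_to_int(address_string):
    # Pack the dotted-quad string into 4 bytes, then read them as one
    # unsigned 32-bit integer in network byte order.
    packed = socket.inet_pton(socket.AF_INET, address_string)
    return struct.unpack('!I', packed)[0]


def is_in_subnet(address_string, network_string, prefix_bits):
    # Hypothetical helper: True if the address shares its first
    # prefix_bits bits with the network address.
    if not 0 <= prefix_bits <= 32:
        raise ValueError("prefix_bits must be between 0 and 32")

    # Build a mask with prefix_bits leading ones, e.g. 24 -> 0xFFFFFF00.
    mask = (0xFFFFFFFF << (32 - prefix_bits)) & 0xFFFFFFFF
    address = address_to_int(address_string)
    network = address_to_int(network_string)
    return (address & mask) == (network & mask)
```

 For instance, is_in_subnet("192.168.1.77", "192.168.1.0", 24) is true,
 while the same call with "192.168.2.77" is false.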

 The get_address_binary() function can do something similar to what you
 have here...

 {{{
 >>> import socket
 >>> import struct
 >>> address_struct = socket.inet_pton(socket.AF_INET, "127.0.0.1")
 >>> struct.unpack('!I', address_struct)[0]
 2130706433

 >>> from stem.util import connection
 >>> address_bin = connection.get_address_binary("127.0.0.1")
 >>> print address_bin
 01111111000000000000000000000001
 >>> int(address_bin, 2)
 2130706433
 }}}

 > def __str__(self):
 >   return "%s,%s,%s,%s,%s" % \
 >         (Database.address_ntos(self.start_address),
 >          Database.address_ntos(self.end_address),
 >          self.code,
 >          Database.date_ntos(self.start_date),
 >          Database.date_ntos(self.end_date))

 Any advantage to this versus just saving the 'line' argument we were
 constructed from?
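
 To make the alternative concrete, here is a minimal sketch. The class
 name RangeEntry is hypothetical, and it assumes each entry is built from
 one line of five comma-separated fields, which it leaves as plain
 strings rather than converting them:

```python
class RangeEntry(object):
    # Hypothetical stand-in for the class under review.

    def __init__(self, line):
        # Keep the original text so __str__ does not have to rebuild it.
        self.line = line.strip()

        # Assumed layout: start address, end address, code, start date,
        # end date. A real class would convert these to numbers.
        (self.start_address, self.end_address, self.code,
         self.start_date, self.end_date) = self.line.split(',')

    def __str__(self):
        return self.line
```

 The trade-off is an extra string held per entry, and the output only
 matches the reconstructed form if the input line was already written in
 that form.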

 > def date_ston(date_string):
 > def address_ntos(address):
 > def date_kton(key):

 I haven't a clue what any of these acronyms mean. My understanding is that
 shortening function names to some arcane, overly cramped abbreviation is
 an artifact of old-time C development where saving every byte of space
 mattered. Mind coming up with more descriptive names?
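
 For example, if 'ston' and 'ntos' stand for string-to-number and
 number-to-string (that's a guess on my part), the address helpers could
 be spelled out along these lines. The names below are only suggestions:

```python
import socket
import struct


def address_string_to_number(address_string):
    # Converts a dotted-quad IPv4 string such as "127.0.0.1" to an integer.
    try:
        packed = socket.inet_pton(socket.AF_INET, address_string)
    except socket.error:
        raise ValueError("not a valid IPv4 address: %r" % (address_string,))
    return struct.unpack('!I', packed)[0]


def address_number_to_string(address_number):
    # Converts an integer in the range 0..2**32-1 back to a dotted quad.
    packed = struct.pack('!I', address_number)
    return socket.inet_ntop(socket.AF_INET, packed)
```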

 > return int(date_datetime.strftime('%s')) / 86400

 It took me a second to figure out where the 86400 came from. You might
 want to comment that.
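
 One way to make that self-documenting is a named constant. The sketch
 below is an assumption-laden example rather than a drop-in replacement:
 the function name and the '%Y%m%d' input format are made up, and it
 swaps strftime('%s') for calendar.timegm(), since '%s' is not among the
 format codes Python documents and depends on the platform's C library
 and local timezone:

```python
import calendar
import datetime

# Seconds in a day: 24 hours * 60 minutes * 60 seconds = 86400.
SECONDS_PER_DAY = 24 * 60 * 60


def date_string_to_days(date_string):
    # Parses a date (assumed format: YYYYMMDD) and returns the number of
    # whole days between the Unix epoch and that date, treated as UTC.
    date_datetime = datetime.datetime.strptime(date_string, '%Y%m%d')
    seconds_since_epoch = calendar.timegm(date_datetime.timetuple())

    # Floor division keeps the result an integer on Python 2 and 3.
    return seconds_since_epoch // SECONDS_PER_DAY
```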

 > for line in input_file.readlines():

 File objects are themselves iterable over their lines. Calling
 readlines() here builds a list of every line up front, so the file's
 contents end up in memory twice (once for this list and once for what we
 add while processing it).

 {{{
 >>> with open('/tmp/foo') as input_file:
 ...   for line in input_file:
 ...     print line.strip()
 ...
 pepperjack is
 very tasty cheese!
 }}}

 Cheers! -Damian

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6471#comment:16>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

