[tor-bugs] #9380 [BridgeDB]: BridgeDB should use stem for parsing descriptors according to torspec

Tor Bug Tracker & Wiki blackhole at torproject.org
Sun Aug 24 22:29:17 UTC 2014


#9380: BridgeDB should use stem for parsing descriptors according to torspec
-------------------------+-------------------------------------------------
     Reporter:  sysrqb   |      Owner:  isis
         Type:           |     Status:  needs_revision
  enhancement            |  Milestone:
     Priority:  normal   |    Version:
    Component:           |   Keywords:  stem,bridgedb-0.2.x,bridgedb-
  BridgeDB               |  parsers
   Resolution:           |  Parent ID:
Actual Points:           |
       Points:           |
-------------------------+-------------------------------------------------

Comment (by atagar):

 Hi isis, here's a quick code review for the file you pointed me toward
 ([https://gitweb.torproject.org/user/isis/bridgedb.git/blob/refs/heads/fix/9380-stem_r3:/lib/bridgedb/parse/descriptors.py
 descriptors.py])...

 ----

 {{{
 if not descriptors.startswith("published"):
   precise = datetime.datetime.now().isoformat(sep=chr(0x20))
   timestamp = precise.rsplit('.', 1)[0]
   descriptors = "published {t}\n{d}".format(t=timestamp, d=descriptors)
 }}}

 Interesting, any reason to do 'sep=chr(0x20)' rather than 'sep=' !''?

 Regardless, this would be simpler as...

 {{{
 if not descriptors.startswith("published"):
   published_line = "published %s\n" %
 datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
   descriptor = published_line + descriptor
 }}}

 ----

 {{{
 routers = networkstatus.BridgeNetworkStatusDocument(descriptors,
   validate=validate)
 }}}

 This is something we've discussed several times but are you sure
 BridgeNetworkStatusDocument is what you want? Your earlier comment
 indicates that these files are simply a list of router status entries (no
 consensus header or footer). If that's the case then you could simplify
 this whole function to be somethng like...

 {{{
 def parseNetworkStatusFile(filename, validate=True):
   with open(filename) as descriptor_file:
     return list(stem.descriptor.router_status_entry._parse_file(
       descriptor_file,
       validate,
       entry_class =
 stem.descriptor.router_status_entry.RouterStatusEntryV2,
     ))
 }}}

 This will provide you a list of RouterStatusEntryV2 entries. I think you
 really want RouterStatusEntryV3, but your current code is providing you
 RouterStatusEntryV2 instances (maybe a bug).

 ----

 {{{
 routers = [router for router in document]
 return routers
 }}}

 This could also be...

 {{{
 return list(document)
 }}}

 ----

 {{{
 for descriptor in descriptors:
   router = descriptors.pop(descriptors.index(descriptor))
 }}}

 This has a couple issues...

  * You're modifying a list while you iterate over it. This is a bad idea
 in just about any language (iirc java would raise a
 ConcurrentModificationException).
  * Doing this is both pointless and very, very inefficient. You're doing a
 O(n) iteration over the descripters then doing a O(n) lookup to find the
 index then another O(n) modification. In other words this is O(n^2) twice.

 This would be both simpler and more efficient as...

 {{{
 while descriptors:
   router = descriptors.pop()
 }}}

 That said this whole function can simplified as...

 {{{
 def deduplicate(descriptors):
   # Dictionary of {fingerprint: descriptor} for the newest descriptor with
 a
   # given fingerprint.

   fp_to_descriptor = {}
   duplicates = []

   for desc in descriptors:
     if desc.fingerprint in fp_to_descriptor:
       conflict = fp_to_descriptor[desc.fingerprint]

       if desc.published > conflict.published:
         fp_to_descriptor[desc.fingerprint] = desc
         duplicates.append(conflict)
       else:
         duplicates.append(desc)
     else:
       fp_to_descriptor[desc.fingerprint] = desc

   return fp_to_descriptor.values()
 }}}

 ----

 {{{
 descriptors.extend([router for router in document])
 }}}

 Again, there's no reason to do list comprehension to just convert an
 iterable to a list. Calling...

 {{{
 [router for router in document]
 }}}

 ... is the same as...

 {{{
 list(document)
 }}}

 Cheers! -Damian

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9380#comment:16>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list