[tor-bugs] #6569 [Stem]: Stem network status document parsing

Mon Aug 13 20:27:46 UTC 2012

#6569: Stem network status document parsing
--------------------+-------------------------------------------------------
 Reporter:  neena   |          Owner:  neena       
     Type:  task    |         Status:  needs_review
 Priority:  normal  |      Milestone:              
Component:  Stem    |        Version:              
 Keywords:          |         Parent:              
   Points:          |   Actualpoints:              
--------------------+-------------------------------------------------------

Comment(by atagar):

 Hi Ravi. Just some first thoughts so far from my bus ride...

 > +    file_parser = lambda f: stem.descriptor.networkstatus.parse_file(f,
 True, "microdesc")

 Very neat, I've never seen this trick before. I like it.

 > +      desc =
 stem.descriptor.networkstatus.NetworkStatusDocument(descriptor_file.read())

 Again this is frontloading the read rather than passing the file object
 itself to the NetworkStatusDocument. This means that the memory usage is,
 at a minimum, as large as the whole consensus file. If we process
 individual consensus entries then that'll be reduced to the size of an
 individual consensus entry.

 I realize that this is a large change but it's likely necessary to avoid
 swapping on some systems.

 > +  if " " in line:
 > +    keyword = line.split(" ", 1)[0]
 > +    if keyword == "opt":
 > +        keyword = line.split(" ", 2)[1]
 > +  else: keyword = line.strip()

 I probably would have done the same though, on reflection, this can
 actually be reduced to...

 {{{
 if line.startswith("opt "):
   line = line[4:]

 keyword = line.split(" ", 1)[0].strip()
 }}}

 ... this is because split doesn't need the whitespace to have a first
 element...

 {{{
 >>> line = "foobar"
 >>> line.split(" ", 1)[0]
 'foobar'
 }}}

 > +try:
 > +  from cStringIO import StringIO
 > +except:
 > +  from StringIO import StringIO

 Out of curiosity when is cStringIO unavailable? Iirc we're using it
 elsewhere too so if it, say, isn't available on python 2.5 then that would
 be good to know.

 > +Flag = stem.util.enum.Enum(
 > +  ("AUTHORITY", "Authority"),
 > +  ("BADEXIT", "BadExit"),
 > +  ("EXIT", "Exit"),
 > +  ("FAST", "Fast"),
 > +  ("GUARD", "Guard"),
 > +  ("HSDIR", "HSDir"),
 > +  ("NAMED", "Named"),
 > +  ("RUNNING", "Running"),
 > +  ("STABLE", "Stable"),
 > +  ("UNNAMED", "Unnamed"),
 > +  ("V2DIR", "V2Dir"),
 > +  ("VALID", "Valid"),
 > +  )
 > +
 > +Flag = stem.util.enum.Enum(*[(flag.upper(), flag) for flag in
 ["Authority", "BadExit", "Exit", "Fast", ...

 Duplicate assignment. Personally I think the first is more readable though
 up to you. If you like the second better then please put the flag list
 into a temporary variable first to avoid having a single enormous line.

 Also, minor nitpick: when doing a multi-line block please place the last
 parentheses at the same indentation as the block. Ie...

 {{{
 Flag = stem.util.enum.Enum(
   ("AUTHORITY", "Authority"),
   ...
 )
 }}}

 ... mostly just for consistency and to give a little better visual que
 that it's the end of the block.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6569#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online