[tor-bugs] #8255 [Stem]: Parse exit list entry

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Aug 21 16:32:02 UTC 2013


#8255: Parse exit list entry
-------------------------+--------------------------------------------------
 Reporter:  atagar       |          Owner:  atagar        
     Type:  enhancement  |         Status:  needs_revision
 Priority:  minor        |      Milestone:                
Component:  Stem         |        Version:                
 Keywords:               |         Parent:  #8252         
   Points:               |   Actualpoints:                
-------------------------+--------------------------------------------------
Changes (by atagar):

  * status:  needs_review => needs_revision


Comment:

 Hi arlolra, sorry about the delay and thanks for the patch!

 This looks fantastic, only a few minor questions and comments...

 {{{
 +"""
 +Parsing for TorDNSEL files.
 +
 +Defined in https://www.torproject.org/tordnsel/exitlist-spec.txt
 +"""
 }}}

 Thanks for including the link. However, we probably just need it once and
 it's already in TorDnsel's pydocs so just only include it there.

 {{{
 +  _read_until_keywords("ExitNode", tordnsel_file, skip = True)
 +  while True:
 +    contents = _read_until_keywords("ExitAddress", tordnsel_file)
 +    contents += _read_until_keywords("ExitNode", tordnsel_file)
 +    if contents:
 +      yield TorDnsel(bytes.join(b"", contents), validate, **kwargs)
 +    else:
 +      break  # done parsing file
 }}}

 This will skip anything prior to the first 'ExitNode'. The TorDNSEL spec
 is a tad less formal than other spec types so this is probably appropriate
 though in most other cases extra content like this is disallowed.

 The spec says entries are...

 {{{
 (ExitNode | Published | LastStatus | ExitAddress*)*
 }}}

 Which indicates seems to indicate that the ExitAddress appears zero or
 more times. Won't the above have problems if ExitAddress doesn't appear
 for an entry?

 {{{
 +class TorDnsel(Descriptor):
 }}}

 Mind if we rename this class to TorDNSEL?

 {{{
 +  :var list exit_addresses: list of (str address, datetime date) tuples
 consisting of the found exit address and the time
 }}}

 This should have an asterisk too since the list will always exist (it has
 a default value of being empty), and lets also mention that it's IPv4
 addresses.

 {{{
 +    for keyword, values in entries.items():
 +      value, _ = values[0]
 }}}

 Lets balk with a validation error if we get any block content.

 {{{
 +      elif keyword == "ExitAddress":
 +        for value, _ in values:
 +          address, date = value.split(" ", 1)
 +          if validate and not
 stem.util.connection.is_valid_ipv4_address(address):
 +            raise ValueError("ExitAddress isn't a valid IPv4 address: %s"
 % address)
 +          try:
 +            date = datetime.datetime.strptime(date, "%Y-%m-%d %H:%M:%S")
 +          except ValueError:
 +            if validate:
 +              raise ValueError("ExitAddress found time wasn't parsable:
 %s" % value)
 +          self.exit_addresses.append((address, date))
 }}}

 This worries me a little bit because if 'validate' is False then the date
 we append could be either a datetime or str. Lets continue (skip the
 entry) when date can't be parsed and validate is False.

 {{{
 +    descriptors = list(_parse_file(io.BytesIO(desc_text)))
 +    self.assertEqual(3, len(descriptors))
 +    self.assertTrue(isinstance(descriptors[0], TorDnsel))
 +    self.assertEqual(3, len(descriptors[1].exit_addresses))
 }}}

 Would you mind some assertions about the parsed values? Also, a test or
 two with invalid content would be nice. For instance, we should probably
 have a test that parses content without any ExitAddress considering the
 issue mentioned earlier.

 Cheers! -Damian

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/8255#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list