[tor-bugs] #19170 [Metrics/CollecTor]: make parsing more robust (extra-info)

Mon Aug 1 19:16:14 UTC 2016

#19170: make parsing more robust (extra-info)
-------------------------------+--------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  defect             |         Status:  accepted
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+--------------------------

Comment (by karsten):

 Replying to [comment:9 iwakeh]:
 > Replying to [comment:8 atagar]:
 > > > Another question that would need to be investigated: how will
 > > > CollecTor clients deal with the additional non-compliant data?
 > >
 > > This seems an odd question. CollecTor serves tarballs of the published
 > > descriptor data. If the authorities publish it then CollecTor should
 > > provide it, malformed or not.

 There may be special cases where that statement doesn't hold, but in
 general, I agree that CollecTor should aim for providing all data that the
 directory authorities published.

 > Of course, clients should deal with the data as it is collected.
 > Currently, they are 'shielded' from non-conformant extra-info descriptors,
 > because CollecTor drops them. After the change, some might trip over the
 > newly available data. I intended to find out what the change would
 > trigger, for example what additional work we'd have with clients like
 > Onionoo etc.

 Onionoo et al. shouldn't be affected, because they're using metrics-lib to
 parse descriptors, which shields them from malformed descriptors.  Other
 clients not using metrics-lib might be affected, but those clients would
 also break when parsing Tor data directly, so I don't think that we have
 to take special care there.

 Regarding the `LenientParser` idea, I wonder whether we should just skip
 the metrics-lib check that tests whether a descriptor can be parsed before
 writing it to disk.  See `ArchiveWriter#store()`.  At that point we have
 already parsed all the fields we need for storing the descriptor without
 using metrics-lib, and that check only makes sure that metrics-lib will be
 able to parse the descriptor later.  If we want to take that check out,
 which I think we should, let's just change that code to log an
 informational message and store the file anyway.  What do you think?
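 A minimal sketch of the proposed behavior, under stated assumptions: the
 `LenientStoreSketch` class and the `ParseCheck` interface are hypothetical
 stand-ins, not CollecTor's actual `ArchiveWriter` code or metrics-lib's
 parser API.  It only shows the control flow being discussed: run the parse
 check, log at info level if it fails, and write the file either way.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Logger;

/** Hypothetical sketch of a lenient store step; not CollecTor's ArchiveWriter. */
public class LenientStoreSketch {

  /** Hypothetical stand-in for the metrics-lib "can we parse this?" check. */
  interface ParseCheck {
    void parse(byte[] rawDescriptor) throws Exception;
  }

  private static final Logger LOGGER =
      Logger.getLogger(LenientStoreSketch.class.getName());

  /**
   * Writes the raw descriptor to disk whether or not the parse check passes.
   * Returns true if the check passed, false if it failed but the file was
   * stored anyway.
   */
  static boolean store(byte[] rawDescriptor, Path target, ParseCheck check)
      throws IOException {
    boolean parsed = true;
    try {
      check.parse(rawDescriptor);
    } catch (Exception e) {
      // Instead of dropping the descriptor, only note the failure.
      parsed = false;
      LOGGER.info("Could not parse descriptor for " + target
          + "; storing it anyway: " + e.getMessage());
    }
    // Create missing parent directories, then write the unmodified bytes.
    Path parent = target.getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
    Files.write(target, rawDescriptor);
    return parsed;
  }
}
```

 For example, calling `store(raw, target, bytes -> { throw new
 Exception("unrecognized line"); })` logs one info message, returns `false`,
 and still leaves `raw` on disk at `target`.  Returning the check result is
 just one way to let callers count non-conformant descriptors without
 discarding them.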

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19170#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online