[tor-bugs] #20548 [Metrics]: Handle bad input more consistently in metrics code bases

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Nov 7 08:13:52 UTC 2016


#20548: Handle bad input more consistently in metrics code bases
-------------------------+---------------------
 Reporter:  karsten      |          Owner:
     Type:  enhancement  |         Status:  new
 Priority:  Medium       |      Milestone:
Component:  Metrics      |        Version:
 Severity:  Normal       |     Resolution:
 Keywords:               |  Actual Points:
Parent ID:               |         Points:
 Reviewer:               |        Sponsor:
-------------------------+---------------------

Comment (by iwakeh):

 Some thoughts:

 One step is unifying the parsing process by replacing all parsing code
 with the parsing provided by metrics-lib (which is already under way for
 CollecTor).  This addresses goal number one in the description above.

 Goal number two (of the bullet point list in the description above) is
 fine, too: descriptors are separate data units, so a failure to parse one
 should not affect the parsing and storing of subsequent descriptors just
 because they happened to be stored temporarily in the same file.
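 A minimal sketch of this per-descriptor error isolation, assuming the
 file has already been split into one raw text chunk per descriptor.  The
 names `parse_descriptor`, `parse_all`, and `ParseResult` are hypothetical
 and are not metrics-lib API; `parse_descriptor` is only a toy stand-in
 parser.  Each descriptor is parsed in its own try block, so one failure
 is recorded and the loop moves on.

```python
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    parsed: list = field(default_factory=list)
    # Each failure is kept as (position in file, raw text, error message)
    # so that the raw descriptor can still be stored and analyzed later.
    failed: list = field(default_factory=list)


def parse_descriptor(raw: str) -> dict:
    """Hypothetical toy parser: expects a leading 'router' line followed by
    'keyword value' lines; raises ValueError otherwise."""
    lines = [line for line in raw.strip().splitlines() if line]
    if not lines or not lines[0].startswith("router "):
        raise ValueError("missing leading 'router' line")
    fields = {}
    for line in lines:
        keyword, _, value = line.partition(" ")
        fields[keyword] = value
    return fields


def parse_all(raw_descriptors):
    """Parse every descriptor independently; a failure in one does not
    stop the descriptors that follow it in the same file."""
    result = ParseResult()
    for index, raw in enumerate(raw_descriptors):
        try:
            result.parsed.append(parse_descriptor(raw))
        except ValueError as error:
            result.failed.append((index, raw, str(error)))
    return result
```

 For example, `parse_all(["router a 1.2.3.4", "garbage", "router b 5.6.7.8"])`
 yields two parsed descriptors and one recorded failure at position 1.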

 Regarding the second list: privacy and client expectation, i.e., topics 3
 and 4, are the most important.

 One way to combine storing-of-all-that-is-seen with privacy and client
 expectation would be to store invalid descriptors separately.  This
 separate location can also be public for relay descriptors and sanitized
 bridge descriptors, i.e., the public folders for download would be
 'archive', 'relay', and 'substandard' (or some better name).  All bridge
 descriptors that cannot be sanitized should be stored, too, but not be
 offered to the public yet.
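 The routing rule proposed above could be sketched as follows.  This is a
 minimal illustration under stated assumptions: the function `route`, the
 `Destination` type, and the non-public folder name 'withheld-bridges' are
 hypothetical, and 'archive' stands in for the normal public location.
 Invalid relay descriptors and invalid but sanitized bridge descriptors go
 to the public 'substandard' folder; bridge descriptors that cannot be
 sanitized are kept, but not published.

```python
from typing import NamedTuple


class Destination(NamedTuple):
    folder: str   # hypothetical folder name
    public: bool  # whether the folder is offered for download


ARCHIVE = Destination("archive", True)
SUBSTANDARD = Destination("substandard", True)
# Hypothetical non-public holding area for unsanitizable bridge descriptors.
WITHHELD = Destination("withheld-bridges", False)


def route(kind: str, parsed_ok: bool, sanitized_ok: bool = True) -> Destination:
    """Pick a storage location for one descriptor.

    kind: "relay" or "bridge"; sanitized_ok only matters for bridges.
    """
    if kind not in ("relay", "bridge"):
        raise ValueError(f"unknown descriptor kind: {kind!r}")
    # Privacy first: anything that could not be sanitized is never public.
    if kind == "bridge" and not sanitized_ok:
        return WITHHELD
    # Everything else is public; only the quality label differs.
    return ARCHIVE if parsed_ok else SUBSTANDARD
```

 Checking sanitization before validity keeps the privacy guarantee
 independent of how robust the parser is.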

 Advantages:
 * privacy is ensured
 * clients can choose the quality of descriptors they're interested in
 * we'd get an overview of how many 'bad' descriptors show up every month
 and can analyze them
 * others can analyze the 'substandard' descriptors, too, or use them, if
 they choose to.
 * Given that descriptors are not supposed to be altered other than for
 privacy reasons, some could still be integrated into the 'normal'
 archives later, for example when more robust parsing is available.

 Disadvantages:
 * implementation of a third storage location (overall, i.e., 'recent',
 'out', and 'substandard'), though the implementation should be easy.
 * maintenance of the third storage location.

 Concerning already archived data there are two options:
 * leave them as they are
 * or re-parse and sort substandard historic descriptors into tarballs in
 the 'substandard' directory.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20548#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

