[tor-bugs] #19834 [Metrics/CollecTor]: Rethink how we handle issues while sanitizing bridge descriptors

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Aug 4 20:13:53 UTC 2016


#19834: Rethink how we handle issues while sanitizing bridge descriptors
-----------------------------------+-----------------
     Reporter:  karsten            |      Owner:
         Type:  enhancement        |     Status:  new
     Priority:  Low                |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+-----------------
 The bridge descriptor sanitizer parses tarballs containing non-sanitized
 bridge descriptors, modifies their content by removing bridge IP addresses
 and other sensitive parts, and writes sanitized versions of those bridge
 descriptors to disk.

 The sanitizer needs to recognize the lines contained in bridge descriptors
 to distinguish between lines that must be changed and others that can be
 kept unchanged, and it needs to be able to understand the exact format of
 certain lines in order to sanitize their contents.
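As an illustration of the "recognize the lines" step, here is a minimal Java sketch that classifies a descriptor line by its first token (its keyword) as keep, sanitize, or unrecognized. The keyword sets are illustrative assumptions, not CollecTor's actual lists, and LineClassifier and LineAction are hypothetical names.

```java
import java.util.Set;

/** Hypothetical outcome of looking at one descriptor line. */
enum LineAction { KEEP, SANITIZE, UNRECOGNIZED }

final class LineClassifier {

  // Illustrative keyword sets only (assumption); not CollecTor's real lists.
  private static final Set<String> SANITIZE_KEYWORDS =
      Set.of("router", "or-address", "contact");
  private static final Set<String> KEEP_KEYWORDS =
      Set.of("platform", "published", "bandwidth", "uptime");

  private LineClassifier() { }

  /** Classifies a line by its first space-separated token, its keyword. */
  static LineAction classify(String line) {
    String keyword = line.split(" ", 2)[0];
    if (SANITIZE_KEYWORDS.contains(keyword)) {
      return LineAction.SANITIZE;    // contents must be rewritten
    }
    if (KEEP_KEYWORDS.contains(keyword)) {
      return LineAction.KEEP;        // safe to copy unchanged
    }
    return LineAction.UNRECOGNIZED;  // the sanitizer does not know this line
  }
}
```

The UNRECOGNIZED result is where the policy question of this ticket comes in: the classifier only reports it, and some other component decides what to do.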

 This process can go wrong in various ways, and we need to decide how to
 handle those situations.  Possible situations are:

  1. A tarball is malformed or cannot be opened for other reasons.
  2. A tarball contains one or more files that cannot be opened.
  3. A file in a tarball contains an unknown descriptor type.
  4. An internal problem prevents sanitizing descriptor parts (e.g., a
 missing secret for sanitizing IP addresses).
  5. A descriptor is missing parts that are required for properly
 sanitizing its contents.
  6. A descriptor contains an unrecognized line.
  7. A descriptor line doesn't follow the expected format, has too few or
 too many arguments, etc.
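To make the list more concrete, here is a minimal Java sketch of how parsing code could tag each problem with one of the seven situations and leave the reaction to its caller. Situation, SanitizerException, and RouterLineParser are hypothetical names, not part of CollecTor. The parser illustrates situation 7 using a "router" line, which per the directory protocol specification carries a nickname, address, ORPort, SOCKSPort, and DirPort.

```java
/** The seven problem situations, in the order listed above. */
enum Situation {
  MALFORMED_TARBALL,        // 1
  UNREADABLE_FILE,          // 2
  UNKNOWN_DESCRIPTOR_TYPE,  // 3
  INTERNAL_PROBLEM,         // 4
  MISSING_REQUIRED_PART,    // 5
  UNRECOGNIZED_LINE,        // 6
  MALFORMED_LINE            // 7
}

/** Hypothetical exception that tells the caller which situation occurred. */
final class SanitizerException extends Exception {
  final Situation situation;

  SanitizerException(Situation situation, String message) {
    super(situation + ": " + message);
    this.situation = situation;
  }
}

final class RouterLineParser {
  private RouterLineParser() { }

  /** Returns the address field of a "router" line or reports situation 7. */
  static String address(String line) throws SanitizerException {
    String[] parts = line.split(" ");
    // Expected: "router" nickname address ORPort SOCKSPort DirPort
    if (parts.length != 6 || !parts[0].equals("router")) {
      throw new SanitizerException(Situation.MALFORMED_LINE, line);
    }
    return parts[2];
  }
}
```

Keeping detection separate from handling means the choice among the options below can change without touching the parsing code.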

 Possible ways of handling such situations are:

  A. Skip a line we don't understand and keep the rest of the descriptor.
  B. Skip a descriptor.
  C. Skip the affected file in the tarball and continue with the next one.
  D. Abort processing the tarball.
  E. Skip the entire tarball, including discarding any descriptors
 processed before running into the problem, and attempt to process the
 tarball again in the next execution.
  F. Abstain from processing a given descriptor type until the problem has
 been resolved.
  G. Discard any descriptors processed in a tarball until running into the
 problem, abort the current execution, and refuse to start the next
 execution until the problem has been resolved.
  H. (In addition to any of A-G:) Inform the operator by logging the
 problem.
  I. (In addition to any of A-G:) Warn the operator and ask them to
 resolve the problem.
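One way to read this list is that exactly one of A-G applies to a given problem, while H and I can be added on top. The following Java sketch encodes that reading; Handling, Action, Notice, and Response are hypothetical names. The discardsProcessed flag is set only for E and G, the two options that explicitly discard descriptors already processed from the tarball.

```java
import java.util.Set;

final class Handling {
  private Handling() { }

  /** Primary reactions A-G; exactly one applies per problem. */
  enum Action {
    SKIP_LINE(false),                // A
    SKIP_DESCRIPTOR(false),          // B
    SKIP_FILE(false),                // C
    ABORT_TARBALL(false),            // D
    RETRY_TARBALL(true),             // E: discard, retry in next execution
    SUSPEND_DESCRIPTOR_TYPE(false),  // F
    HALT(true);                      // G: discard, abort, refuse next run

    /** True only where the option says earlier descriptors are discarded. */
    final boolean discardsProcessed;

    Action(boolean discardsProcessed) {
      this.discardsProcessed = discardsProcessed;
    }
  }

  /** Optional additions H and I, which can accompany any action. */
  enum Notice { LOG, WARN_AND_ASK }

  /** One response: a primary action plus zero or more notices. */
  record Response(Action action, Set<Notice> notices) {
    Response {
      notices = Set.copyOf(notices);  // defensive, immutable copy
    }
  }
}
```

Modeling H and I as a separate set, rather than as more actions, matches their "in addition to A-G" wording.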

 Looking at this list, I think that my preferred ways of handling problems
 would be something like:

  - B+H in situations 5, 6, and 7;
  - E+I in situations 1, 2, and 3; and
  - G+I in situation 4.
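The preferred policy above amounts to a lookup table from situation to handling letters. A minimal Java sketch of that table, using EnumMap and EnumSet from the JDK, could look like this; PreferredPolicy and its nested enums are hypothetical names.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** A sketch of the preferred policy: B+H, E+I, G+I. */
final class PreferredPolicy {
  private PreferredPolicy() { }

  /** Situations 1-7, in the order they are listed. */
  enum Situation {
    MALFORMED_TARBALL, UNREADABLE_FILE, UNKNOWN_DESCRIPTOR_TYPE,
    INTERNAL_PROBLEM, MISSING_REQUIRED_PART, UNRECOGNIZED_LINE,
    MALFORMED_LINE
  }

  /** Only the handling letters that this policy uses. */
  enum Handling { B, E, G, H, I }

  private static final Map<Situation, Set<Handling>> POLICY =
      new EnumMap<>(Situation.class);

  static {
    // Situations 5, 6, and 7: skip the descriptor and log.
    for (Situation s : EnumSet.of(Situation.MISSING_REQUIRED_PART,
        Situation.UNRECOGNIZED_LINE, Situation.MALFORMED_LINE)) {
      POLICY.put(s, EnumSet.of(Handling.B, Handling.H));
    }
    // Situations 1, 2, and 3: retry the tarball later and warn.
    for (Situation s : EnumSet.of(Situation.MALFORMED_TARBALL,
        Situation.UNREADABLE_FILE, Situation.UNKNOWN_DESCRIPTOR_TYPE)) {
      POLICY.put(s, EnumSet.of(Handling.E, Handling.I));
    }
    // Situation 4: stop until the operator fixes the problem, and warn.
    POLICY.put(Situation.INTERNAL_PROBLEM,
        EnumSet.of(Handling.G, Handling.I));
  }

  /** Returns the handling letters for a situation (read-only view). */
  static Set<Handling> handle(Situation situation) {
    return Collections.unmodifiableSet(POLICY.get(situation));
  }
}
```

Keeping the policy in one table is what would make it easy for a different operator to swap in other preferences.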

 That's not exactly what we're currently doing.  And I'm not even sure
 whether somebody else operating a CollecTor instance with the bridgedescs
 module would have the same preferences.

 Let's discuss!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19834>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

