[tor-bugs] #20080 [Metrics/CollecTor]: Make a plan for updating the bridgedescs module

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Sep 6 09:55:10 UTC 2016


#20080: Make a plan for updating the bridgedescs module
-----------------------------------+-----------------
     Reporter:  karsten            |      Owner:
         Type:  task               |     Status:  new
     Priority:  Medium             |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+-----------------
 I have a long list of pending changes to the bridgedescs module, and I'd
 like to discuss how to line them up and apply them with as few disruptions
 as possible.  Here's the list:

  - The reprocessing of archived bridge descriptor tarballs to sanitize TCP
 ports (#19317) is moving forward.  All tarballs through 2016-05 have been
 reprocessed, and I compared a sample of about 5% of newly sanitized
 descriptors to previously sanitized descriptors to ensure that results are
 correct.  I'm currently `tar`'ing them up and `xz`-compressing them, which
 will take another week or so.  When this is done, I'll have to reprocess
 2016-06 to 2016-09, which would take another week.  And I'll have to
 deploy this new code on the main CollecTor instance, ideally as a second
 instance running in parallel on the same host until all archives are
 reprocessed.

  - We should take this opportunity of reprocessing bridge descriptors to
 also repackage them into one tarball per month and descriptor type.  For
 example, `bridge-descriptors-2016-09.tar.xz` would be split up into
 `bridge-statuses-2016-09.tar.xz`,
 `bridge-server-descriptors-2016-09.tar.xz`, and
 `bridge-extra-infos-2016-09.tar.xz`.  This may require some changes to
 paths, as well as changes to the `create-tarballs` script.  Blocking on
 reprocessed archives.

  - I still have a branch with unfinished unit tests, some of which
 uncover unfixed minor bugs that won't be triggered as long as input from
 the bridge authority is trusted (like #20044).  This should happen before
 making any non-hotfix changes to the code.

  - The module seriously needs to be refactored into a more reasonable
 class structure and smaller, more testable methods (like #19755, but also
 #19621).  Not urgent, but should happen before we need to make the next
 non-hotfix change.

  - At some point we should rethink how we handle issues while sanitizing
 bridge descriptors (#19834).  Not urgent.

  - I made a few tweaks to the bridgedescs module to make it possible to
 reprocess large batches of tarballs without running out of memory (#19778)
 or unnecessarily wasting processing time.  These changes would be harmful
 for regular operation, so I wonder if we should add a "batch processing"
 configuration option to enable them.  Not urgent.
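
 To make the repackaging item above more concrete, here is a minimal
 sketch of the name mapping from one combined monthly tarball to the three
 per-type tarballs.  The file names come from the example in this ticket;
 the function name `per_type_tarball_names` and the Python form are only
 illustrative assumptions and not part of CollecTor or the
 `create-tarballs` script.

```python
import re

# Descriptor-type prefixes, taken from the example file names in this ticket.
PER_TYPE_PREFIXES = (
    "bridge-statuses",
    "bridge-server-descriptors",
    "bridge-extra-infos",
)

# Matches the current combined name, e.g. bridge-descriptors-2016-09.tar.xz,
# and captures the YYYY-MM part.
COMBINED_NAME = re.compile(r"^bridge-descriptors-(\d{4}-\d{2})\.tar\.xz$")


def per_type_tarball_names(combined_name):
    """Return the per-type tarball names for one combined monthly tarball.

    Hypothetical helper for illustration only; it computes names and does
    not touch any files.
    """
    match = COMBINED_NAME.match(combined_name)
    if match is None:
        raise ValueError(
            "not a combined bridge descriptor tarball: %r" % combined_name)
    month = match.group(1)
    return ["%s-%s.tar.xz" % (prefix, month) for prefix in PER_TYPE_PREFIXES]


if __name__ == "__main__":
    for name in per_type_tarball_names("bridge-descriptors-2016-09.tar.xz"):
        print(name)
```

 Keeping the mapping in one place like this would let the path changes and
 the tarball script agree on the new names, but the actual layout is still
 open for discussion.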

 Can we make a plan for when to make and apply these changes, either on
 Trac or this weekend in Berlin?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20080>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

