[metrics-bugs] #32890 [Metrics/CollecTor]: Remember processed files between module runs

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Jan 7 15:18:26 UTC 2020

#32890: Remember processed files between module runs
     Reporter:  karsten            |      Owner:  karsten
         Type:  defect             |     Status:  assigned
     Priority:  Medium             |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
 The three recently added modules that archive Snowflake statistics, bridge
 pool assignments, and BridgeDB metrics all process every input file they
 find, regardless of whether they have processed it before.

 The problem is that the input files of these modules are either never
 removed (Snowflake statistics) or removed only manually by the operator
 (bridge pool assignments and BridgeDB metrics).

 The effect is that BridgeDB metrics and bridge pool assignments that are
 no longer recent reappear in the indexed/recent/ directory: once they are
 deleted for being older than 72 hours, the next execution processes the
 same input files again and places them there once more. The same would
 happen with Snowflake statistics after the operator removes them from the
 out/ directory.

 The fix is to keep a state file containing the names of previously
 processed files and to process only files not listed in it. This is the
 same approach as taken for bridge descriptor tarballs.
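
 The state-file idea could be sketched roughly as follows. This is only an
 illustration in Python, not CollecTor's code: the function names, the
 one-name-per-line state file format, and the directory layout are all
 assumptions made for the example.

```python
import tempfile
from pathlib import Path


def load_processed(state_file):
    """Return the set of file names recorded in the state file.

    Assumed format (hypothetical): one file name per line.
    """
    if not state_file.exists():
        return set()
    return {line for line in state_file.read_text().splitlines() if line}


def run_module(input_dir, state_file):
    """Process only input files not yet listed in the state file.

    Returns the names of the files handled in this run.
    """
    processed = load_processed(state_file)
    new_files = sorted(
        p.name for p in input_dir.iterdir()
        if p.is_file() and p.name not in processed
    )
    # Placeholder: the module's real archiving step would run here,
    # once for each name in new_files.
    processed.update(new_files)
    # Rewrite the state file so the next run skips everything seen so far.
    state_file.write_text("".join(name + "\n" for name in sorted(processed)))
    return new_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        in_dir = Path(tmp) / "in"
        in_dir.mkdir()
        state = Path(tmp) / "processed-files"  # hypothetical state file name
        (in_dir / "first.txt").write_text("example input")
        print(run_module(in_dir, state))  # first run handles the file
        print(run_module(in_dir, state))  # second run finds nothing new
```

 The state file is kept outside the input directory so that it is never
 picked up as an input file itself. With this in place, an input file that
 stays on disk indefinitely is handled once and then skipped on every later
 run.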

Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32890>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
