[metrics-bugs] #23243 [Metrics/Metrics website]: write a spec for web-server-access log descriptors

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Aug 15 09:38:05 UTC 2017


#23243: write a spec for web-server-access log descriptors
-----------------------------------------+--------------------------
     Reporter:  iwakeh                   |      Owner:  metrics-team
         Type:  enhancement              |     Status:  new
     Priority:  Medium                   |  Milestone:
    Component:  Metrics/Metrics website  |    Version:
     Severity:  Normal                   |   Keywords:
Actual Points:                           |  Parent ID:
       Points:                           |   Reviewer:
      Sponsor:                           |
-----------------------------------------+--------------------------
 The spec should answer the following questions:

 * What will the raw input data look like?
  - compressed logs
  - varying dates in log lines despite the file being tagged with a single
 date
  - should we expect only log lines for GET requests with 200 responses?
  - size could become huge in the future
  - exact input format (if possible to define)
  - meta-data is provided in paths and filenames
  - ...
 * What will sanitized logs stored on disk look like?
  - cleaned log lines: define the exact format and give examples (as this
 might deviate from the current Python sanitization)
  - meta-data is provided in paths and filenames
  - should files be reassembled, i.e., only log lines of a given date in a
 descriptor for that log date?
  - should logs be stored on disk in compressed files (as opposed to other
 descriptors, which are stored uncompressed)?
  - should such logs be stored on disk in reasonably sized chunks (once
 they reach GB size)?
  - ...

 Please add more.
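
 To make the reassembly question concrete, here is a minimal sketch of
 grouping log lines by the date found in each line rather than by the date
 the file is tagged with. Everything in it is an assumption for
 illustration and not a proposed format: it assumes gzip compression, an
 Apache-style timestamp such as [15/Aug/2017:00:00:00 +0000] in every line,
 and the hypothetical helper names log_date, group_lines_by_date and
 read_compressed_log.

```python
import gzip
import re
from collections import defaultdict
from datetime import date

# Assumption: timestamps use English three-letter month names.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumption: each line carries a timestamp like [15/Aug/2017:00:00:00 +0000].
TIMESTAMP_RE = re.compile(r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):")


def log_date(line):
    """Return the date found in a log line, or None if it cannot be parsed."""
    match = TIMESTAMP_RE.search(line)
    if match is None or match.group(2) not in MONTHS:
        return None
    day, month, year = match.groups()
    try:
        return date(int(year), MONTHS[month], int(day))
    except ValueError:
        # For example day 31 in a 30-day month.
        return None


def group_lines_by_date(lines):
    """Group log lines by the date in each line, keeping input order.

    Lines without a parsable date are dropped here; the spec would have to
    say what should really happen to them.
    """
    by_date = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        day = log_date(line)
        if day is not None:
            by_date[day].append(line)
    return dict(by_date)


def read_compressed_log(path):
    """Yield the lines of a gzip-compressed log file (gzip is an assumption)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle
```

 With this, group_lines_by_date(read_compressed_log(path)) would return one
 list of lines per log date, which could then be written out as one
 descriptor per date if the spec decides in favor of reassembly.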

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23243>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

