[metrics-bugs] #23243 [Metrics/Website]: write a spec for web-server-access log descriptors

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Sep 13 10:02:58 UTC 2017

#23243: write a spec for web-server-access log descriptors
 Reporter:  iwakeh           |          Owner:  metrics-team
     Type:  enhancement      |         Status:  needs_information
 Priority:  Medium           |      Milestone:
Component:  Metrics/Website  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:                   |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:

Comment (by karsten):

 Replying to [comment:34 iwakeh]:
 > There are two open questions:
 > 1. Should it be mentioned in section 2 of the spec that log files come
 in directories named as the physical host, i.e.,

 Wait, there's no `.log` at the end of the file name. Example (from the


 Also note the `-` between `access.log` and the date.
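
 To make the naming concrete, here is a minimal Python sketch of how such
 file names could be matched. It assumes the layout
 "<virtual-host>-access.log-YYYYMMDD" with an optional compression suffix,
 as quoted elsewhere in this comment. The names `LOG_NAME` and
 `parse_log_name` are hypothetical and not taken from the sanitizer code.

```python
import re
from datetime import datetime

# Assumed layout: <virtual-host>-access.log-YYYYMMDD[.<compression-suffix>]
# Note the "-" (not ".") between "access.log" and the date.
LOG_NAME = re.compile(
    r"^(?P<vhost>.+)-access\.log-(?P<day>\d{8})(?:\.(?P<suffix>[A-Za-z0-9]+))?$"
)


def parse_log_name(name):
    """Return (virtual host, log date, suffix or None), or None if no match."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    try:
        # Reject eight-digit strings that are not valid calendar dates.
        day = datetime.strptime(match.group("day"), "%Y%m%d").date()
    except ValueError:
        return None
    return match.group("vhost"), day, match.group("suffix")
```

 With a made-up name such as "example.torproject.org-access.log-20170913.gz"
 this yields the virtual host, the date 2017-09-13, and the suffix "gz". A
 name that uses "." instead of "-" before the date does not match.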

 > 2. As already visible in 1.: the files are expected to have ending
 '.log' or '.log.bz2' or some other compression?
 > Especially a clear answer for 2. is important for the implementation.

 I'd say the exact compression type is an implementation detail. See also
 the very last paragraph in the spec where we said: "Sanitized log files
 are typically compressed before publication. In particular the sorting
 step allows for highly efficient compression rates. We typically use XZ
 for compression, which is indicated by appending ".xz" to log file names,
 but this is subject to change." -- We could say something similar for logs
 that are provided to the sanitizer.
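
 As an illustration of treating compression as an implementation detail, an
 implementation could pick the decompressor from the file name suffix and
 fall back to plain text otherwise. The following is only a sketch: the
 set of supported suffixes and the name `open_log` are assumptions, not
 something the spec prescribes.

```python
import bz2
import gzip
import lzma

# Which suffixes to support is an assumption; the spec leaves this open.
OPENERS = {".gz": gzip.open, ".xz": lzma.open, ".bz2": bz2.open}


def open_log(path):
    """Open a log file as text, decompressing based on its name suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8", errors="replace")
    # No known compression suffix: treat the file as uncompressed text.
    return open(path, "rt", encoding="utf-8", errors="replace")
```

 Keeping the mapping in one table means a later change of compression type
 only touches that table, not the rest of the sanitizing code.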

 How about we add a new first paragraph to Section 3.1 (Discarding
 non-matching files):

 Log files are made available to the sanitizer in a separate directory per
 physical web server host. Log files are typically gz-compressed, which is
 indicated by appending ".gz" to log file names, but this is subject to
 change. Overall, the sanitizer expects log files to use the following path


 And while we're at it, let's change
 "''<hostname>''.torproject.org-access.log-YYYYMMDD" in the last paragraph
 of Section 2 to "''<virtual-

Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23243#comment:35>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
