[metrics-bugs] #23243 [Metrics/Website]: write a spec for web-server-access log descriptors

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Sep 13 10:14:55 UTC 2017


#23243: write a spec for web-server-access log descriptors
-----------------------------+-----------------------------------
 Reporter:  iwakeh           |          Owner:  metrics-team
     Type:  enhancement      |         Status:  needs_information
 Priority:  Medium           |      Milestone:
Component:  Metrics/Website  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:                   |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+-----------------------------------

Comment (by iwakeh):

 Thanks for the quick reply!

 Replying to [comment:35 karsten]:
 > Replying to [comment:34 iwakeh]:
 > > There are two open questions:
 > >
 > > 1. Should it be mentioned in section 2 of the spec that log files come
 in directories named as the physical host, i.e.,
 meronense.torproject.org/metrics.torproject.org-access.log.20170707.log?
 >
 > Wait, there's no `.log` at the end of the file name. Example (from the
 server):
 >
 > `metrics.torproject.org-access.log-20170912.gz`
 >
 > Also note the `-` between `access.log` and the date.
 >
 > > 2. As already visible in 1.: the files are expected to have ending
 '.log' or '.log.bz2' or some other compression?
 > >
 > > Especially a clear answer for 2. is important for the implementation.
 >
 > I'd say the exact compression type is an implementation detail. See also
 the very last paragraph in the spec where we said: "Sanitized log files
 are typically compressed before publication. In particular the sorting
 step allows for highly efficient compression rates. We typically use XZ
 for compression, which is indicated by appending ".xz" to log file names,
 but this is subject to change." -- We could say something similar for logs
 that are provided to the sanitizer.
 >
 > How about we add a new first paragraph to Section 3.1 (Discarding non-
 matching files):
 >
 > """
 > Log files are made available to the sanitizer in a separate directory per
 physical web server host. Log files are typically gz-compressed, which is
 indicated by appending ".gz" to log file names, but this is subject to
 change. Overall, the sanitizer expects log files to use the following path
 format:
 >
 > <physical-host>/<virtual-host>.torproject.org-access.log-YYYYMMDD[.gz]
 > """
 >
 > And while we're at it, let's change "''<hostname>''.torproject.org-
 access.log-YYYYMMDD" in the last paragraph of Section 2 to "''<virtual-
 host>''.torproject.org-access.log-YYYYMMDD".

 Sounds good.  Minor adaptations:

 Maybe change "''<hostname>''.torproject.org-access.log-YYYYMMDD" in the
 last paragraph of Section 2 to "''<virtual-host>''-access.log-YYYYMMDD"
 (and treat <physical-host> analogously)?  Otherwise the sanitizer would
 have to test for the ".torproject.org" suffix and reject files that don't
 comply, which I'd find too restrictive.
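
 To make the relaxed format concrete, here is a minimal sketch of how a
 path could be matched against
 <physical-host>/<virtual-host>-access.log-YYYYMMDD[.<ext>].  The names
 LOG_PATH and parse_log_path are made up for illustration and are not taken
 from the sanitizer or the spec; accepting any extension at this stage is
 also just an assumption.

```python
import re
from datetime import datetime

# Hypothetical pattern for the relaxed path format discussed in this ticket:
# <physical-host>/<virtual-host>-access.log-YYYYMMDD[.<ext>]
LOG_PATH = re.compile(
    r"^(?P<physical>[^/]+)/(?P<virtual>[^/]+)-access\.log-"
    r"(?P<date>\d{8})(?:\.(?P<ext>[A-Za-z0-9]+))?$"
)


def parse_log_path(path):
    """Return (physical_host, virtual_host, date, ext) or None if no match."""
    match = LOG_PATH.match(path)
    if match is None:
        return None
    try:
        # Reject eight-digit strings that are not valid calendar dates.
        day = datetime.strptime(match.group("date"), "%Y%m%d").date()
    except ValueError:
        return None
    return (match.group("physical"), match.group("virtual"),
            day, match.group("ext"))
```

 With this pattern the virtual host is taken as-is, so no ".torproject.org"
 suffix is required, and the file name from my question 1 (with ".log." in
 front of the date) does not match.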

 And an addition to 3.1: files with an unknown compression format are
 discarded.
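
 A rough sketch of that rule, assuming the compression type is recognized
 by file-name suffix only.  The OPENERS table, the chosen set of formats,
 and the function name opener_for are illustrative assumptions, not part of
 the spec or of any existing code.

```python
import bz2
import gzip
import lzma
import re

# Assumed set of supported suffixes; the real list is an implementation detail.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

# Optional suffix following the date part of the log file name.
SUFFIX = re.compile(r"-access\.log-\d{8}(\.[^./]+)?$")


def open_plain(path):
    """Open an uncompressed log in binary mode, like the compressed openers."""
    return open(path, "rb")


def opener_for(name):
    """Return a callable that opens the log, or None to discard the file."""
    match = SUFFIX.search(name)
    if match is None:
        return None              # name does not follow the expected format
    suffix = match.group(1)
    if suffix is None:
        return open_plain        # no suffix: treated as uncompressed
    return OPENERS.get(suffix)   # unknown suffix yields None, i.e. discard
```

 A caller would skip any file for which opener_for returns None, which
 covers both non-matching names and unknown compression suffixes.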

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23243#comment:36>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

