[tor-bugs] #22428 [Metrics/CollecTor]: Add webstats module

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Jan 25 08:55:39 UTC 2018


#22428: Add webstats module
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_revision
 Priority:  High               |      Milestone:  CollecTor 1.5.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  metrics-2018       |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by karsten):

 Alright, I made a few tweaks to your branch
 [https://gitweb.torproject.org/karsten/metrics-db.git/log/?h=task-22428-4
 in my task-22428-4 branch] with the latest commit being 1f1adec.

 I also ran a first round of tests. Here's what I found:
  - The `recent/` directory contains only 1 log file per physical/virtual
 host from 3 days ago. The reason is that the last 2 days are excluded as
 too recent. But the `recent/` directory is supposed to contain the last 72
 hours of data produced by CollecTor, not data originally published within
 the last 72 hours. The goal is to give CollecTor clients 72 hours to
 obtain newly provided data before they need to fall back to archives. I
 think the other modules only look at the last-modified time of provided
 files and delete them after 72 hours. Maybe we should do that here, too.
  - The regular expression in metrics-lib's `WebServerAccessLogLine` is too
 strict. It does not consider the following log line to be valid, but it
 should: `0.0.0.0 - - [22/Jan/2018:00:00:00 +0000] "GET /collector/archive
 HTTP/1.1" 301 -`. We should probably also compare the regular expression
 to the Apache specification for similar cases where we're being too
 strict.
  - There are subtle differences between files provided by webstats.tp.o
 and the ones produced here. We should probably write a simple (shell)
 script to convert files from webstats.tp.o to the format we'd have
 produced. Things like cutting off columns at the end or cutting off the
 `?` from parameters. We'd run that script once when copying over files.
  - I guess we'll need to extend the create-tarballs script to produce
 tarballs containing webstats files. I haven't looked, though.
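
 To illustrate the first item, here is a minimal sketch of cleaning up
 `recent/` by last-modified time rather than by log date. It is not taken
 from CollecTor's other modules; the class name `RecentCleanupSketch`, the
 method `deleteOlderThan`, and the constant `MAX_AGE` are hypothetical, and
 the only thing carried over from above is the 72-hour window.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of CollecTor. */
class RecentCleanupSketch {

  /** Files stay in recent/ for 72 hours after they were last written. */
  static final Duration MAX_AGE = Duration.ofHours(72);

  /**
   * Deletes all regular files below recentDir whose last-modified time is
   * more than MAX_AGE before now, and returns the deleted paths.
   */
  static List<Path> deleteOlderThan(Path recentDir, Instant now)
      throws IOException {
    Instant cutoff = now.minus(MAX_AGE);
    List<Path> files;
    // Collect first, so that we do not delete while walking the tree.
    try (Stream<Path> walk = Files.walk(recentDir)) {
      files = walk.filter(Files::isRegularFile)
          .collect(Collectors.toList());
    }
    List<Path> deleted = new ArrayList<>();
    for (Path file : files) {
      // Decide by when the file was written, not by the date in its name.
      if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
        Files.delete(file);
        deleted.add(file);
      }
    }
    return deleted;
  }
}
```

 With something like this, a log file from 3 days ago that CollecTor wrote
 today would stay in `recent/` for another 72 hours, which is the behavior
 described above.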
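
 For the second item, a rough sketch of a more lenient pattern that accepts
 the log line quoted above. This is not the regular expression used in
 metrics-lib's `WebServerAccessLogLine`; the class name
 `LenientLogLineSketch` and the pattern itself are made up for
 illustration. It assumes that the trailing `-` is what trips up the
 current expression, which I have not verified. Apache's `%b` format
 writes `-` rather than `0` when no bytes are sent, so the size field
 should accept either.

```java
import java.util.regex.Pattern;

/** Hypothetical example, not the pattern used in metrics-lib. */
class LenientLogLineSketch {

  static final Pattern LOG_LINE = Pattern.compile(
      "^(\\S+) (\\S+) (\\S+) "                    // host, ident, user
      + "\\[([^\\]]+)\\] "                         // timestamp in brackets
      + "\"([A-Z]+) (\\S+) (HTTP/\\d\\.\\d)\" "    // method, resource, protocol
      + "(\\d{3}) "                                // status code
      + "(\\d+|-)$");                              // size in bytes, or "-"

  /** Returns whether the whole line matches the lenient pattern. */
  static boolean isValid(String line) {
    return LOG_LINE.matcher(line).matches();
  }

  public static void main(String[] args) {
    String line = "0.0.0.0 - - [22/Jan/2018:00:00:00 +0000] "
        + "\"GET /collector/archive HTTP/1.1\" 301 -";
    System.out.println(isValid(line)); // prints "true"
  }
}
```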

 Do you want to look into these issues?

 Remaining tasks once the issues above are resolved:
  - Run another round of tests with most or even all original log files as
 input.
  - Clean up `CHANGELOG.md`.
  - Release metrics-lib 2.2.0 and update `build.xml`.
  - Update copyrights to 2018.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22428#comment:50>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

