[tor-bugs] #22428 [Metrics/CollecTor]: Add webstats module

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Oct 23 16:46:19 UTC 2017


#22428: Add webstats module
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_revision
 Priority:  High               |      Milestone:  CollecTor 1.5.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  metrics-2017       |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by iwakeh):

 Replying to [comment:36 karsten]:
 > Alright, I finished an initial review of commit 086e904 in your
 task-22428-4 branch. I have several trivial or minor findings, but I'd
 like to postpone them until we have resolved one that I consider major:

 Good, it's better to address the small stuff at the very end :-)

 >
 > I'm unclear whether the sibling approach is robust enough to cover all
 cases and edge cases. Maybe even worse, I'm unclear whether we'd notice if
 we'd be running into an uncovered edge case or if we'd silently not
 process and therefore lose data.

 This is not a question about the sibling approach as such, but the general
 question: when is the log for a given day complete?
 I'll address this in more detail on #23243, because this is a
 specification issue and this ticket here should only be concerned with
 implementation, imo.

 >
 > For example, what happens if we sanitize logs from a server that
 receives ''very'' few requests, maybe only a few requests per week?
 Consider these original log files (where I scrubbed the virtual host
 name):
 >  - `scrubbed.torproject.org-access.log-20171001.gz` contains requests
 from 2017-09-30 and 2017-10-01.
 >  - `scrubbed.torproject.org-access.log-20171002.gz` contains requests
 from 2017-10-01 only.
 >  - `scrubbed.torproject.org-access.log-20171004.gz` contains requests
 from 2017-10-03 only.
 >  - `scrubbed.torproject.org-access.log-20171006.gz` contains requests
 from 2017-10-05 and 2017-10-06.
 >
 > Would the existing code produce logs for 2017-10-01, -03, -05, and -06
 with exactly the sanitized log lines from these original log files? (I
 didn't run it, I only read the code and am unclear about this.)

 The result does not depend on the contents of an input log.  The above
 files would lead to a single sanitized log for 2017-10-01.  The
 implementation relies on having the sibling, which could be provided by a
 simple `touch scrubbed.torproject.org-access.log-20171003` command.  The
 application needs an outside cue.  I'm stating this here for completeness.
 As there is more to this (including the below questions), let's move the
 discussion to the spec ticket.
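
 To make the file-name dependence concrete, here is a minimal sketch of one
 possible reading of the sibling rule: a rotated log dated D is only
 processed once a file dated D+1 exists for the same virtual host. This is
 an illustration, not the code in the task-22428-4 branch; the regular
 expression and the names `parse` and `dates_with_sibling` are hypothetical,
 and the exact rule is still to be settled on #23243.

```python
import re
from datetime import date, timedelta

# Assumed naming scheme, taken from the examples in this ticket:
#   <virtual-host>-access.log-YYYYMMDD[.<extension>]
NAME = re.compile(r"^(?P<host>.+)-access\.log-(?P<ymd>\d{8})(?:\.\w+)?$")


def parse(name):
    """Return (host, date) for a rotated log file name, or None if the
    name does not follow the assumed scheme."""
    m = NAME.match(name)
    if m is None:
        return None
    ymd = m.group("ymd")
    return m.group("host"), date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:]))


def dates_with_sibling(names):
    """Return the sorted dates whose next-day sibling file exists.

    Only file names are inspected; file contents play no role, which
    mirrors the statement that the result does not depend on the
    contents of an input log.
    """
    seen = {p for p in map(parse, names) if p is not None}
    one_day = timedelta(days=1)
    return sorted(d for host, d in seen if (host, d + one_day) in seen)


files = [
    "scrubbed.torproject.org-access.log-20171001.gz",
    "scrubbed.torproject.org-access.log-20171002.gz",
    "scrubbed.torproject.org-access.log-20171004.gz",
    "scrubbed.torproject.org-access.log-20171006.gz",
]
ready = dates_with_sibling(files)
```

 With the four file names from the example, only 2017-10-01 has a next-day
 sibling in this sketch. Adding an empty
 `scrubbed.torproject.org-access.log-20171003` file closes the gap between
 the -20171002 and -20171004 files, so further dates qualify; that is the
 outside cue mentioned above.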

 >
 > Here's another, related question: what happens if a web server rotates
 logs more often than once per day? At least that's something that we write
 in the specification. I'm not sure how this would work with file names, so
 maybe we in fact require that logs are rotated exactly once per day, and
 we just didn't write that in the specification yet. However, it seems
 rather restrictive to prescribe exact log rotation intervals in order to
 sanitize logs subsequently. Maybe we should be less restrictive here.
 >
 > Is there a way to make this approach more robust? And is there a way to
 ensure that we'll learn about any broken assumptions as early as possible?

 I'll move all these questions and possible answers to #23243.

 >
 > Ah, and do you mind doing another round of JavaDoc editing and variable
 renaming towards finding a middle ground between 2-characters-is-almost-
 verbose and 80-characters-can-fit-in-a-line-so-let-us-not-use-more-
 than-79? As a fixup/squash commit without rebasing, please. :) Thank you!

 I'll take another look, no problem ;-)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22428#comment:37>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

