[tor-bugs] #22428 [Metrics/CollecTor]: Add webstats module

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat Jan 20 13:12:50 UTC 2018


#22428: Add webstats module
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  assigned
 Priority:  High               |      Milestone:  CollecTor 1.5.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  metrics-2018       |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by iwakeh):

 Section 4.1 (spec) states:


 In addition, log lines are treated differently according to the date they
 contain:

     During an import process the sanitizer takes all log line dates into
 account and determines the reference interval as stretching from the
 oldest date to the youngest date encountered. Depending on the reference
 interval, log lines are not yet processed if their date lies on the edges
 of the reference interval, i.e., if the date is not at least one day
 younger than the older endpoint or if it is only LIMIT days older than the
 younger endpoint, where LIMIT is initially set to two but might change if
 necessary.
     If the younger endpoint of the reference interval coincides with the
 current system date, the day before is used as the new younger endpoint of
 the reference interval, which ensures that the sanitizer won't publish
 logs prematurely, i.e., before there is a chance that they are complete.
 Thus, processing of log lines carrying such a date is postponed.
     All log lines with dates for which the sanitizer already published a
 log file are discarded in order to avoid altering published logs.
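 The date rules quoted above could be sketched in Java roughly as follows.
 This is a hypothetical illustration, not CollecTor's actual code: the class
 and method names are made up, and it assumes that "only LIMIT days older"
 means "at most LIMIT days older", so such dates are postponed.

```java
import java.time.LocalDate;
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of the date-based filtering described in spec section 4.1. */
public class WebstatsDateFilter {

  /** Hypothetical name for the spec's LIMIT, initially two days. */
  static final int LIMIT = 2;

  /**
   * Returns the dates whose log lines may be written now.
   *
   * @param lineDates dates found in all imported log lines
   * @param published dates for which a sanitized log was already published
   * @param today the current system date
   */
  static SortedSet<LocalDate> datesToWrite(Collection<LocalDate> lineDates,
      Set<LocalDate> published, LocalDate today) {
    SortedSet<LocalDate> result = new TreeSet<>();
    if (lineDates.isEmpty()) {
      return result;
    }
    SortedSet<LocalDate> all = new TreeSet<>(lineDates);
    LocalDate older = all.first();
    LocalDate younger = all.last();
    // Never publish the current day: move the younger endpoint back by one day.
    if (younger.equals(today)) {
      younger = today.minusDays(1);
    }
    for (LocalDate date : all) {
      // Older edge: the date must be at least one day younger than the oldest date.
      boolean pastOlderEdge = date.isAfter(older);
      // Younger edge (assumed reading): postpone dates at most LIMIT days
      // older than the younger endpoint, and anything newer than it.
      boolean pastYoungerEdge = date.isBefore(younger.minusDays(LIMIT));
      // Already-published dates are discarded so published logs never change.
      if (pastOlderEdge && pastYoungerEdge && !published.contains(date)) {
        result.add(date);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    LocalDate today = LocalDate.of(2018, 1, 20);
    Set<LocalDate> dates = new TreeSet<>();
    for (int day = 10; day <= 20; day++) {
      dates.add(LocalDate.of(2018, 1, day));
    }
    Set<LocalDate> published = Collections.singleton(LocalDate.of(2018, 1, 12));
    // Prints [2018-01-11, 2018-01-13, 2018-01-14, 2018-01-15, 2018-01-16]
    System.out.println(datesToWrite(dates, published, today));
  }
}
```

 With log lines dated 2018-01-10 through 2018-01-20 and a current date of
 2018-01-20, the younger endpoint moves to 2018-01-19, so only 2018-01-11
 and 2018-01-13 through 2018-01-16 would be written; 2018-01-12 is skipped
 as already published, and the remaining dates wait for a later import.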

 While testing, I noticed that it might be useful to add a `WebstatsNoLimit`
 property that defaults to false and, if set to true, does not apply the
 limits, i.e., writes all logs regardless. This would make sense for a bulk
 import where the data are known to be complete and ready to be
 published.
 (Of course, one 'workaround' is to add fake lines to enlarge the interval
 as necessary.)
 Thoughts?
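 A `WebstatsNoLimit` switch could be read with `java.util.Properties` and,
 when true, bypass the edge rules. In this hypothetical sketch the class and
 method names are made up, and the flag is assumed to disable both the edge
 limits and the current-date rule while still skipping dates that were
 already published, so published logs stay unchanged; whether the real
 property should keep that check is an open question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of the proposed WebstatsNoLimit switch. */
public class WebstatsNoLimitSketch {

  /** Reads the proposed flag; an absent or non-"true" value means false. */
  static boolean readNoLimit(Properties props) {
    return Boolean.parseBoolean(props.getProperty("WebstatsNoLimit", "false"));
  }

  /**
   * Bulk-import path: no edge limits and no current-date check; only dates
   * that were already published are removed (assumption, see lead-in).
   */
  static SortedSet<LocalDate> datesToWriteNoLimit(Collection<LocalDate> lineDates,
      Set<LocalDate> published) {
    SortedSet<LocalDate> result = new TreeSet<>(lineDates);
    result.removeAll(published);
    return result;
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader("WebstatsNoLimit = true\n"));
    System.out.println(readNoLimit(props));            // prints true
    System.out.println(readNoLimit(new Properties())); // prints false (default)

    Set<LocalDate> dates = new TreeSet<>(Arrays.asList(
        LocalDate.of(2018, 1, 18), LocalDate.of(2018, 1, 19), LocalDate.of(2018, 1, 20)));
    Set<LocalDate> published = Collections.singleton(LocalDate.of(2018, 1, 18));
    // Prints [2018-01-19, 2018-01-20]
    System.out.println(datesToWriteNoLimit(dates, published));
  }
}
```

 Keeping the default at false means an ordinary periodic run still honors
 the limits; only an operator who knows the imported data are complete would
 switch it on.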

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22428#comment:44>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

