[tor-bugs] #23243 [Metrics/Website]: Write a specification for Tor web server logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Oct 24 14:09:18 UTC 2017


#23243: Write a specification for Tor web server logs
-----------------------------+--------------------------------
 Reporter:  iwakeh           |          Owner:  metrics-team
     Type:  enhancement      |         Status:  needs_revision
 Priority:  Medium           |      Milestone:
Component:  Metrics/Website  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  metrics-2017     |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+--------------------------------

Comment (by karsten):

 Replying to [comment:47 iwakeh]:
 > Replying to [comment:46 karsten]:
 > > I'm not sure if we can resolve these questions by hard thinking.
 >
 > Well, we need to work on thoughtful decision making.

 Unfortunately, I'm not the right person these days to dive deep enough
 into this topic to make thoughtful decisions. Too many topics, too little
 time to do any of them well. That's why I hoped you'd just solve all
 problems here and I could then review the solution. :)

 > There're not that many questions above except yours:
 > > ... what happens if a web server rotates logs more often than once
 > > per day? At least that's something that we write in the
 > > specification. I'm not sure how this would work with file names, so
 > > maybe we in fact require that logs are rotated exactly once per day,
 > > and we just didn't write that in the specification yet. However, it
 > > seems rather restrictive to prescribe exact log rotation intervals in
 > > order to sanitize logs subsequently. Maybe we should be less
 > > restrictive here.
 >
 > The current webstat code and the spec require a log per day.

 Well, no. The spec says "Tor's web servers are configured to rotate logs
 ''at least'' once per day". If we didn't mean that, let's phrase it
 differently. But how?

 And we should write down possible failure modes for the case that logs are
 rotated less often or more often.

 In any case, we should warn whenever we run into one of these cases,
 rather than silently continuing operation and simply producing fewer or
 smaller sanitized logs.
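
 A minimal sketch of such a check, in Python rather than CollecTor's own
 code. It assumes we can already derive one date per input log file (how
 that date is obtained, from the file name or otherwise, is left open), and
 `check_rotation` and the logger name are hypothetical. Gaps would hint at
 logs rotated less often than daily, or at missing input; several files for
 one date would hint at logs rotated more often.

```python
import logging
from collections import Counter
from datetime import date, timedelta

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("webstats-sketch")


def check_rotation(log_dates):
    """Return (missing_days, repeated_days) for a list of per-file dates.

    Assumption: each input log file has already been mapped to one date.
    """
    if not log_dates:
        return [], []
    counts = Counter(log_dates)
    first, last = min(counts), max(counts)

    # Walk every day between the first and last date and note the gaps.
    missing = []
    day = first
    while day <= last:
        if day not in counts:
            missing.append(day)
        day += timedelta(days=1)

    # Dates that occur for more than one input file.
    repeated = sorted(d for d, n in counts.items() if n > 1)

    # Warn loudly instead of silently producing fewer or smaller outputs.
    for d in missing:
        logger.warning("no input log for %s", d)
    for d in repeated:
        logger.warning("%d input logs for %s", counts[d], d)
    return missing, repeated
```

 Whether a gap or a repeated date is an error or merely worth a warning is
 a decision for the specification, not something this sketch settles.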

 > [...]
 > 1. Make sure by outside means that there is no day without a log (e.g.
 > by providing an empty file for that day using 'touch').  This would
 > work without additional implementation for CollecTor and this works for
 > bulk imports as well as daily processing.  As a result there will be a
 > sanitized log for each day offered by CollecTor, some might be empty.

 I'd say we need to do something that doesn't require any upstream changes.
 In other words, whatever ends up in `in/webstats/` is what we should be
 able to work with. We shouldn't require upstream to touch files for us.

 > 2. For bulk processing a property could signal CollecTor to use all
 > logs without insisting on an uninterrupted chain.  This still requires
 > outside measures for making sure no log lines are lost and might result
 > in days without any logs, unless CollecTor creates empty ones.
 > 3. Think out a mechanism that enables more automated processing of an
 > interrupted chain of logs.  This seems error-prone and will result in
 > many edge cases.

 I don't know, maybe we can do something with system time or state files.
 Or we could process everything in `in/webstats/` and write everything to
 `out/` and `recent/` except the first and last encountered UTC days. Just
 some ideas.
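
 The second idea could be sketched roughly as follows, again in Python and
 not as actual CollecTor code. It assumes log lines have already been parsed
 into pairs of a timezone-aware timestamp and the raw line; the name
 `complete_days` is hypothetical. The first and last encountered UTC days
 are held back because they may be incomplete.

```python
from collections import defaultdict
from datetime import datetime, timezone


def complete_days(entries):
    """Group log lines by UTC day and drop the first and last day seen.

    Assumption: `entries` is an iterable of (timestamp, line) pairs where
    each timestamp is a timezone-aware datetime.
    """
    by_day = defaultdict(list)
    for timestamp, line in entries:
        # Convert to UTC before taking the date, so that lines logged in
        # other time zones end up in the right UTC day.
        utc_day = timestamp.astimezone(timezone.utc).date()
        by_day[utc_day].append(line)

    days = sorted(by_day)
    # With fewer than three days of input nothing is considered complete.
    return {day: by_day[day] for day in days[1:-1]}
```

 Note that this only guards against partial days at the edges of the input;
 a gap in the middle would still need a separate warning.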

 Again, I'm not deep enough into this to make a good decision. I just hope
 that whatever we build here is robust enough to either handle all of the
 cases or warn loudly whenever it runs into an unforeseen case.

 I'm very concerned about silently losing data. That's the worst thing that
 could happen to us, in particular given that we don't keep archives of the
 input data in this case.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23243#comment:48>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

