[metrics-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Nov 21 14:47:17 UTC 2017


#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-----------------------------+-----------------------------------
 Reporter:  iwakeh           |          Owner:  metrics-team
     Type:  enhancement      |         Status:  needs_revision
 Priority:  Medium           |      Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  metrics-2017     |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+-----------------------------------

Comment (by iwakeh):

 Thanks for hanging in there and working through it all!

 I'll try to keep my reply short and simply leave out all points of
 agreement without further comment.  Feel free to raise them again if
 necessary.

 Replying to [comment:47 karsten]:
 > ===== Variable names

 I copied the comments about variable names to ticket #24370, which is for
 defining naming guidelines across all of metrics.  Let's continue the
 discussion there.

 > ===== Validation vs. sanitization
 > I'm still confused what validation means in this context.

 Metrics-lib provides web log parsing of sanitized logs as available on
 CollecTor.  When parsing such a log file, lines need to be validated, i.e.,
 metrics-lib verifies that these are standard sanitized log lines.
 Metrics-lib does not supply a general log parser.

 Sanitization is only supplied as an internal metrics-lib feature and is
 used by CollecTor to sanitize the logs to be published.

 > Is a line containing a POST request a valid line, or one that uses FTP
 as protocol or that returns HTTP 404 as status code. It's okay that
 CollecTor skips these lines as part of the sanitizing process. But that
 doesn't make them invalid.

 From the point of view of a consumer that expects sanitized log lines, the
 POST and FTP lines are invalid.
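
 To illustrate that consumer-side view, here is a minimal sketch, not the
 metrics-lib code.  The class name, the method name and the list of allowed
 methods are assumptions made for this example; the authoritative rules are
 in the spec from #23243.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, for illustration only; not part of metrics-lib. */
final class SanitizedLineCheck {

  // Assumption: only these request methods remain after sanitization.
  static final List<String> ALLOWED_METHODS = Arrays.asList("GET", "HEAD");

  /**
   * Checks the quoted request field of a log line,
   * e.g. "GET /index.html HTTP/1.1".
   */
  static boolean isValidRequest(String request) {
    String[] parts = request.split(" ");
    return parts.length == 3
        && ALLOWED_METHODS.contains(parts[0])   // rejects POST etc.
        && parts[2].startsWith("HTTP");         // rejects FTP etc.
  }
}
```

 With this check, a POST line or a line using FTP as protocol is rejected
 as "not a sanitized line", even though it may be a perfectly fine line in
 the original web server log.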

 >
 > I'm also a bit unclear if the separate validate and sanitize steps have
 a negative impact on performance. In theory, it should be sufficient to
 touch each line once. But I could be convinced that we're trading
 performance for better design, if this is the case.

 Touching the lines once is only a concern during sanitization.  The
 current code only operates once on each line.  Simply parsing sanitized
 logs only needs the validation part anyway.

 >
 > However, I'd really want us to be clear what it means for a line to be
 valid!

 Yes!

 > ...
 > One thing we should do is document whether `private byte[] logBytes`
 might be compressed or not. We have been discussing that many times now,
 and I'm deeply confused already what we're doing there.

 Already addressed in the respective javadoc comments.  All decompressed.

 > ===== 629ef152be1fd2f5a00d203b614fc01e946c518d Tweak pattern for logline
 validation.
 > One question about the pattern: Does the `?:` in
 `"^((?:\\d{1,3}\\.){3}\\d{1,3}) ..."` mean that we're now ignoring the
 first 3 octets regardless of whether they're `0.0.0.` or not? That would
 not follow the specification ...

 I will re-check that with the new spec we have from #23243.
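
 As background on the `?:` question: in `java.util.regex`, `(?:...)` is a
 non-capturing group.  Its content is still matched; it just does not get a
 group number of its own.  The sketch below uses only the address prefix of
 the pattern quoted above (the rest is elided there); the class and method
 names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only; uses just the address prefix of the quoted pattern. */
final class NonCapturingGroupDemo {

  static final Pattern ADDRESS =
      Pattern.compile("^((?:\\d{1,3}\\.){3}\\d{1,3}) ");

  /** Returns the captured address, or null if the line has none. */
  static String address(String line) {
    Matcher m = ADDRESS.matcher(line);
    return m.find() ? m.group(1) : null;
  }

  /** Number of capturing groups; the (?:...) group is not counted. */
  static int groupCount() {
    return ADDRESS.matcher("").groupCount();
  }
}
```

 So the first three octets are not ignored: they are part of group 1.  This
 prefix accepts any octets, though, not only `0.0.0.`, so a restriction to
 specific addresses would have to be checked separately or written into the
 pattern.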

 > ===== e24dda11613f340da3fbd6f1a93ba07d857f0b16 Remove getLogDateMillis
 and move getLogDate to WebServerAccessLog.
 >
 > I wonder why there's no `getLogDateMillis()` anymore. But I can be
 convinced that users should just extract milliseconds from `LocalDate` if
 they need them. Is that the plan? If so, green light.

 Yep.
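
 A minimal sketch of how a user could do that with `java.time`, assuming
 the log date is meant as midnight UTC of that day.  The helper class is
 hypothetical and not part of metrics-lib.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper, for illustration only; not part of metrics-lib. */
final class LogDateMillis {

  /** Milliseconds since the epoch at 00:00:00 UTC of the given date. */
  static long toEpochMillis(LocalDate logDate) {
    // Assumption: the log date is interpreted in UTC.
    return logDate.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
  }
}
```

 A caller would pass in whatever `getLogDate()` returns, e.g.
 `LogDateMillis.toEpochMillis(log.getLogDate())`, where `log` stands for a
 `WebServerAccessLog` instance.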

 > ==== 6a6e93a2fd8b3f3b912ce9a258f4b32069c18ef8 (iwakeh/task-22983-4) Set
 dev version.
 > Let's revert this commit. ...

 I reverted this, but instead would like to add the '2.1.1-dev' commit.
 (Mainly for CollecTor #22428 implementation and testing.)

 Next steps:

 The latest changes in #23243 also make a revision necessary.  I'll use
 the current branch here to add the changes.
 So far the only open topic is your question about validation vs.
 sanitization.  The above answer might have resolved it a bit.  Maybe it'll
 also become clearer in the upcoming commit(s) for the spec changes.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:48>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

