[metrics-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Nov 22 11:08:46 UTC 2017


#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-----------------------------+-----------------------------------
 Reporter:  iwakeh           |          Owner:  metrics-team
     Type:  enhancement      |         Status:  needs_revision
 Priority:  Medium           |      Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  metrics-2017     |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+-----------------------------------

Comment (by karsten):

 Agreed with all points above except one:

 When ''parsing'' sanitized log lines metrics-lib should not reject log
 lines that it would discard when ''sanitizing'' original log lines.

 It's not the job of the ''parser'' to ensure that its input is properly
 sanitized or to do some sort of post-sanitizing. Of course it needs to
 perform some basic format verifications to perform its job. But dropping
 lines because the sanitizer would drop them seems out of place.

 Imagine a hypothetical situation where we decide at some point that HEAD
 requests are too sensitive and we take them out in the parser. However,
 previously sanitized logs would still contain them, including archives
 that people keep locally and that we can't update. If somebody then takes
 a recent metrics-lib version to parse their data, they'd suddenly stop
 getting the HEAD lines. That would be rather confusing.

 I think sanitizing and parsing should be separate things. In this case,
 discarding lines because of certain field contents should be left to the
 sanitizer.

 Does that mean we should provide a general-purpose log parser? Probably
 not. In the parser we don't have to provide getters for fields that we
 don't care about, like the user-agent string. But we should be prepared to
 find request methods GET, HEAD, POST, or really anything else in log lines
 we're given.
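
 To make this concrete, here is a rough sketch of what I mean by a parser
 that only verifies the basic format. The class name, the field names, and
 the regular expression are made up for illustration and are not existing
 metrics-lib API. The line layout is assumed to be Apache-style and is not
 taken from the spec. The point is only that the request method is read as
 an opaque token and never compared against a whitelist, and that there is
 no getter for the user-agent string.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a lenient parser; not part of metrics-lib. */
final class LenientLogLineParser {

  // Assumed layout: ip - - [date] "METHOD resource PROTOCOL" status size ...
  // Anything after the size (e.g. referer, user agent) is tolerated but
  // deliberately not captured.
  private static final Pattern LINE = Pattern.compile(
      "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] "
      + "\"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)(?: .*)?$");

  /** The fields we care about. There is no user-agent field on purpose. */
  static final class LogLine {
    final String ip;
    final String method;
    final String resource;
    final int status;

    LogLine(String ip, String method, String resource, int status) {
      this.ip = ip;
      this.method = method;
      this.resource = resource;
      this.status = status;
    }
  }

  /**
   * Returns the parsed line, or empty if the basic format does not match.
   * Field contents such as the request method are not used as a reason to
   * drop a line; that decision is left to the sanitizer.
   */
  static Optional<LogLine> parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of(new LogLine(
        m.group(1), m.group(3), m.group(4), Integer.parseInt(m.group(6))));
  }
}
```

 With something like this, a HEAD or POST line from an old archive parses
 just like a GET line, and only lines that don't match the basic format at
 all come back empty.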

 Does that make sense, or am I overlooking something?

 (By the way, it's a good thing that we're keeping the spec unchanged with
 regard to IP addresses not starting with `0.0.0.`. I think it would have
 been pretty bad to just rewrite the first three octets to `0.0.0` and keep
 the fourth unchanged. Not very privacy-preserving.)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:50>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

