[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Jul 20 19:34:48 UTC 2017


#22983: add a descriptor interface and implementation for web-logs
---------------------------------+------------------------------
 Reporter:  iwakeh               |          Owner:  metrics-team
     Type:  enhancement          |         Status:  new
 Priority:  Medium               |      Milestone:
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+------------------------------

Comment (by karsten):

 Regarding the name, let's try to find something more descriptive. How
 about `WebServerLog` or even `ApacheHttpServerAccessLog`? Otherwise
 there's the risk of confusion with descriptor types added in the future,
 like a log file written by BridgeDB containing client requests for bridge
 addresses.

 Regarding the suggested interface, I think there's a short-term and a
 long-term part here.

 In the long term I think that it would be at least twice as useful if we
 read the log contents and added methods to read these parsed contents.
 It's true that this causes some development hassle. But that's why we do
 it once in the library rather than rely on possibly more than one
 application to get it right. And we can still include the raw descriptor
 bytes by storing the compressed bytes and inflating them upon request.
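
 A minimal sketch of that last idea, assuming gzip compression and a
 hypothetical holder class `CompressedLogBytes` (neither is settled in this
 ticket). The getter name mirrors `getRawDescriptorBytes()` from the
 existing `Descriptor` interface. Only the compressed form is kept in
 memory, and it is inflated on each request:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical holder that keeps only the compressed form in memory. */
final class CompressedLogBytes {

  private final byte[] compressed;

  private CompressedLogBytes(byte[] compressed) {
    this.compressed = compressed;
  }

  /** Compresses the given raw log bytes with gzip and stores the result. */
  static CompressedLogBytes fromRaw(byte[] raw) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
      gzip.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The gzip stream is closed here, so the buffer holds the full output.
    return new CompressedLogBytes(buffer.toByteArray());
  }

  /** Inflates the stored bytes on each call; nothing is cached. */
  byte[] getRawDescriptorBytes() {
    try (GZIPInputStream gzip =
        new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] chunk = new byte[8192];
      int read;
      while ((read = gzip.read(chunk)) != -1) {
        out.write(chunk, 0, read);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

 This trades CPU time on each call for a smaller memory footprint. Whether
 to cache the inflated bytes would be a separate decision.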

 Some comments on the interface:
  - Let's include a subtype `Request` or similar for each line contained in
 the log file, and let's include a method `getRequests()` that returns
 `Iterable<Request>`.
  - Because we cannot include a `@type` annotation with a version number,
 `Request` should ideally include getters for all fields contained in
 Apache's Combined Log Format.
  - Ideally, `getLogDate()` would return the date in milliseconds since the
 epoch to be conformant to the rest of metrics-lib, in which case it would
 probably be called `getLogMillis()`.
  - I'm unclear what `getCompressionType()` returns. I think I'd expect a
 `String` such as `"gz"`, but not a `byte[]`. Was that intended?
  - If we read and parse logs, we'll have to change
 `getUnrecognizedLines()` to return any unrecognized lines.
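
 To make the points above concrete, here is a rough sketch of how such an
 interface could look. All type and method names other than
 `getRequests()`, `getLogMillis()`, `getCompressionType()`, and
 `getUnrecognizedLines()` are placeholders, and the regular expression is
 only one possible way to match Combined Log Format lines. It assumes the
 contents have already been decompressed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical descriptor interface; every name here is a placeholder. */
interface WebServerAccessLog {

  /** One request line in Apache's Combined Log Format. */
  interface Request {
    String getRemoteHost();
    String getRemoteLogname();
    String getRemoteUser();
    String getTimestamp();
    String getRequestLine();
    int getStatusCode();
    long getResponseBytes();  // -1 if the log contains "-"
    String getReferer();
    String getUserAgent();
  }

  long getLogMillis();
  String getCompressionType();
  Iterable<Request> getRequests();
  List<String> getUnrecognizedLines();
}

/** Minimal implementation that parses uncompressed log contents. */
final class WebServerAccessLogSketch implements WebServerAccessLog {

  // host ident user [time] "request" status bytes "referer" "user-agent"
  private static final Pattern COMBINED = Pattern.compile(
      "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)"
      + " \"([^\"]*)\" \"([^\"]*)\"$");

  private final long logMillis;
  private final String compressionType;
  private final List<Request> requests = new ArrayList<>();
  private final List<String> unrecognizedLines = new ArrayList<>();

  WebServerAccessLogSketch(long logMillis, String compressionType,
      String contents) {
    this.logMillis = logMillis;
    this.compressionType = compressionType;
    for (String line : contents.split("\\r?\\n")) {
      if (line.isEmpty()) {
        continue;
      }
      Matcher m = COMBINED.matcher(line);
      if (m.matches()) {
        requests.add(new ParsedRequest(m));
      } else {
        unrecognizedLines.add(line);  // keep anything we cannot parse
      }
    }
  }

  @Override public long getLogMillis() { return logMillis; }
  @Override public String getCompressionType() { return compressionType; }
  @Override public Iterable<Request> getRequests() { return requests; }
  @Override public List<String> getUnrecognizedLines() {
    return unrecognizedLines;
  }

  private static final class ParsedRequest implements Request {
    private final String[] fields = new String[9];

    ParsedRequest(Matcher m) {
      for (int i = 0; i < fields.length; i++) {
        fields[i] = m.group(i + 1);
      }
    }

    @Override public String getRemoteHost() { return fields[0]; }
    @Override public String getRemoteLogname() { return fields[1]; }
    @Override public String getRemoteUser() { return fields[2]; }
    @Override public String getTimestamp() { return fields[3]; }
    @Override public String getRequestLine() { return fields[4]; }
    @Override public int getStatusCode() { return Integer.parseInt(fields[5]); }
    @Override public long getResponseBytes() {
      return "-".equals(fields[6]) ? -1L : Long.parseLong(fields[6]);
    }
    @Override public String getReferer() { return fields[7]; }
    @Override public String getUserAgent() { return fields[8]; }
  }
}
```

 Lines that don't match the pattern end up in `getUnrecognizedLines()`
 rather than being dropped, which is the change to that method mentioned in
 the last item.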

 In the short term I can see how we might want to put the `Request` part on
 hold and only return metadata and uncompressed raw descriptor contents in
 this new descriptor type.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

