[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Jul 26 15:38:09 UTC 2017


#22983: add a descriptor interface and implementation for web-logs
---------------------------------+------------------------------
 Reporter:  iwakeh               |          Owner:  metrics-team
     Type:  enhancement          |         Status:  needs_review
 Priority:  Medium               |      Milestone:
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+------------------------------
Changes (by iwakeh):

 * status:  new => needs_review


Comment:

 Please review [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/log/?h=task-22983 this branch].

 The [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/commit/?h=task-22983&id=20a9f82d06adbf960f1da8ff9853e50c5c1c5e25
 first commit] adds the new interfaces and their implementations.

 LogDescriptor contains the methods that will hopefully be applicable to
 all possible log types.
 WebServerAccessLog is the specialization for access logs.

 LogDescriptor also offers a sub-interface:
 {{{
    /**
     * Provides a single method for removing sensitive data from a
     * given Apache access log line.
     */
    public interface Sanitizer {

      /** Returns a cleaned log line, i.e., one without potentially
       * privacy-sensitive values. */
      public String clean(String line);
    }
 }}}

 and a method `sanitize()`.  The latter applies the cleaning procedure to
 all log lines and sorts the resulting lines.  The default sanitizer
 returns each line without any changes.  This setup keeps all descriptor
 parsing, compression, and decompression in metrics-lib; CollecTor is not
 forced to re-implement parsing functionality and only needs to provide the
 log cleaning procedure.  (A similar approach could be devised for
 bridge sanitization, too.)
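
 To illustrate the clean-then-sort behavior described above, here is a
 minimal standalone sketch.  It is not the code from the branch: the class
 `SanitizeSketch`, the static `sanitize(List, Sanitizer)` signature, and the
 example cleaner `MASK_FIRST_FIELD` are hypothetical names made up for
 illustration, and the sketch assumes Java 8 or later for the lambdas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for illustration; not a metrics-lib class. */
class SanitizeSketch {

  /** Mirrors the Sanitizer sub-interface quoted in this comment. */
  interface Sanitizer {
    String clean(String line);
  }

  /** Default sanitizer: returns each line without any changes. */
  static final Sanitizer DEFAULT = line -> line;

  /**
   * Example cleaner (an assumption, not CollecTor's procedure):
   * replaces the first space-separated field with 0.0.0.0.
   */
  static final Sanitizer MASK_FIRST_FIELD = line -> {
    int space = line.indexOf(' ');
    return space < 0 ? line : "0.0.0.0" + line.substring(space);
  };

  /** Applies the cleaner to every line, then sorts the cleaned lines. */
  static List<String> sanitize(List<String> lines, Sanitizer sanitizer) {
    List<String> cleaned = new ArrayList<>();
    for (String line : lines) {
      cleaned.add(sanitizer.clean(line));
    }
    Collections.sort(cleaned);
    return cleaned;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "10.0.0.2 - - \"GET /b\"",
        "10.0.0.1 - - \"GET /a\"");
    for (String line : sanitize(lines, MASK_FIRST_FIELD)) {
      System.out.println(line);
    }
  }
}
```

 In this sketch a caller such as CollecTor would only supply the
 `Sanitizer`; iterating over the lines and sorting them stays in one place.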

 The [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/commit/?h=task-22983&id=d4ece5649573f315a8c63f43e490c3594f35affd
 second commit] makes `DescriptorParser` aware of the new types and keeps
 javadoc from being generated for the implementation classes in the new
 package.

 All of the code is covered by tests which are added in
 [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/commit/?h=task-22983&id=e07bca5e9429b2b93bb2cd3c0ef6911ad42ec32e
 this commit].  Total coverage even improved by one percent :-)

 The addition of another sub-interface `LogDescriptor.LogLine` (and the
 extensions to WebServerAccessLogLine) will be part of a new ticket, which
 will also provide unrecognized lines for access-logs.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

