[tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Jan 10 15:10:58 UTC 2018


#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-----------------------------+-----------------------------------
 Reporter:  iwakeh           |          Owner:  metrics-team
     Type:  enhancement      |         Status:  needs_review
 Priority:  High             |      Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  metrics-2018     |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+-----------------------------------

Comment (by karsten):

 Replying to [comment:57 iwakeh]:
 > Replying to [comment:56 karsten]:
 > > I started looking at your branch, but it's a pretty big diff again, so
 that'll take some time.
 >
 > You only need to look at the latest four commits.  The others are
 reviewed already (comment:47).

 Oh. That information would have saved some time this morning. That also
 explains why I have just one remark on the earlier commits and a couple
 more on the first four!

 > > Regarding Git history, we're going to squash all these commits (except
 for unrelated ones like adding a space or package-info) into a single
 commit that adds interfaces, implementations, and tests, right? I'm
 asking, because you marked some commits as "fixup!", but not all of them.
 Or do you have a separation in mind? If so, which?
 > >
 >
 > Maybe, keep those that you find easier to understand the changes.  I
 used 'fixup' for making clear that a change is related to an earlier
 review discussion.

 There are two types of changes here:

  1. Changes in the review process that require knowing the branch under
 review. Ideally, we'd use "fixup" or "squash" commits for all commits made
 during the review process. Alternatively, we can agree that commits will
 be squashed prior to merge, which works for me, too.
  2. Changes in the merge process that only require knowing master before
 the merge. Somebody who didn't follow the review process, including our
 future selfs, should see changes where we're not going forth and back
 throughout commit series to achieve something that we could do in one
 step.

 > > By the way, what was the reason for rebasing your branch? It would
 have been a tad bit easier to stay on the same branch until we're done
 with the review process. Just saying for future reviews.
 >
 > You mentioned that above already ;-)
 > (I think, last year I did the rebase because it seemed better to work on
 a branch close to master.)

 Maybe I ran into that rebase today when fetching from your repository and
 seeing this branch to be force-updated. But it's good to know I mentioned
 this before. With a few more samples you'll soon be able to compute my
 attention span. :)

 Anyway, here's what I found in your branch:

  - `AccessLogLineSanVal` contains an explicit mention of `"FTP"`. Does
 this mean we're considering FTP log lines to be valid? Why FTP in
 particular and no other protocols? Or should we take that out and restrict
 ourselves to HTTP? Not feeling strongly about this one.
  - Further down below we're comparing `referenceDate` to `extractedDate`
 and discarding lines where the two don't match. Not sure if this is
 problematic, but I'm at least flagging it here as potential issue. If this
 is a non-issue, feel free to ignore this comment.
  - A few lines further down we're formatting a date using
 `DateTimeFormatter.ofPattern()`. Is that expensive? Would we be able to
 create the formatter just once and re-use it here? Not trying to optimize
 prematurely, but trying to avoid shooting ourselves in the foot.
  - Logging an error in case of catching a `Throwable` might become very
 noisy. After all, we're processing third-party input here, and we can
 typically proceed without parsing that line. The debug level seemed like
 the better choice here.
  - In `WebServerAccessLog`, what exactly is the log date? Is it the date
 of contained requests, is it the date when the file was written to disk,
 or what is it? This is still unclear in the docs.

 When these remaining issues are clearer to me, do you mind me editing your
 branch and preparing one or more "fixup"/"squash" commits? There are still
 a few things that I'd like to document differently or where I'd like to
 make the new code more similar to existing code (in the good sense). You
 could still go through them and talk me out of making those changes. Sound
 okay?

 Oh, and here's another remark for future reviews: let's try to reduce the
 time for reviewing and for revising a branch to a few days. Neither
 reviewing nor revising is fun, but it's even less fun when doing it
 several weeks later. I know this is a special case with the holiday break,
 and I'm not blaming you any more than I'm blaming myself. Just an idea to
 make reviews a little more fun. :)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:58>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list