[metrics-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed May 16 14:41:32 UTC 2018


#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------

Comment (by karsten):

 Replying to [comment:12 amj703]:
 > > We could sum up relay values first and then adjust the result.
 However, we'd lose the ability to discard outliers, which we're doing
 extensively with onion service statistics. After all, we're throwing out 2
 times 25% of reported values which we'd then include again.
 >
 > Why not throw out the outliers, then add the remaining, then do the
 adjustment?

 We determine whether a reported value is an outlier by extrapolating all
 reported values to network totals and discarding the lowest 25% and the
 highest 25% of ''extrapolated'' values. But extrapolating values requires
 us to make these adjustments first, or we'd extrapolate to the wrong
 network totals.
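 A minimal sketch of the outlier step as described above. It assumes each
 noise-adjusted value is extrapolated by dividing it by the fraction of the
 network that the reporting relay observed; that division and the names
 `OutlierSketch`, `Report` and `keepMiddleHalf` are hypothetical, not the
 actual metrics-web code. The point it illustrates is the ordering: the
 adjusted value is what gets extrapolated and ranked, so the adjustment has
 to come first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OutlierSketch {

  /** Hypothetical: one relay's report after the noise adjustment, plus the
   * fraction of the network (0 < fraction <= 1) that relay observed. */
  static final class Report {
    final double adjustedValue;
    final double fraction;

    Report(double adjustedValue, double fraction) {
      this.adjustedValue = adjustedValue;
      this.fraction = fraction;
    }
  }

  /** Extrapolates each report to a network total, then keeps only the middle
   * 50% by discarding the lowest 25% and the highest 25%. */
  static List<Double> keepMiddleHalf(List<Report> reports) {
    List<Double> extrapolated = new ArrayList<>();
    for (Report r : reports) {
      // Assumption: linear extrapolation from the observed fraction.
      extrapolated.add(r.adjustedValue / r.fraction);
    }
    Collections.sort(extrapolated);
    int cut = extrapolated.size() / 4; // values dropped at each end
    return new ArrayList<>(extrapolated.subList(cut, extrapolated.size() - cut));
  }

  public static void main(String[] args) {
    List<Report> reports = Arrays.asList(
        new Report(10, 0.25), new Report(30, 0.5), new Report(-5, 0.25),
        new Report(400, 0.25), new Report(12, 0.25), new Report(25, 0.5),
        new Report(8, 0.125), new Report(20, 0.25));
    System.out.println(keepMiddleHalf(reports)); // [48.0, 50.0, 60.0, 64.0]
  }
}
```

 With eight reports, two are dropped at each end: the negative value and
 the very large one never reach the final average.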

 Here's another idea, though: what if we change how we remove noise by
 ''only'' subtracting `bin_size / 2` to undo the binning step as well as we
 can, and leave the Laplace noise alone? Basically, we'd only account for
 the fact that relays always round up to the next multiple of `bin_size`,
 but we wouldn't do anything about the positive or negative noise. Of
 course, we'd keep the remaining extrapolation step and outlier handling
 unchanged. Like this:

 {{{
   /**
    * Removes noise from a reported stats value by subtracting half of the
    * bin size.
    */
   private long removeNoise(long reportedNumber, long binSize) {
     return reportedNumber - binSize / 2;
   }
 }}}
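 A minimal check of what the proposed method does, under two illustrative
 assumptions that are not part of the proposal: the Laplace noise is
 ignored, and a true count is equally likely to be any value in its bin. A
 relay that rounds up reports the bin's upper edge (for `binSize = 8`, every
 true count from 1 to 8 is reported as 8), and subtracting `binSize / 2`
 brings that back to roughly the middle of the bin. The method is made
 `static` in the hypothetical class `RemoveNoiseSketch` only so the sketch
 runs on its own.

```java
class RemoveNoiseSketch {

  /** Proposed version: subtract half of the bin size to undo the relay's
   * round-up; Laplace noise is deliberately left alone. */
  static long removeNoise(long reportedNumber, long binSize) {
    return reportedNumber - binSize / 2;
  }

  public static void main(String[] args) {
    long binSize = 8;
    // Illustrative assumption: no Laplace noise, and the true counts 1..8
    // (all of which round up to 8) are equally likely.
    double sumOfTrueCounts = 0;
    for (long trueCount = 1; trueCount <= binSize; trueCount++) {
      sumOfTrueCounts += trueCount;
    }
    System.out.println("mean true count: " + sumOfTrueCounts / binSize); // 4.5
    System.out.println("estimate: " + removeNoise(binSize, binSize)); // 4
  }
}
```

 Because `binSize / 2` is integer division on `long`s, an odd bin size
 would be truncated (for example, `7 / 2 == 3`).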

 If this makes any sense, I could produce some numbers with this new, even
 simpler approach.

 > > Hang on. Relays always round ''up'' to the next multiple of
 `bin_size`. So, everything in `(-bin_size, 0]` will be reported as `0` and
 ''not'' as `-bin_size`.
 > >
 > > > I don’t think the “right side” rounding is happening with current
 use of the floor function, if it ever was. Maybe I’m wrong, but as I
 understand it Math.floorDiv((reportedNumber + binSize / 2) will round
 -0.75*binSize to -binSize.
 > >
 > > This part is correct. (The full "formula" is
 `Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize`.)
 >
 > These statements appear inconsistent. Is everything in (-bin_size, 0]
 rounded to 0, or is only [-bin_size/2,0] rounded to zero with [-bin_size,
 -bin_size/2) rounded to -bin_size? I think it's the latter, because
 Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize with
 reportedNumber=-0.75*binSize should evaluate to
 Math.floorDiv((-0.25*binSize), binSize) * binSize = -1 * binSize =
 -binSize. That appears consistent with how you've described Math.floorDiv
 and how the docs describe it at
 <https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html>: "Returns
 the largest (closest to positive infinity) int value that is less than or
 equal to the algebraic quotient".

 Wait, we're talking about two different things:

  1. Relays internally round ''up'' to the next multiple of `bin_size`.
  2. metrics-web contains that `removeNoise()` method that this ticket is
 all about.
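 To make the distinction concrete, here is a sketch that puts the two
 operations side by side, using `binSize = 8` as an arbitrary example.
 `roundUpToBin` is a hypothetical stand-in for what relays do (round up to
 the next multiple of `bin_size`, so everything in `(-bin_size, 0]` becomes
 `0`); it is not tor's actual code. `roundToNearestBin` evaluates the
 `Math.floorDiv` expression quoted above, which rounds to the nearest
 multiple, so `-0.75 * binSize` becomes `-binSize`, as amj703 computed.

```java
class BinningComparison {

  /** Hypothetical sketch of relay-side binning: round up to the next
   * multiple of binSize (ceiling division written with floorDiv, so it
   * also works for negative inputs and on Java 8). */
  static long roundUpToBin(long value, long binSize) {
    return -Math.floorDiv(-value, binSize) * binSize;
  }

  /** The expression quoted in the discussion above: rounds to the nearest
   * multiple of binSize, with halfway values rounded up. */
  static long roundToNearestBin(long reportedNumber, long binSize) {
    return Math.floorDiv(reportedNumber + binSize / 2, binSize) * binSize;
  }

  public static void main(String[] args) {
    long binSize = 8;
    long[] values = {-8, -6, -5, -4, -1, 0, 3, 4};
    for (long v : values) {
      System.out.println(v + ": round up -> " + roundUpToBin(v, binSize)
          + ", nearest -> " + roundToNearestBin(v, binSize));
    }
  }
}
```

 For `-6` (that is, `-0.75 * binSize`) the relay-style round-up gives `0`,
 while the quoted expression gives `-8`. Both statements in the thread hold;
 they describe different steps.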

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:13>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

