[metrics-bugs] #26035 [Metrics/Statistics]: Streamline sample quantile types used in the various modules

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed May 16 16:43:06 UTC 2018


#26035: Streamline sample quantile types used in the various modules
--------------------------------+---------------------------
 Reporter:  karsten             |          Owner:  iwakeh
     Type:  enhancement         |         Status:  accepted
 Priority:  High                |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  Sponsor13
--------------------------------+---------------------------

Comment (by iwakeh):

 Replying to [comment:5 karsten]:

 Leaving out everything related to commons-math, because we agreed to use
 it.

 > Thanks, very useful! Let me first try to answer the open questions:
 >
 >  - What's up with a) and c) using slightly different percentile
 implementations? The reason is that we're including the 0th (minimum) and
 100th percentile (maximum) in a) which we're not in c). It's totally
 possible that what we're using right now for a) is a terrible hack. Maybe
 we should instead use the formula for c) in a) and handle percentile 0 or
 100 as a special case. Whatever the other implementations do.

 Well, c) would fail on `1.0`, but that doesn't occur in practice because
 only quartiles are computed. This ought to be fixed anyway; afterwards
 both implementations will be the same except for the edge cases.

 >
 >  - What's up with e) and f) not being quartiles? What we're doing there
 is that we're computing the ''weighted'' quartiles. And again, it might be
 that it's a hack that we should rewrite. The goal should be to implement a
 weighted trimmed mean. The technical report probably has a better
 definition. What we cannot do, though, is use the exact same percentile
 definition as we're using for the other places.

 Well, I wouldn't call 0.25 times a value (`sumFraction` in the code) a
 quartile. The code calculates the weighted mean of all intervals
 contained in `[sumFraction * 0.25, sumFraction * 0.75]`. So, nothing to
 be done here.
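
 A minimal sketch of such a weighted interquartile mean, for illustration
 only. The class and method names are hypothetical and not taken from the
 metrics code. It also assumes that an interval lying only partly inside
 the middle half contributes with its overlapping part, which may differ
 from what the existing code does.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper, not part of the metrics code base. */
final class WeightedInterquartileMean {

  private WeightedInterquartileMean() {
  }

  /**
   * Returns the weighted mean of the values whose weight intervals fall
   * into [0.25 * total, 0.75 * total], where total is the sum of all
   * weights and values are laid out in ascending order.
   */
  static double compute(double[] values, double[] weights) {
    if (values.length == 0 || values.length != weights.length) {
      throw new IllegalArgumentException("Need matching, non-empty arrays.");
    }
    double total = 0.0;
    for (double weight : weights) {
      if (Double.isNaN(weight) || weight < 0.0) {
        throw new IllegalArgumentException("Weights must be non-negative.");
      }
      total += weight;
    }
    if (total <= 0.0) {
      throw new IllegalArgumentException("Sum of weights must be positive.");
    }
    // Sort indexes by value so that weights stay attached to their values.
    Integer[] order = new Integer[values.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));
    double lowerBound = 0.25 * total;
    double upperBound = 0.75 * total;
    double cumulative = 0.0;
    double weightedSum = 0.0;
    double usedWeight = 0.0;
    for (int index : order) {
      double start = cumulative;
      double end = cumulative + weights[index];
      cumulative = end;
      // Assumption: count only the part of [start, end] inside the bounds.
      double overlap = Math.min(end, upperBound) - Math.max(start, lowerBound);
      if (overlap > 0.0) {
        weightedSum += values[index] * overlap;
        usedWeight += overlap;
      }
    }
    return weightedSum / usedWeight;
  }
}
```

 With equal weights this reduces to the mean of the middle half of the
 sorted values, which is why it does not use the same percentile
 definition as the unweighted places.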

 >
 > ...
 > I'm slightly leaning towards R-7 here.

 I don't feel strongly about this.

 >
 > ...
 > Except for Java where we'd have to implement something ourselves, which
 would also have to handle special cases 0 and 100.

 Yes, the minimum and maximum need to be coded.
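
 For illustration, a hand-written R-7 quantile with the two edge cases
 could look roughly like the following. This is only a sketch: the class
 and method names are hypothetical, and it takes the probability as a
 fraction in [0, 1] rather than a percentile in [0, 100].

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the metrics code base. */
final class SampleQuantiles {

  private SampleQuantiles() {
  }

  /**
   * Returns the R-7 sample quantile of the given values for a
   * probability p in [0, 1], interpolating linearly between the two
   * closest order statistics.
   */
  static double quantileR7(double[] values, double p) {
    if (values == null || values.length == 0) {
      throw new IllegalArgumentException("Need at least one value.");
    }
    if (Double.isNaN(p) || p < 0.0 || p > 1.0) {
      throw new IllegalArgumentException("p must be in [0, 1].");
    }
    // Work on a sorted copy so that the caller's array stays untouched.
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    // Edge cases: p = 0 is the minimum, p = 1 is the maximum.
    if (p == 0.0) {
      return sorted[0];
    }
    if (p == 1.0) {
      return sorted[sorted.length - 1];
    }
    // R-7: zero-based position h = (n - 1) * p in the sorted sample.
    double h = (sorted.length - 1) * p;
    int lower = (int) Math.floor(h);
    double fraction = h - lower;
    if (fraction == 0.0) {
      // Exact hit on an order statistic; also avoids reading past the end.
      return sorted[lower];
    }
    return sorted[lower] + fraction * (sorted[lower + 1] - sorted[lower]);
  }
}
```

 Handling `0.0` and `1.0` up front keeps the interpolation from indexing
 past the last element, which is the kind of failure mentioned for c)
 above.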

 >
 > ...
 > P.S.: Did I write something about trucks? I meant insect legs! Unless
 those have a spare leg mounted somewhere, too, in which case I'll think
 even harder about a good example. ;)

 Well, for insects the number of legs is fixed at six, unless they lose a
 leg and live on afterwards. Might be best to stick to the values at
 hand ;-)

 So, I'll implement the changes decided in this and the previous comments.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26035#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

