[tor-dev] tor's definition of 'median'
Maciej Soltysiak
maciej at soltysiak.com
Tue Aug 11 14:34:11 UTC 2015
Virgil's absolutely right. The median is the "middle" value of a _sorted_ set
of N data points (0-based indexing, integer division):
- for an odd number of data points, it's the middle one: set[N/2]
- for an even number of data points, it's the average of the two in the middle:
(set[N/2 - 1] + set[N/2]) / 2
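
To make the difference concrete, here is a minimal Python sketch of both
rules, run on the four Measured= votes quoted below. The function names are
made up for illustration and this is not tor's actual code; low_median is
only my reading of the "ties are broken in favor of the smaller or earlier
item" sentence from dir-spec.txt.

```python
def median(values):
    """Standard median: the middle item, or the mean of the two middle items."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def low_median(values):
    """Assumed tie-breaking rule: for an even count, take the smaller middle item."""
    s = sorted(values)
    if not s:
        raise ValueError("median of an empty sequence is undefined")
    return s[(len(s) - 1) // 2]


# The four Measured= votes from the thread.
votes = [7570, 15500, 18100, 30500]
print(median(votes))      # 16800.0
print(low_median(votes))  # 15500
```

For an odd number of votes the two functions agree; they differ only when the
count is even. Python's standard library offers the same pair as
statistics.median and statistics.median_low.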
On Tue, Aug 11, 2015 at 3:44 PM, Virgil Griffith <i at virgil.gr> wrote:
> I mean the median.
>
> From Wikipedia...
>
> For example, if *a* < *b* < *c*, then the median of the list {*a*, *b*,
> *c*} is *b*, and, if *a* < *b* < *c* < *d*, then the median of the list {
> *a*, *b*, *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) /
> 2.
>
> -V
>
> On Tue, Aug 11, 2015 at 9:29 PM John <oneofthem at riseup.net> wrote:
>
>> I think you are confusing the median with the mean:
>>
>> https://en.wikipedia.org/wiki/Median
>> https://en.wikipedia.org/wiki/Mean
>>
>> Taking the median instead of the mean can be beneficial in situations
>> where you have larger outliers in your data, which typically affect the
>> mean very much.
>>
>> -j
>>
>> Virgil Griffith:
>> > Is there some implementation-specific reason not to use the standard
>> > mathematical definition of "median"? If not, I propose changing the
>> > implementation to become it.
>> >
>> > -V
>> >
>> > On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson <nickm at alum.mit.edu> wrote:
>> >
>> >> On Mon, Aug 10, 2015 at 1:11 PM, nusenu <nusenu at openmailbox.org> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
>> >>>
>> >>>> If 3 or more authorities provide a Measured= keyword for a router,
>> >>>> the authorities produce a consensus containing a "w" Bandwidth=
>> >>>> keyword equal to the median of the Measured= votes.
>> >>>
>> >>> a random sample from recent votes:
>> >>>
>> >>> grep 37.59.38.117 -A 3 *|grep Measured
>> >>> w Bandwidth=6869 Measured=7570
>> >>> w Bandwidth=6869 Measured=15500
>> >>> w Bandwidth=6869 Measured=18100
>> >>> w Bandwidth=6869 Measured=30500
>> >>>
>> >>> Tor says the median value is
>> >>> 15500
>> >>>
>> >>> 2015-08-10-16-00-00-consensus:
>> >>> w Bandwidth=15500
>> >>>
>> >>> but the median of these 4 values is actually:
>> >>> (18100+15500)/2 = 16800
>> >>> no?
>> >>>
>> >>> Does tor use a different definition of 'median' and simply always take
>> >>> the second of 4 ordered measurement votes, or is there a bug in the
>> >>> spec or implementation?
>> >>
>> >> There's one misplaced throwaway sentence in dir-spec.txt:
>> >>
>> >> " All ties in computing medians are broken in favor of the smaller or
>> >> earlier item.
>> >> "
>> >>
>> >> We should bring this, and probably other things, into a "definitions"
>> >> section earlier in dir-spec.txt. Patches welcome. ;)
>> >>
>> >> --
>> >> Nick
