[metrics-team] Discontinuity on bridge numbers

Karsten Loesing karsten at torproject.org
Mon Jul 30 19:33:41 UTC 2018


On 2018-07-30 21:03, David Fifield wrote:
> On Mon, Jul 30, 2018 at 08:34:52PM +0200, Karsten Loesing wrote:
>> On 2018-07-30 19:55, David Fifield wrote:
>>> On Mon, Jul 30, 2018 at 05:16:00PM +0000, nusenu wrote:
>>>> Vinicius Fortuna [vee-NEE-see.oos]:
>>>>> I noticed there's some discontinuity on the Bridge number data:
>>>>> https://metrics.torproject.org/bridges-ipv6.html
>>>>> https://metrics.torproject.org/networksize.html
>>>>>
>>>>> Do you know what happened there?
>>>>
>>>> the bridge authority got replaced, since this
>>>> wasn't a smooth transition it resulted in a lot of
>>>> lost bridges because they did not upgrade to a tor version
>>>> which contains the new bridge authority to publish to.
>>>
>>> I think the new bridge authority is not the whole story, because the
>>> bridge authority change happened on 2018-07-14, but the missing data
>>> begins on 2018-07-07 (and ends on 2018-07-21).
>>
>> Hmm? What makes you think the gap ends on 2018-07-21? In the two graphs
>> linked above the line comes back on 2018-07-13. Are you maybe looking at
>> a cached version of those graphs? It took us a few days to import the
>> new bridge descriptors into Tor Metrics, but AFAIK we did not miss any data.
> 
> I'm looking at
> https://metrics.torproject.org/userstats-bridge-country.html (https://archive.is/AQ9Mh)
> https://metrics.torproject.org/userstats-bridge-combined.html (https://archive.is/cQa3e)

Aha, indeed, those graphs have a larger gap than the two graphs linked
above.

Here's what happened: the estimation algorithm gets confused by the
sparse data it gets for the days between 2018-07-13 and 2018-07-20, so
that it removes those dates from the plot.

More specifically, it calculates what fraction of reported statistics
it has available. If that fraction is between 10% and 100%, it
displays results. Here are the fractions (in percent) for July 2018:

    date    | frac
------------+------
 2018-07-01 |   69
 2018-07-02 |   68
 2018-07-03 |   65
 2018-07-04 |   64
 2018-07-05 |   61
 2018-07-06 |   11
 2018-07-13 | 4294
 2018-07-14 |  446
 2018-07-15 |  162
 2018-07-16 |  189
 2018-07-17 |  172
 2018-07-18 |  137
 2018-07-19 |  126
 2018-07-20 |  100
 2018-07-21 |   78
 2018-07-22 |   78
 2018-07-23 |   81
 2018-07-24 |   81
 2018-07-25 |   81
 2018-07-26 |   82
 2018-07-27 |   76
 2018-07-28 |   73
 2018-07-29 |   51
 2018-07-30 |    2
(24 rows)

There's an actual gap starting on 2018-07-07. In fact, the algorithm
almost removed 2018-07-06, because 11% is barely above 10%.

From 2018-07-13 on it's getting new data. But the bridge network
statuses contain very few bridges, while at the same time the bridge
descriptors contain statistics for all the time when the bridge
authority was unavailable. The effect is that the code thinks that it's
seeing 4294% (446%, 162%, etc.) of reported statistics, which it simply
discards.

Only on 2018-07-21 does the computed fraction drop below 100% (on
2018-07-20 it was slightly over 100%, it seems), so that statistics are
displayed again.
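
For anyone who wants to play with the threshold rule, here is a minimal
sketch in Python. It is not the Tor Metrics code: the names FRACTIONS,
is_displayed and displayed_dates are made up for illustration, the
values are the rounded percentages from the table above, and treating
both bounds as inclusive is an assumption.

```python
# Rounded fractions (in percent) copied from the table in this message.
# Dates without a row (2018-07-07 to 2018-07-12) have no data at all.
FRACTIONS = {
    "2018-07-01": 69, "2018-07-02": 68, "2018-07-03": 65,
    "2018-07-04": 64, "2018-07-05": 61, "2018-07-06": 11,
    "2018-07-13": 4294, "2018-07-14": 446, "2018-07-15": 162,
    "2018-07-16": 189, "2018-07-17": 172, "2018-07-18": 137,
    "2018-07-19": 126, "2018-07-20": 100, "2018-07-21": 78,
    "2018-07-22": 78, "2018-07-23": 81, "2018-07-24": 81,
    "2018-07-25": 81, "2018-07-26": 82, "2018-07-27": 76,
    "2018-07-28": 73, "2018-07-29": 51, "2018-07-30": 2,
}

# Thresholds in percent. Whether the bounds are inclusive is an assumption.
LOW, HIGH = 10, 100


def is_displayed(frac_percent, low=LOW, high=HIGH):
    """Return True if a day with this fraction would be plotted."""
    return low <= frac_percent <= high


def displayed_dates(fractions):
    """Return the sorted dates whose fraction passes the threshold rule."""
    return [date for date, frac in sorted(fractions.items())
            if is_displayed(frac)]


if __name__ == "__main__":
    for date, frac in sorted(FRACTIONS.items()):
        status = "shown" if is_displayed(frac) else "dropped"
        print(date, frac, status)
```

Note that 2018-07-20 sits right on the boundary: the rounded value is
100, but the unrounded value seems to have been slightly above that, so
the sketch cannot tell from the table alone whether that day is
plotted.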

Anyway, I'd still give it another week or two until things are stable
enough. This code was not written for extreme situations like this.

By the way, if somebody is interested in even more details, we now have
a specification online for reproducing the numbers on Tor Metrics. The
algorithm above is specified here:

https://metrics.torproject.org/reproducible-metrics.html#bridge-users

Work in progress, handle with care, patches welcome.

All the best,
Karsten
