[tor-dev] 24 hours worth of BridgeDB usage metrics

Philipp Winter phw at nymity.ch
Tue Jul 30 18:13:05 UTC 2019


On Tue, Jul 30, 2019 at 05:42:11PM +0200, Karsten Loesing wrote:
> You say that you're planning to add aggregate statistics like numbers by
> distributor without drilling down to transports or countries. Keep in
> mind that this is going to reduce the noise that you added when rounding
> up to multiples of 10. For example, knowing that the total by country is
> closer to $entries_in_that_country * 1 or $entries_in_that_country * 10
> will tell you something about the average noise added per entry. It
> would be more privacy-preserving (and also less accurate) to keep all
> the noise in the statistics and do the aggregation in a separate step.

That's a great point.  I was originally concerned about the decrease in
accuracy but, after running the numbers, it seems tolerable.  Let's have
a look at the lower and upper bound of the total number of HTTPS
requests.  Summing up all bins (and ignoring bot requests) gives us the
upper bound:

  grep https bridgedb-metrics.log | grep -v zz | cut -d ' ' -f 3 | paste -sd+ | bc
  3850

To determine the lower bound, we first calculate the number of bins:

  grep https bridgedb-metrics.log | grep -c -v zz
  235

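Each bin was rounded up to the next multiple of 10, so it overstates the
real count by at most 9.  We therefore multiply the number of bins by 9
and subtract the product from the upper bound, which gives us a lower
bound of 3,850 - 235 * 9 = 1,735.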
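
As a rough illustration, the same calculation can be sketched in Python.
This is only a sketch: the function names are hypothetical, the sample
lines are made up, and it assumes that each metrics line has the form
"bridgedb-metric-count <distributor>.<transport>.<origin>.<result>.<extra> <count>",
with "zz" marking bot requests.  Unlike the grep pipeline, it matches on
the dot-separated fields of the key rather than anywhere in the line.

```python
def lower_bound(upper, bins, bin_size=10):
    """Each bin was rounded up to a multiple of bin_size, so it can
    overstate the real count by at most bin_size - 1."""
    return upper - bins * (bin_size - 1)


def request_bounds(lines, distributor, bin_size=10):
    """Return (lower, upper) bounds on the real number of requests for
    one distributor, skipping bot ("zz") entries."""
    upper = 0
    bins = 0
    for line in lines:
        fields = line.split()
        # Assumed line format: "bridgedb-metric-count <key> <count>".
        if len(fields) != 3 or fields[0] != "bridgedb-metric-count":
            continue
        parts = fields[1].split(".")
        if parts[0] != distributor or "zz" in parts:
            continue
        upper += int(fields[2])
        bins += 1
    return lower_bound(upper, bins, bin_size), upper


# Made-up sample lines, for illustration only.
SAMPLE = [
    "bridgedb-metric-count https.obfs4.de.success.none 10",
    "bridgedb-metric-count https.vanilla.us.success.none 20",
    "bridgedb-metric-count https.obfs4.zz.fail.none 30",
    "bridgedb-metric-count email.obfs4.gmail.success.none 10",
]

print(request_bounds(SAMPLE, "https"))  # (12, 30)
print(lower_bound(3850, 235))           # 1735, the HTTPS figure above
```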

Applying this method to all three distribution mechanisms results in the
following table:

        Lower bound  Upper bound
        -----------  -----------
  Moat        4,576        4,630
  HTTPS       1,735        3,850
  Email         303          420

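Despite the inaccuracy caused by the binning, we can be certain that
moat is more popular than HTTPS (moat's lower bound > HTTPS's upper
bound) and that email is far less popular than both HTTPS and moat
(email's upper bound is less than a quarter of HTTPS's lower bound).
The HTTPS estimate is the least precise because its total is spread
over the largest number of bins (235), each of which adds up to 9 to
the uncertainty.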
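
The comparison can also be checked mechanically.  The following is a
minimal sketch that takes the bounds from the table above; the helper
names are hypothetical and only illustrate the interval reasoning.

```python
# (lower bound, upper bound) per distributor, copied from the table.
BOUNDS = {
    "moat": (4576, 4630),
    "https": (1735, 3850),
    "email": (303, 420),
}


def surely_more_popular(a, b):
    """True if a's lower bound exceeds b's upper bound, i.e. a saw more
    requests than b no matter how much the rounding added."""
    return BOUNDS[a][0] > BOUNDS[b][1]


def interval_width(name):
    """Size of the uncertainty caused by rounding up each bin."""
    lower, upper = BOUNDS[name]
    return upper - lower


print(surely_more_popular("moat", "https"))   # True
print(surely_more_popular("https", "email"))  # True
print(max(BOUNDS, key=interval_width))        # https
```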

> What is obs4 in bridgedb-metric-count email.obs4.gmail.fail.none 10 (as
> opposed to obfs4)?

That's a typo that a user made when requesting the transport.  I had not
yet changed the code to only consider transports that are supported by
BridgeDB.  All unsupported transport types should result in a log
message and not affect the metrics.
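
To illustrate the intended behaviour, here is a hedged sketch, not
BridgeDB's actual code.  The set of supported transports, the function
name, and the key layout are all assumptions made for this example.

```python
import logging
from collections import Counter

# Assumption: a hypothetical allow-list; the real set of transports
# that BridgeDB supports may differ.
SUPPORTED_TRANSPORTS = frozenset({"vanilla", "obfs4"})

log = logging.getLogger("metrics-sketch")


def record_request(counts, distributor, transport, origin, outcome):
    """Count a request only if its transport is supported.

    Unsupported transports (e.g. the typo "obs4") produce a log message
    and leave the metrics untouched.  Returns True if counted.
    """
    if transport not in SUPPORTED_TRANSPORTS:
        log.warning("Ignoring request for unsupported transport %r",
                    transport)
        return False
    counts[f"{distributor}.{transport}.{origin}.{outcome}"] += 1
    return True


counts = Counter()
record_request(counts, "email", "obs4", "gmail", "fail")      # logged, not counted
record_request(counts, "email", "obfs4", "gmail", "success")  # counted
print(dict(counts))  # {'email.obfs4.gmail.success': 1}
```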

Interestingly, there's another metrics line that shows that there were
1-10 successful requests for the invalid obs4 transport.  When
requesting an invalid transport, BridgeDB tells you that there are
currently no bridges available.  Instead, it should tell you that the
requested transport does not exist.

> Would it make sense to add a line like bridge-stats-version to include a
> version number of some sort, just in case you want to change the format
> at a later time?

Yes, that's a good idea.  I will do that.

Thanks,
Philipp

