[tor-bugs] #26868 [Metrics/Statistics]: How does metrics get bridge statistics at a granularity of 1 user?

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Jul 24 07:55:51 UTC 2018


#26868: How does metrics get bridge statistics at a granularity of 1 user?
--------------------------------+------------------------------
 Reporter:  teor                |          Owner:  metrics-team
     Type:  defect              |         Status:  new
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------

Comment (by karsten):

 Replying to [comment:7 teor]:
 > So, I believe the answer to my question is:
 >
 > "We approximate directory request numbers by multiplying the fraction of
 > unique IP addresses from a given country, transport, or IP version with
 > the total number of successful requests."

 That would produce smaller numbers than 8, too.
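To show why the quoted approximation can yield numbers smaller than 8, here is a minimal Python sketch of the idea as the quote describes it: each key's share of unique IP addresses, multiplied by the bridge's total successful requests. The function name `approximate_requests` and the sample counts are hypothetical. This is not the metrics code itself.

```python
from typing import Dict


def approximate_requests(unique_ips: Dict[str, int],
                         total_requests: int) -> Dict[str, float]:
    """Distribute total successful requests over keys (country, transport,
    or IP version) in proportion to each key's unique IP address count."""
    total_ips = sum(unique_ips.values())
    if total_ips == 0:
        # No unique IP counts reported: nothing to apportion.
        return {key: 0.0 for key in unique_ips}
    return {key: total_requests * count / total_ips
            for key, count in unique_ips.items()}


# Hypothetical numbers for one bridge on one day.
estimates = approximate_requests({"us": 24, "de": 8, "fr": 8}, total_requests=4)
print(estimates)  # {'us': 2.4, 'de': 0.8, 'fr': 0.8}
```

Even though every reported unique IP count here is at least 8, the apportioned request numbers are well below 8.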

 Another answer is this part: "Split observations to the covered UTC dates
 by assuming a linear distribution of observations."
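A minimal Python sketch of that splitting step follows. It assumes a statistics interval given by timezone-aware UTC `start` and `end` datetimes, and it spreads the observed value over the UTC dates in proportion to the seconds that fall on each date. The function `split_by_utc_date` and the interval are hypothetical illustrations, not the metrics implementation.

```python
from datetime import datetime, timedelta, timezone


def split_by_utc_date(start, end, value):
    """Spread `value` observed over [start, end) across UTC dates, assuming
    observations are distributed linearly (uniformly) over the interval.
    `start` and `end` must be timezone-aware UTC datetimes."""
    total_seconds = (end - start).total_seconds()
    if total_seconds <= 0:
        raise ValueError("end must be after start")
    shares = {}
    cursor = start
    while cursor < end:
        # Midnight UTC at the beginning of the next date.
        next_midnight = cursor.replace(hour=0, minute=0, second=0,
                                       microsecond=0) + timedelta(days=1)
        chunk_end = min(next_midnight, end)
        seconds = (chunk_end - cursor).total_seconds()
        shares[cursor.date()] = value * seconds / total_seconds
        cursor = chunk_end
    return shares


# Hypothetical 24-hour statistics interval ending at 18:00 UTC.
start = datetime(2018, 7, 22, 18, 0, tzinfo=timezone.utc)
end = datetime(2018, 7, 23, 18, 0, tzinfo=timezone.utc)
shares = split_by_utc_date(start, end, 8)
print(shares)  # 2.0 for 2018-07-22 and 6.0 for 2018-07-23
```

A reported value of 8 thus turns into two per-date values that are both smaller than 8.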

 We'd have to look at the raw data to say which one is the better answer.
 But I assume your question is mostly answered by knowing that it's not too
 small a number in the original data.

 > But I think there are two missing steps:
 > * Metrics appears to round/truncate/ceiling client numbers to the
 > nearest integer

 Right, we're using integer truncation here. We should probably document
 that under Step 4 of the
 [https://metrics.torproject.org/reproducible-metrics.html#relay-users Relay users]
 section.
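The difference between the three options in the question is easiest to see on a few hypothetical fractional estimates. This Python snippet only illustrates the arithmetic and is not the metrics implementation. Note that Python's `int()` truncates toward zero, which equals floor for non-negative counts, so any estimate below 1 becomes 0.

```python
import math

# Hypothetical fractional per-day estimates, e.g. after apportioning
# requests and splitting observations across UTC dates.
estimates = [0.8, 1.4, 7.99]

truncated = [int(x) for x in estimates]      # drop the fractional part
rounded = [round(x) for x in estimates]      # nearest integer
ceiled = [math.ceil(x) for x in estimates]   # next integer up

print(truncated)  # [0, 1, 7]
print(rounded)    # [1, 1, 8]
print(ceiled)     # [1, 2, 8]
```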

 > * You say that you "Skip dates where frac is smaller than 10% and hence
 > too low for a robust estimate"
 >   * are the snowflake bridges less than 10% of total bridge usage? That
 >   could be why their numbers vary so much.
 >   * how do you calculate 10% of bridge usage? (Bridges don't have
 >   bandwidth, so do you use unique IP addresses?)

 Wait, no, ''frac'' is the "estimated fraction of reported directory-request
 statistics". It is unrelated to snowflake in particular and refers to all
 bridge usage. The formula for computing ''frac'' is specified in Step 3 of
 the [https://metrics.torproject.org/reproducible-metrics.html#relay-users Relay users]
 section.
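A minimal Python sketch of how the 10% cutoff interacts with frac follows. It assumes, as the name "estimated fraction of reported directory-request statistics" suggests, that observed counts are scaled up by dividing by frac. Because frac refers to all bridge usage, it is keyed by date only, so a date is either kept or skipped for every transport at once. frac is taken as a given input here, since its formula is in Step 3 of the linked page. The function `extrapolate` and the sample values are hypothetical.

```python
def extrapolate(observed, frac, min_frac=0.1):
    """Extrapolate observed per-date counts to network totals by dividing
    by frac, skipping dates where frac is below min_frac (10% by default)."""
    totals = {}
    for day, count in observed.items():
        f = frac.get(day)
        if f is None or f < min_frac:
            # Too few reported statistics for a robust estimate: skip date.
            continue
        totals[day] = count / f
    return totals


# Hypothetical inputs; frac itself would come from the Step 3 formula.
observed = {"2018-07-22": 5.0, "2018-07-23": 5.0}
frac = {"2018-07-22": 0.5, "2018-07-23": 0.05}
print(extrapolate(observed, frac))  # {'2018-07-22': 10.0}
```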

 Please let me know if this makes more sense now, and if not, how we can
 improve it. Thanks!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26868#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
