[metrics-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Oct 17 12:26:52 UTC 2019
#32126: Add OONI's Vanilla Tor measurement data to Tor Metrics
Reporter: karsten | Owner: metrics-team
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/Ideas | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
Comment (by karsten):
Here are the results from my analysis over the past few days:
This first graph shows all measured times until 100% bootstrapped between
2016-01 and 2019-10. Some observations:
- 50% of measurements were done in under 15 seconds, and roughly 90%
finished in under 1 minute.
- There's a bump shortly after 120 seconds, which is most likely the
result of a 120 second timeout somewhere in the process.
- A few percent of measurements did not succeed within the test timeout
of 300 seconds: the line is not at 100% at the 300 seconds mark but
roughly at 97%.
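
For anyone who wants to reproduce these numbers, here is a minimal sketch
of how such an ECDF can be evaluated. It assumes one value per measurement
(seconds until 100% bootstrapped, or None if the measurement never got
there). The function name and the sample values are made up for
illustration and are not OONI data. Failed measurements stay in the
denominator, which is why the curve can end below 100% at the timeout.

```python
from bisect import bisect_right

TEST_TIMEOUT = 300.0  # seconds, the test timeout mentioned above


def ecdf_at(times, thresholds):
    """Fraction of all measurements that finished within each threshold.

    `times` holds seconds-to-100% per measurement, None for failures.
    Failures are counted in the total but never in the numerator.
    """
    total = len(times)
    if total == 0:
        return [0.0 for _ in thresholds]
    finished = sorted(t for t in times if t is not None)
    # bisect_right counts how many finished times are <= the threshold.
    return [bisect_right(finished, x) / total for x in thresholds]


# Hypothetical sample: nine successes and one failure.
sample = [5.0, 9.0, 12.0, 14.0, 20.0, 45.0, 58.0, 121.0, 123.0, None]
fractions = ecdf_at(sample, [15.0, 60.0, TEST_TIMEOUT])
print(fractions)  # [0.4, 0.7, 0.9]
```
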
The second graph shows different stages of the bootstrap process. Again,
some observations:
- It's not entirely clear (to me) why 0% bootstrapped is not just a
vertical line at the 0 s mark. If it requires work to get to 0%, it's not
0% but rather 2%, 1%, or 0.5% of the process. Maybe a naming issue,
possibly a measurement issue. At least all measurements succeed at
bootstrapping to 0% within the test time.
- The 20% line has a small bump right after 120 s, so there must be a 120
s timeout for this early bootstrap phase. There's another bump at roughly
130 s which could be due to the same 120 s timeout that was started later.
- The 80% and 100% lines are almost the same. If a client makes it to 80%,
it's just a matter of seconds to get to 100%.
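
The last observation can be checked directly by looking at the time
between the 80% and 100% stages per measurement. This is only a sketch:
it assumes each measurement is available as a mapping from bootstrap
percentage to seconds since start, which is a made-up representation, and
the values below are invented.

```python
from statistics import median


def stage_gaps(measurements, start=80, end=100):
    """Seconds between two bootstrap stages, for measurements reaching both."""
    gaps = []
    for stages in measurements:
        if start in stages and end in stages:
            gaps.append(stages[end] - stages[start])
    return gaps


# Hypothetical measurements: {bootstrap percentage: seconds since start}.
measurements = [
    {0: 0.2, 20: 3.0, 80: 10.0, 100: 11.5},
    {0: 0.1, 20: 2.0, 80: 8.0, 100: 9.0},
    {0: 0.3, 20: 125.0},  # stalled after 20%, so it is skipped
    {0: 0.2, 20: 4.0, 80: 30.0, 100: 33.0},
]
gaps = stage_gaps(measurements)
median_gap = median(gaps)
print(gaps, median_gap)  # [1.5, 1.0, 3.0] 1.5
```
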
The third graph shows the same data broken down by country for the five
slowest countries. Observations:
- Most measurements in China and Egypt did not proceed past the 0%
bootstrapped stage.
- Almost none of the Kazakhstan measurements succeeded, even fewer than in
China and Egypt. The 20% bootstrapped line looks really funny, starting to
increase only after a full 2 minutes. Maybe these measurements would
succeed after 10 or 20 minutes, which is something we won't find out from
this data.
- Belarus has two visible bumps shortly after 2 and 4 minutes. I would
guess that there'd be more bumps after 6 and 8 and 10 minutes. Maybe this
is related to some subset of relays not being reachable.
- Turkey has roughly 1/4 of measurements not succeeding, with the
remaining ones looking slow-but-okay. The reason might be that we're
looking at almost 3 years of measurements here, and maybe bootstrapping
worked during 75% of that period and did not work during the other 25%.
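
One possible way to pick countries for such a breakdown is to rank them by
the fraction of measurements that reached 100% within the test timeout.
This is an assumption about what "slowest" could mean, not necessarily how
the graph above was made. The row format (country code, seconds or None)
and the sample rows are made up.

```python
from collections import defaultdict


def success_rate_by_country(rows):
    """Map country -> fraction of measurements that reached 100%."""
    counts = defaultdict(lambda: [0, 0])  # country -> [succeeded, total]
    for country, seconds in rows:
        counts[country][1] += 1
        if seconds is not None:
            counts[country][0] += 1
    return {c: ok / total for c, (ok, total) in counts.items()}


def lowest(rates, n=5):
    """The n countries with the lowest success rate, worst first."""
    return sorted(rates.items(), key=lambda kv: kv[1])[:n]


# Hypothetical rows: (country code, seconds to 100% or None on failure).
rows = [
    ("TR", 20.0), ("TR", 35.0), ("TR", 50.0), ("TR", None),
    ("KZ", None), ("KZ", None),
    ("DE", 8.0), ("DE", 10.0),
]
rates = success_rate_by_country(rows)
worst = lowest(rates, n=2)
print(worst)  # [('KZ', 0.0), ('TR', 0.75)]
```
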
The next step here is to discuss '''what''' results we want to add to Tor
Metrics. Are these graphs useful, or is there something potentially more
interesting in the data that we want to have? I'm hoping for input from
other teams here.
All graphs above are ECDFs, unlike other graphs on Tor Metrics. This is a
smaller issue on the graphing side, because we need to process non-
aggregated measurements for making a graph. It's also a possible issue on
the usability side, because ECDFs are probably harder to understand than
The next step after answering the questions above is to figure out
'''how''' we'd get the data for these new graphs. Some thoughts:
- Maintaining our own copy of the OONI metadata database, like I did for
this analysis, isn't feasible. We only need a small fraction, roughly 40G,
of this database, which currently has a total size of 696G. Also, cloning
this database took far too long for us to do it once per day.
- We might be able to maintain a copy of the .yaml files of vanilla_tor
measurements only. We would sync these once or twice per day and serve
them with CollecTor. We'd have to define our own database schema for
importing and aggregating them. This is not a small project and not a
small commitment.
- A while ago we were hoping to get a .csv file from OONI with just the
data we need. For example, the .csv file behind the three graphs above is
150M in size, though it could easily be reduced to 75M, uncompressed. Maybe
we'd have to define precisely what data we want (the discussion above) and
then write the database query for it. This would be the smallest project
and commitment from our side; in other words, it would be most likely to
- A possible variant of the ideas above would be that we operate on a
read-only copy of the metadata database where we can define views, run
queries, and export results as .csv files.
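
To make the last two ideas a bit more concrete, here is a rough sketch of
defining a view and exporting a query result as .csv. It uses an in-memory
SQLite database as a stand-in for the real metadata database, and the
table name, column names, and rows are all invented for illustration. The
actual schema would have to come from OONI.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table; the real metadata database will look different.
conn.execute(
    "CREATE TABLE measurement ("
    "start_date TEXT, country TEXT, test_name TEXT, "
    "progress INTEGER, runtime REAL)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [
        ("2019-10-01", "TR", "vanilla_tor", 100, 21.5),
        ("2019-10-01", "KZ", "vanilla_tor", 20, 300.0),
        ("2019-10-01", "TR", "web_connectivity", None, 1.2),
    ],
)

# A view that keeps only the rows and columns needed for the graphs.
conn.execute(
    "CREATE VIEW vanilla_tor_summary AS "
    "SELECT start_date, country, progress, runtime "
    "FROM measurement WHERE test_name = 'vanilla_tor'"
)


def export_csv(conn, query):
    """Run a query and return the result as CSV text with a header row."""
    cur = conn.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return out.getvalue()


csv_text = export_csv(
    conn, "SELECT * FROM vanilla_tor_summary ORDER BY start_date, country"
)
print(csv_text)
```

With a read-only copy, only the view definition and the export query would
need to be agreed on, and the resulting file would contain just the
vanilla_tor rows.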
It would be great to hear from OONI folks which of these approaches would
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32126#comment:1>