[metrics-bugs] #33076 [Metrics/Analysis]: Graph onionperf and consensus information from Rob's experiments

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Feb 13 12:34:08 UTC 2020


#33076: Graph onionperf and consensus information from Rob's experiments
-------------------------------------------------+-------------------------
 Reporter:  mikeperry                            |          Owner:
                                                 |  metrics-team
     Type:  task                                 |         Status:
                                                 |  needs_review
 Priority:  Medium                               |      Milestone:
Component:  Metrics/Analysis                     |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  metrics-team-roadmap-2020Q1, sbws-   |  Actual Points:  3
  roadmap                                        |
Parent ID:  #33121                               |         Points:  6
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by dennis.jackson):

 Replying to [comment:24 karsten]:


 == 24 Hour Moving Average
 > I like your percentiles graph with the moving 24 hour window. We should
 > include that graph type in our candidate list for graphs to be added to
 > OnionPerf's visualization mode. Is that moving 24 hour window a standard
 > visualization, or did you further process the data I gave you?


 At a high level: I'm loading the data into Pandas and then using the
 `rolling` function to compute statistics for a window. It's pretty
 flexible and supports different weighting strategies for the window, but I
 used 'uniform' here. The code is contained in the Python notebook I linked
 at the end of my post.

 Excerpt:
 {{{
 time_period = 60*60*24   # 24 hour window, in seconds
 threshold = 10           # require at least 10 measurements in the window
 p95 = lambda x: x.rolling(f'{time_period}s', min_periods=threshold).dl.quantile(0.95)
 }}}
 The resulting data can be plotted as a time series in your graphing
 library of choice :).
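
 For illustration, a minimal sketch along these lines (assuming the
 measurements sit in a DataFrame `df` indexed by measurement timestamp,
 with the download time in a `dl` column; the column name and the plotting
 library are just illustrative choices) could be:
 {{{
 import pandas as pd
 import matplotlib.pyplot as plt

 time_period = 60 * 60 * 24   # 24 hour window, in seconds
 threshold = 10               # require at least 10 measurements per window

 # Time-based rolling windows need a DatetimeIndex on df.
 rolled = df['dl'].rolling(f'{time_period}s', min_periods=threshold)
 summary = pd.DataFrame({
     'p5':  rolled.quantile(0.05),
     'p50': rolled.quantile(0.50),
     'p95': rolled.quantile(0.95),
 })

 summary.plot()               # one line per percentile over time
 plt.ylabel('download time (s)')
 plt.show()
 }}}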

 == Measuring Latency

 > Regarding the dataset behind bandwidth measurements, I wonder if we
 > should kill the 50 KiB downloads in deployed OnionPerfs and only keep the
 > 1 MiB and 5 MiB downloads. If we later think that we need time-to-50KiB,
 > we can always obtain that from the tgen logs. The main change would be
 > that OnionPerfs consume more bandwidth and also put more load on the Tor
 > network. The effect for graphs like these would be that we'd have 5 times
 > as many measurements.

 I think that is definitely worth thinking about, as 50 KiB does seem too
 small to infer anything about bandwidth. It may be worth considering the
 cost of circuit construction though. For example, if we open a circuit for
 latency measurement, we could use Arthur's strategy of fetching HEAD only,
 and it may be worth using that circuit for a series of measurements over a
 couple of minutes, which would give us more reliable "point in time" data
 without any additional circuit construction overhead.
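
 To illustrate the idea (this is just a rough, hypothetical sketch of
 HEAD-only probing over one Tor connection, not something OnionPerf/tgen
 does today; the target URL, interval and count are placeholders):
 {{{
 import time
 import requests   # requires requests[socks] for the socks5h proxy scheme

 PROXIES = {'http': 'socks5h://127.0.0.1:9050',
            'https': 'socks5h://127.0.0.1:9050'}
 URL = 'https://example.com/'      # placeholder measurement target

 # Keeping one Session (and, if the server allows keep-alive, one TCP
 # connection and hence one Tor stream on one circuit) lets us take a short
 # series of latency samples without rebuilding circuits between them.
 with requests.Session() as s:
     s.proxies.update(PROXIES)
     for _ in range(12):           # e.g. one sample every 10s for ~2 minutes
         r = s.head(URL)
         print(r.status_code, r.elapsed.total_seconds())
         time.sleep(10)
 }}}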

 == August Measurement Success Rate

 > But I think (and hope) that you're wrong about measurements not having
 > finished. If DATAPERC100 is non-null that actually means that the
 > measurement reached the point where it received 100% of expected bytes.
 > See also the [https://metrics.torproject.org/collector.html#type-torperf
 > Torperf and OnionPerf Measurement Results data format description].

 You are quite right! I looked back at my code and whilst I was correctly
 checking that DATAPERC100 is non-null to imply success, I also found a
 misplaced trailing `}` which put my check inside the wrong `if` clause. My
 bad! Rerunning with the fix shows only 29 measurements failed to finish in
 August. Much, much healthier!
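
 (For reference, the corrected check boils down to something like the
 following sketch, assuming the result lines are parsed into a pandas
 DataFrame `df` with a DATAPERC100 column:)
 {{{
 finished = df['DATAPERC100'].notnull()   # non-null => 100% of bytes received
 print('did not finish:', (~finished).sum())
 }}}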

 == Number of Measurements in August
 > Are you sure about that 10k ttfb measurements number for the month of
 > August? In theory, every OnionPerf instance should make a new measurement
 > every 5 minutes. That's 12*24*31 = 8928 measurements per instance in
 > August, or 8928*4 = 35712 measurements performed by all four instances in
 > August. So, okay, not quite 10k, but also not that many more. We should
 > spin up more OnionPerf instances as soon as it has become easier to
 > operate them.

 Sorry, this was sloppy and incorrect wording on my part: "month of August"
 -> "experimental period from August 4th to August 19th". There are 15k
 attempted measurements in this window; however, op-hk did not achieve any
 successful connections, so there are only ~10k successful measurements in
 the dataset.

 == How many is enough?

 > What's a good number to keep running continuously, in your opinion? 10?
 > 20? And maybe we should consider deploying more than 1 instance per host
 > or data center, so that we have more measurements with comparable network
 > properties.

 I think it would be worth pulling in Mike (congestion related) and the
 network health team (#33178) and thinking about this in terms of output
 statistics rather than input measurements. Possible example:

  * For a given X `{minute,hour,day}` period, we want to measure, for `{any
 circuit, circuits using this guard, circuits using this exit}`,
 `{probability of timeout, p5-p50-p95 latency, p5-p50-p95 bandwidth}`, with
 a 90% confidence interval less than `{1%, 500ms, 500 KB/s}`.

 This gives us a rolling target in terms of the measurements we want to
 make, varying with network conditions and with how fine-grained we would
 like the statistics to be for a given time period. We could estimate the
 number of samples required (using the existing datasets) for each of these
 statistics, factor in the cost per measurement, and work out what is
 feasible for long-term monitoring and short-term experiments.
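
 As a very rough sketch of how that estimation could work (assumptions: a
 placeholder file of existing latency samples in seconds, and a simple
 bootstrap to see how the width of the 90% confidence interval on the p95
 shrinks as the sample size grows; none of this is existing OnionPerf
 tooling):
 {{{
 import numpy as np

 def ci_width_p95(samples, n, reps=2000, rng=np.random.default_rng(0)):
     """Width of the bootstrap 90% CI on the p95 at sample size n."""
     stats = [np.quantile(rng.choice(samples, size=n, replace=True), 0.95)
              for _ in range(reps)]
     lo, hi = np.quantile(stats, [0.05, 0.95])
     return hi - lo

 latencies = np.loadtxt('ttfb_samples.txt')    # placeholder input file
 for n in (50, 100, 200, 500, 1000):
     # e.g. keep increasing n until the CI width drops below 0.5s
     print(n, round(ci_width_p95(latencies, n), 3))
 }}}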

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33076#comment:25>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

