[tor-scaling] Nov 20 meeting recap: metrics analysis needed

Tue Jan 21 21:51:08 UTC 2020

Hi,

> On 22 Jan 2020, at 01:56, Karsten Loesing <karsten at torproject.org> wrote:
> 
> On 2020-01-14 03:23, Mike Perry wrote:
>>> On 12/19/19 4:11 AM, Karsten Loesing wrote:
>>> 
>>> On 2019-12-02 20:37, Mike Perry wrote:
>>>> In particular, I would like to look at 8 hour snapshots from 8/5 to
>>>> 8/15, broken out by relay flags, of CDF-TTFB and CDF-DL from
>>>> https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics
>>> 
>>> This is a very interesting experiment, and I'd like to help with the
>>> analysis, also in preparation of providing better tools for future
>>> analyses like this one.
>>> 
>>> However, I'm having trouble understanding what graphs you have in mind
>>> here. I made a very first graph, even though I think it's just the
>>> beginning of an interactive process (that should probably not happen on
>>> this mailing list but on a ticket).
>>> 
>>> https://people.torproject.org/~karsten/onionperf-cdf-ttfb-2019-12-19.pdf
>>> 
>>> Some thoughts:
>>> 
>>> - I'm not sure if CDFs are the best way to visualize the data here.
>>> CDFs work great for visualizing *one* distribution or for comparing a
>>> handful of distributions when plotting them as separate lines. But we
>>> have a few dozen distributions here, and we can't plot all these lines
>>> into a single coordinate system.
>> 
>> Well because this graph set is just one metric, these are all the same
>> distribution with the same coordinates. The main problem we appear to be
>> having is clipping/scale.
>> 
>> If we got the axis clipping to be reasonable, we could conceivably just
>> overlay everything on fewer graphs. Perhaps all of the graphs from the
>> same times of day could be combined as one overlayed pile of CDF lines,
>> and we could use colors to represent which CDF lines were for "ON" vs
>> "OFF" for the experiment dates. Ie: green CDF lines for Aug 9 - Aug 13
>> and red CDF lines for all other dates for this experiment.
>> 
>>> - Maybe we could try out different subsets of percentiles in a common
>>> line plot to see how the experiment affects TTFB in this case. Something
>>> like five-number summary or seven-number summary or whichever handful of
>>> percentiles we want to see.
>> 
>> Worst case, I am fine with having a page of graphs for each metric.
>> We're not bound by publication lengths when we do our own analysis.
>> However, if we can find a compact representation we like that has all of
>> this info, that will be helpful for academia. So it is worthwhile to
>> brainstorm.
>> 
>> If we lose resolution by taking a few percentile ranges, we may miss out
>> on some lumpiness that means something though. So I would like to
>> reserve that as last resort (or perhaps as a way to capture just the
>> long tail past 99%).
>> 
>>> - I didn't understand your idea to break out data by relay flags. These
>>> are requests over three-hop circuits. How would we split these up by
>>> relay flags?
>> 
>> Ah you are right. The flag separation only makes sense for the balancing
>> metrics. Sorry for the confusion.
>> 
>>> - This graph shows just the data from a single OnionPerf source. If we
>>> add multiple sources, it gets even more overloaded. But we can't really
>>> mix numbers from different sources, as they have their very own
>>> connection characteristics that would skew results.
>>> 
>>> Happy to make more graphs. It might help to see a sketch or longer
>>> description of what you expect to see. Thanks!
>> 
>> Hrm, I am not a data visualization expert, but what is most important
>> for us to understand is the nature of the variance of performance,
>> including the length of the long tail.
>> 
>> From your above plots, it looks like the experiment primarily negatively
>> impacted the long tail of perf, and maybe even 95-99% perf, but not
>> average perf. But I agree, even this much is hard to tell due to the
>> scale needed to display the full tail in CDF form. Perhaps this means a
>> clip at like 5-10 seconds for all graphs, to keep the X axis the same
>> length, and then some additional way to quantify the length and quantity
>> of the tail beyond the clip.
>> 
>> Basically, we want to be able to see if 0-99% CDF slope became wider or
>> got additional lumps, and we want to see if the 1% tail got longer or
>> shorter (and ideally also check if it has similar membership and data
>> points over time in terms of participant relays and time values, for
>> bug-hunting analysis).
>> 
>> We should definitely play around with a few different graphing methods
>> though, to compare various ways of capturing this info.
> 
> Alright, I made a set of new graphs, taking your comments above into
> account:
> 
> https://people.torproject.org/~karsten/onionperf-cdf-ttfb-2020-01-21.pdf
> 
> I included explanations and thoughts on these graphs in the graph
> captions. Not what captions were made for, but it seemed useful to have
> these ideas close to the graphs in this case.

In the caption on page 8, Karsten writes:
> And it reveals something interesting around the 50th percentile or
> 2 seconds mark. Whatever happens there shouldn't be happening.

Tor 0.2.9 does a lot of things on a 1-second timer, and I'm pretty sure
that some of them would delay circuit building. Since 0.2.9, we've moved
a lot of functions to "every event loop" (a few hundred milliseconds) or
"as needed". But 0.2.9 is only 10% of the network, so it shouldn't be
affecting 50% of the circuits, even if they are 3-hop circuits:
https://metrics.torproject.org/versions.html
(We should check consensus weight rather than relay count. But as far
as I remember, it's also around 10% for 0.2.9.)

Tor still does a few things on a 1-second timer, so we should check
whether any of them delay circuit building. In particular, I think the
RelayBandwidthRate token buckets are still refilled every second.
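
To illustrate why the refill interval matters, here is a minimal
sketch. It is not Tor's code: it assumes a bucket that is already
drained and only gets tokens on fixed refill ticks, and the function
name is hypothetical. A cell that arrives at an empty bucket has to wait
for the next tick, so the worst-case added delay is about one interval.

```python
def wait_ms(arrival_ms: int, refill_interval_ms: int) -> int:
    """Milliseconds until the next refill tick for an arrival at an
    empty bucket (assumed model: ticks at multiples of the interval)."""
    remainder = arrival_ms % refill_interval_ms
    return 0 if remainder == 0 else refill_interval_ms - remainder


# A cell arriving 250 ms after the last tick:
print(wait_ms(250, 1000))  # 1-second refill: waits 750 ms
print(wait_ms(250, 100))   # 100 ms refill: waits 50 ms
```

If several relays on a circuit are rate-limited like this, delays of
this size could add up during circuit building.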

T