[tor-scaling] Nov 20 meeting recap: metrics analysis needed

Mike Perry mikeperry at torproject.org
Tue Jan 14 02:23:38 UTC 2020

On 12/19/19 4:11 AM, Karsten Loesing wrote:
> Hi!
> On 2019-12-02 20:37, Mike Perry wrote:
>> In particular, I would like to look at 8 hour snapshots from 8/5 to
>> 8/15, broken out by relay flags, of CDF-TTFB and CDF-DL from
>> https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics
> This is a very interesting experiment, and I'd like to help with the
> analysis, also in preparation for providing better tools for future
> analyses like this one.
> However, I'm having trouble understanding what graphs you have in mind
> here. I made a very first graph, even though I think it's just the
> beginning of an iterative process (that should probably not happen on
> this mailing list but on a ticket).
> https://people.torproject.org/~karsten/onionperf-cdf-ttfb-2019-12-19.pdf
> Some thoughts:
>  - I'm not sure if CDFs are the best way to visualize the data here.
> CDFs work great for visualizing *one* distribution or for comparing a
> handful of distributions when plotting them as separate lines. But we
> have a few dozen distributions here, and we can't plot all these lines
> into a single coordinate system.

Well, because this graph set covers just one metric, these are all
distributions of the same quantity, plotted on the same coordinates. The
main problem we appear to be having is axis clipping and scale.

If we got the axis clipping to be reasonable, we could conceivably just
overlay everything on fewer graphs. Perhaps all of the graphs from the
same time of day could be combined into one overlaid pile of CDF lines,
with colors indicating which CDF lines were for "ON" vs "OFF" dates of
the experiment. I.e., green CDF lines for Aug 9 - Aug 13 and red CDF
lines for all other dates in this experiment.
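A minimal sketch of this overlay idea, for illustration only. It assumes the ON window is Aug 9 - Aug 13 of 2019 and picks a common 10-second clip; both constants and all function names are hypothetical. Each snapshot is a (day, slot, samples) tuple, where slot identifies the 8-hour window, so snapshots from the same time of day land in the same pile.

```python
from collections import defaultdict
from datetime import date

# Assumptions (hypothetical, for illustration): the ON window is Aug 9 - Aug 13
# of 2019, and every graph is clipped at the same 10-second X limit.
EXPERIMENT_ON = (date(2019, 8, 9), date(2019, 8, 13))
CLIP_SECONDS = 10.0


def experiment_color(day):
    """Green for snapshots inside the ON window, red for all other dates."""
    start, end = EXPERIMENT_ON
    return "green" if start <= day <= end else "red"


def clipped_ecdf(samples, clip=CLIP_SECONDS):
    """Empirical CDF points (x, y), keeping only x <= clip.

    y is computed over *all* samples, so a curve that ends below 1.0
    shows how much of the distribution lies beyond the clip.
    """
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs) if x <= clip]


def piles_by_time_of_day(snapshots):
    """Group (day, slot, samples) snapshots into one pile of colored CDFs per slot.

    slot identifies the 8-hour window (e.g. 0, 8 or 16), so all snapshots
    from the same time of day end up overlaid on the same graph.
    """
    piles = defaultdict(list)
    for day, slot, samples in snapshots:
        piles[slot].append((experiment_color(day), clipped_ecdf(samples)))
    return dict(piles)
```

Each pile could then be drawn on a single axis, for example with one matplotlib `ax.step(xs, ys, where="post", color=color)` call per CDF line and `ax.set_xlim(0, CLIP_SECONDS)`. Computing y over all samples (not just the clipped ones) is a deliberate choice: the height at which each line stops already hints at how heavy the hidden tail is.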

>  - Maybe we could try out different subsets of percentiles in a common
> line plot to see how the experiment affects TTFB in this case. Something
> like five-number summary or seven-number summary or whichever handful of
> percentiles we want to see.

Worst case, I am fine with having a page of graphs for each metric.
We're not bound by publication page limits when we do our own analysis.
However, if we can find a compact representation we like that has all of
this info, that will be helpful for academia. So it is worthwhile to

If we lose resolution by reducing the data to a few percentiles, though,
we may miss some lumpiness that means something. So I would like to
reserve that as a last resort (or perhaps as a way to capture just the
long tail past 99%).
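For comparison, a minimal sketch of the percentile-summary approach, including a tail-only variant for the part past 99%. The percentile sets and function names are hypothetical; the seven-number set shown (2nd, 9th, 25th, 50th, 75th, 91st, 98th percentiles) is one common definition, not necessarily the one Karsten had in mind. Quantiles are linearly interpolated, the same convention as the default "linear" method of `numpy.percentile`.

```python
import math

# Hypothetical percentile sets, as fractions in [0, 1].
FIVE_NUMBER = (0.0, 0.25, 0.5, 0.75, 1.0)               # min, quartiles, max
SEVEN_NUMBER = (0.02, 0.09, 0.25, 0.5, 0.75, 0.91, 0.98)
LONG_TAIL = (0.99, 0.999, 1.0)                          # only the tail past 99%


def quantile(sorted_xs, q):
    """Linearly interpolated quantile of pre-sorted data, q in [0, 1]."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    pos = q * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def summarize(samples, qs):
    """Map each requested quantile to its value for one snapshot."""
    xs = sorted(samples)
    if not xs:
        raise ValueError("need at least one sample")
    return {q: quantile(xs, q) for q in qs}
```

Plotting each entry of `summarize(samples, SEVEN_NUMBER)` as its own line across snapshots gives the compact line plot Karsten suggested, while `LONG_TAIL` keeps only the part of the distribution where the full CDF is hardest to read. As noted above, any lumpiness between the chosen percentiles disappears in this view.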

>  - I didn't understand your idea to break out data by relay flags. These
> are requests over three-hop circuits. How would we split these up by
> relay flags?

Ah you are right. The flag separation only makes sense for the balancing
metrics. Sorry for the confusion.

>  - This graph shows just the data from a single OnionPerf source. If we
> add multiple sources, it gets even more overloaded. But we can't really
> mix numbers from different sources, as they have their very own
> connection characteristics that would skew results.
> Happy to make more graphs. It might help to see a sketch or longer
> description of what you expect to see. Thanks!

Hrm, I am not a data visualization expert, but what is most important
for us to understand is the nature of the variance of performance,
including the length of the long tail.

From your plots above, it looks like the experiment primarily hurt the
long tail of performance, and maybe even the 95-99% range, but not
average performance. But I agree, even this much is hard to tell given
the scale needed to display the full tail in CDF form. Perhaps this
means clipping the X axis at something like 5-10 seconds on all graphs,
so the axis has the same length everywhere, and then adding some other
way to quantify the length and size of the tail beyond the clip.

Basically, we want to be able to see whether the 0-99% part of the CDF
got wider or gained additional lumps, and whether the 1% tail got longer
or shorter (and ideally also whether the tail has similar membership and
data points over time, in terms of participating relays and time values,
for bug-hunting analysis).
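The membership check could start as simply as comparing which relays show up on the slowest circuits from one snapshot to the next. This is a hypothetical sketch: it assumes each measurement record can be reduced to a (relay_fingerprints, seconds) pair, which is an assumption about the data, not a description of OnionPerf's actual output format, and it uses Jaccard similarity as one plausible overlap measure.

```python
def tail_relays(measurements, tail_fraction=0.01):
    """Relays on the circuits of roughly the slowest tail_fraction of measurements.

    measurements: list of (relay_fingerprints, seconds) pairs for one snapshot.
    """
    if not measurements:
        return set()
    k = max(1, round(tail_fraction * len(measurements)))
    slowest = sorted(measurements, key=lambda m: m[1], reverse=True)[:k]
    return {relay for relays, _ in slowest for relay in relays}


def jaccard(a, b):
    """Size of intersection over size of union: 1.0 identical, 0.0 disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tail_membership_drift(snapshots, tail_fraction=0.01):
    """Jaccard similarity of tail relay sets between consecutive snapshots."""
    sets = [tail_relays(m, tail_fraction) for m in snapshots]
    return [jaccard(x, y) for x, y in zip(sets, sets[1:])]
```

Consistently high similarity would suggest the same relays keep landing in the tail, which is worth a closer look when bug hunting; low similarity suggests the tail is spread across the network. Appearing on a slow circuit does not by itself mean a relay caused the slowness.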

We should definitely play around with a few different graphing methods
though, to compare various ways of capturing this info.

Mike Perry
