[tor-scaling] Nov 20 meeting recap: metrics analysis needed
mikeperry at torproject.org
Tue Jan 28 13:32:34 UTC 2020
On 1/21/20 7:54 AM, Karsten Loesing wrote:
> On 2020-01-14 03:23, Mike Perry wrote:
>> On 12/19/19 4:11 AM, Karsten Loesing wrote:
>>> On 2019-12-02 20:37, Mike Perry wrote:
>>>> In particular, I would like to look at 8 hour snapshots from 8/5 to
>>>> 8/15, broken out by relay flags, of CDF-TTFB and CDF-DL from
>>> This is a very interesting experiment, and I'd like to help with the
>>> analysis, also in preparation of providing better tools for future
>>> analyses like this one.
>>> However, I'm having trouble understanding what graphs you have in mind
>>> here. I made a very first graph, even though I think it's just the
>>> beginning of an interactive process (that should probably not happen on
>>> this mailing list but on a ticket).
>>> Some thoughts:
>>> - I'm not sure if CDFs are the best way to visualize the data here.
>>> CDFs work great for visualizing *one* distribution or for comparing a
>>> handful of distributions when plotting them as separate lines. But we
>>> have a few dozen distributions here, and we can't plot all these lines
>>> into a single coordinate system.
>> Well because this graph set is just one metric, these are all the same
>> distribution with the same coordinates. The main problem we appear to be
>> having is clipping/scale.
>> If we got the axis clipping to be reasonable, we could conceivably just
>> overlay everything on fewer graphs. Perhaps all of the graphs from the
>> same times of day could be combined as one overlaid pile of CDF lines,
>> and we could use colors to represent which CDF lines were for "ON" vs
>> "OFF" for the experiment dates. I.e., green CDF lines for Aug 9 - Aug 13
>> and red CDF lines for all other dates for this experiment.
>>> - Maybe we could try out different subsets of percentiles in a common
>>> line plot to see how the experiment affects TTFB in this case. Something
>>> like five-number summary or seven-number summary or whichever handful of
>>> percentiles we want to see.
>> Worst case, I am fine with having a page of graphs for each metric.
>> We're not bound by publication lengths when we do our own analysis.
>> However, if we can find a compact representation we like that has all of
>> this info, that will be helpful for academia. So it is worthwhile to
>> If we lose resolution by taking a few percentile ranges, we may miss out
>> on some lumpiness that means something though. So I would like to
>> reserve that as last resort (or perhaps as a way to capture just the
>> long tail past 99%).
>>> - I didn't understand your idea to break out data by relay flags. These
>>> are requests over three-hop circuits. How would we split these up by
>>> relay flags?
>> Ah you are right. The flag separation only makes sense for the balancing
>> metrics. Sorry for the confusion.
>>> - This graph shows just the data from a single OnionPerf source. If we
>>> add multiple sources, it gets even more overloaded. But we can't really
>>> mix numbers from different sources, as they have their very own
>>> connection characteristics that would skew results.
>>> Happy to make more graphs. It might help to see a sketch or longer
>>> description of what you expect to see. Thanks!
>> Hrm, I am not a data visualization expert, but what is most important
>> for us to understand is the nature of the variance of performance,
>> including the length of the long tail.
>> From your above plots, it looks like the experiment primarily negatively
>> impacted the long tail of perf, and maybe even 95-99% perf, but not
>> average perf. But I agree, even this much is hard to tell due to the
>> scale needed to display the full tail in CDF form. Perhaps this means a
>> clip at like 5-10 seconds for all graphs, to keep the X axis the same
>> length, and then some additional way to quantify the length and quantity
>> of the tail beyond the clip.
>> Basically, we want to be able to see if 0-99% CDF slope became wider or
>> got additional lumps, and we want to see if the 1% tail got longer or
>> shorter (and ideally also check if it has similar membership and data
>> points over time in terms of participant relays and time values, for
>> bug-hunting analysis).
>> We should definitely play around with a few different graphing methods
>> though, to compare various ways of capturing this info.
> Alright, I made a set of new graphs, taking your comments above into
> account.
> I included explanations and thoughts on these graphs in the graph
> captions. Not what captions were made for, but it seemed useful to have
> these ideas close to the graphs in this case.
> Curious to hear your thoughts!
Right after I hit send last time, I remembered color blind palette
So instead of red, green, blue, let's try purple, green, yellow? Or some
other palette from that page.
I agree that we might as well just put everything on one graph in
aggregate. I wanted the 8 hour breakout initially because I wanted to
try to isolate time of day as a control, but since the experiment ran
over several days, this is not necessary.
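
As a rough illustration of the aggregate view discussed in this thread, the sketch below groups measurements into "ON" and "OFF" by date, builds one empirical CDF per group clipped at a fixed X value, and reports what lies beyond the clip. The year 2019, the 10 second clip in the usage note, and all function names are assumptions made for this sketch; none of it is taken from OnionPerf or the metrics code.

```python
from datetime import date

# Assumed experiment window, from the thread: "ON" is Aug 9 - Aug 13.
# The year 2019 is an assumption based on the dates of the emails.
ON_START = date(2019, 8, 9)
ON_END = date(2019, 8, 13)


def phase(day):
    """Label a measurement date as "ON" or "OFF" for the experiment."""
    return "ON" if ON_START <= day <= ON_END else "OFF"


def group_by_phase(measurements):
    """Split (date, seconds) pairs into one list of seconds per phase."""
    groups = {"ON": [], "OFF": []}
    for day, seconds in measurements:
        groups[phase(day)].append(seconds)
    return groups


def clipped_ecdf(values, clip):
    """Return (points, tail) for one group of measurements in seconds.

    points: (x, cumulative fraction) pairs for values <= clip. Fractions
            are taken over ALL values, so the curve stops below 1.0 when
            some measurements lie beyond the clip.
    tail:   count, fraction, and maximum of the values beyond the clip.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = [(x, (i + 1) / n) for i, x in enumerate(ordered) if x <= clip]
    beyond = [x for x in ordered if x > clip]
    tail = {
        "count": len(beyond),
        "fraction": len(beyond) / n if n else 0.0,
        "max": max(beyond) if beyond else None,
    }
    return points, tail
```

Calling clipped_ecdf(group_by_phase(measurements)["ON"], clip=10.0) would give the points for one "ON" line plus a tail summary that could go into a caption or table next to the clipped graph.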
As for the data itself, I'm definitely interested in seeing CDFs from
the other onionperf instances during these time periods, especially to
see if all of them have that 50% bump, and if they have similar results
for before, during, and after, at least comparatively.
And CDF-DL graphs, too.
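
One possible way to do a quick cross-instance check alongside the CDFs is sketched below: it keeps each source separate (since numbers from different sources should not be mixed), splits records into before, during, and after, and reports a few nearest-rank percentiles per group. The record layout, the chosen percentiles, the year 2019, and the source names in the usage note are assumptions made for illustration only.

```python
import math
from collections import defaultdict
from datetime import date

# Assumed experiment window from the thread (Aug 9 - Aug 13); the year
# 2019 is an assumption based on the dates of the emails.
ON_START = date(2019, 8, 9)
ON_END = date(2019, 8, 13)


def three_way_phase(day):
    """Label a date as before, during, or after the experiment window."""
    if day < ON_START:
        return "before"
    if day <= ON_END:
        return "during"
    return "after"


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the data is at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(records, pcts=(50, 95, 99)):
    """Map (source, phase) to {percentile: seconds}.

    records: iterable of (source, date, seconds) tuples. Sources are
    never pooled, so each instance is only compared with itself.
    """
    groups = defaultdict(list)
    for source, day, seconds in records:
        groups[(source, three_way_phase(day))].append(seconds)
    return {
        key: {p: nearest_rank(sorted(vals), p) for p in pcts}
        for key, vals in groups.items()
    }
```

For example, summarize(records)[("op-a", "during")][50] would be the median for a hypothetical instance named "op-a" while the experiment was on. A table like this loses the lumpiness a full CDF shows, so it is a complement to the graphs, not a replacement.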
I filed https://trac.torproject.org/projects/tor/ticket/33076 for this
work. If you would rather iterate there, we can.