<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr">Hi,<br><div><br></div></div><div dir="ltr"><blockquote type="cite">On 22 Jan 2020, at 01:56, Karsten Loesing <karsten@torproject.org> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><span>On 2020-01-14 03:23, Mike Perry wrote:</span><br><blockquote type="cite"><span>On 12/19/19 4:11 AM, Karsten Loesing wrote:</span></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>On 2019-12-02 20:37, Mike Perry wrote:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>In particular, I would like to look at 8 hour snapshots from 8/5 to</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>8/15, broken out by relay flags, of CDF-TTFB and CDF-DL from</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>This is a very interesting experiment, and I'd like to help with the</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>analysis, also in preparation of providing better tools for future</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>analyses like this one.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>However, I'm having trouble understanding 
what graphs you have in mind</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>here. I made a very first graph, even though I think it's just the</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>beginning of an iterative process (that should probably not happen on</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>this mailing list but on a ticket).</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>https://people.torproject.org/~karsten/onionperf-cdf-ttfb-2019-12-19.pdf</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Some thoughts:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span> - I'm not sure if CDFs are the best way to visualize the data here.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>CDFs work great for visualizing *one* distribution or for comparing a</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>handful of distributions when plotting them as separate lines. 
But we</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>have a few dozen distributions here, and we can't plot all these lines</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>into a single coordinate system.</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Well, because this graph set is just one metric, these are all the same</span><br></blockquote><blockquote type="cite"><span>distribution with the same coordinates. The main problem we appear to be</span><br></blockquote><blockquote type="cite"><span>having is clipping/scale.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>If we got the axis clipping to be reasonable, we could conceivably just</span><br></blockquote><blockquote type="cite"><span>overlay everything on fewer graphs. Perhaps all of the graphs from the</span><br></blockquote><blockquote type="cite"><span>same times of day could be combined as one overlaid pile of CDF lines,</span><br></blockquote><blockquote type="cite"><span>and we could use colors to represent which CDF lines were for "ON" vs</span><br></blockquote><blockquote type="cite"><span>"OFF" for the experiment dates. I.e., green CDF lines for Aug 9 - Aug 13</span><br></blockquote><blockquote type="cite"><span>and red CDF lines for all other dates for this experiment.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span> - Maybe we could try out different subsets of percentiles in a common</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>line plot to see how the experiment affects TTFB in this case. 
Something</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>like a five-number or seven-number summary, or whichever handful of</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>percentiles we want to see.</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Worst case, I am fine with having a page of graphs for each metric.</span><br></blockquote><blockquote type="cite"><span>We're not bound by publication lengths when we do our own analysis.</span><br></blockquote><blockquote type="cite"><span>However, if we can find a compact representation we like that has all of</span><br></blockquote><blockquote type="cite"><span>this info, that will be helpful for academia. So it is worthwhile to</span><br></blockquote><blockquote type="cite"><span>brainstorm.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>If we lose resolution by taking a few percentile ranges, we may miss out</span><br></blockquote><blockquote type="cite"><span>on some lumpiness that means something, though. So I would like to</span><br></blockquote><blockquote type="cite"><span>reserve that as a last resort (or perhaps as a way to capture just the</span><br></blockquote><blockquote type="cite"><span>long tail past 99%).</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span> - I didn't understand your idea to break out data by relay flags. These</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>are requests over three-hop circuits. 
How would we split these up by</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>relay flags?</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Ah, you are right. The flag separation only makes sense for the balancing</span><br></blockquote><blockquote type="cite"><span>metrics. Sorry for the confusion.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span> - This graph shows just the data from a single OnionPerf source. If we</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>add multiple sources, it gets even more overloaded. But we can't really</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>mix numbers from different sources, as they have their very own</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>connection characteristics that would skew results.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Happy to make more graphs. It might help to see a sketch or longer</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>description of what you expect to see. 
Thanks!</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Hrm, I am not a data visualization expert, but what is most important</span><br></blockquote><blockquote type="cite"><span>for us to understand is the nature of the variance of performance,</span><br></blockquote><blockquote type="cite"><span>including the length of the long tail.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>From your above plots, it looks like the experiment primarily negatively</span><br></blockquote><blockquote type="cite"><span>impacted the long tail of perf, and maybe even 95-99% perf, but not</span><br></blockquote><blockquote type="cite"><span>average perf. But I agree, even this much is hard to tell due to the</span><br></blockquote><blockquote type="cite"><span>scale needed to display the full tail in CDF form. Perhaps this means a</span><br></blockquote><blockquote type="cite"><span>clip at like 5-10 seconds for all graphs, to keep the X axis the same</span><br></blockquote><blockquote type="cite"><span>length, and then some additional way to quantify the length and quantity</span><br></blockquote><blockquote type="cite"><span>of the tail beyond the clip.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Basically, we want to be able to see if 0-99% CDF slope became wider or</span><br></blockquote><blockquote type="cite"><span>got additional lumps, and we want to see if the 1% tail got longer or</span><br></blockquote><blockquote type="cite"><span>shorter (and ideally also check if it has similar membership and data</span><br></blockquote><blockquote type="cite"><span>points over time in terms of participant relays and time values, for</span><br></blockquote><blockquote type="cite"><span>bug-hunting analysis).</span><br></blockquote><blockquote 
type="cite"><span></span><br></blockquote><blockquote type="cite"><span>We should definitely play around with a few different graphing methods,</span><br></blockquote><blockquote type="cite"><span>though, to compare various ways of capturing this info.</span><br></blockquote><span></span><br><span>Alright, I made a set of new graphs, taking your comments above into</span><br><span>account:</span><br><span></span><br><span>https://people.torproject.org/~karsten/onionperf-cdf-ttfb-2020-01-21.pdf</span><br><span></span><br><span>I included explanations and thoughts on these graphs in the graph</span><br><span>captions. Not what captions were made for, but it seemed useful to have</span><br><span>these ideas close to the graphs in this case.</span><br></div></blockquote><br><div>In the caption on page 8, Karsten writes:</div><div>> And it reveals something interesting around the 50th percentile or</div><div>> 2 seconds mark. Whatever happens there shouldn't be happening.</div><div><br></div><div>Tor 0.2.9 does a lot of things on a 1-second timer, and I'm pretty sure</div><div>that some of them would delay circuit building. Since 0.2.9, we've moved</div><div>a lot of functions to "every event loop" (a few hundred milliseconds) or</div><div>"as needed". But 0.2.9 is only 10% of the network, so it shouldn't be</div><div>affecting 50% of the circuits, even if they are 3-hop circuits:</div><div><a href="https://metrics.torproject.org/versions.html">https://metrics.torproject.org/versions.html</a></div><div>(We should check consensus weight rather than relay count. But as far</div><div>as I remember, it's also around 10% for 0.2.9.)</div><div><br></div><div>Tor still does a few things on a 1-second timer; we should check whether</div><div>any of those things delay circuit building. In particular, I think the</div><div>RelayBandwidthRate token buckets are still refilled every second.</div><div><br></div><div>T</div></body></html>
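As an aside on the percentile idea in the thread above: the five-/seven-number-summary suggestion, combined with the "quantify the length and quantity of the tail beyond the clip" suggestion, can be sketched in a few lines. This is only an illustrative sketch, not part of the Tor metrics or OnionPerf tooling; the percentile choices, the 10-second clip, and the sample TTFB values are made up for the example.

```python
# Illustrative sketch (not Tor metrics tooling): reduce one day's TTFB
# samples to a seven-number summary plus explicit tail metrics, so that
# clipping the CDF plots at e.g. 10 seconds doesn't silently hide the
# long tail. Percentiles, clip value, and sample data are hypothetical.

def percentile(sorted_samples, p):
    """Nearest-rank percentile of an already-sorted list, p in [0, 100]."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, round(p / 100 * len(sorted_samples)))
    return sorted_samples[rank - 1]

def summarize_ttfb(samples, clip_seconds=10.0):
    """Seven-number summary plus count/extent of the tail past the clip."""
    s = sorted(samples)
    summary = {f"p{p}": percentile(s, p) for p in (2, 9, 25, 50, 75, 91, 98)}
    tail = [v for v in s if v > clip_seconds]
    summary["tail_count"] = len(tail)              # points the clip would hide
    summary["tail_max"] = max(tail, default=None)  # how far the tail extends
    return summary

# Synthetic example: mostly fast requests plus a couple of stragglers.
samples = [0.8] * 50 + [1.5] * 40 + [6.0] * 8 + [25.0, 40.0]
print(summarize_ttfb(samples))  # tail_count is 2, tail_max is 40.0
```

Plotted per day with the color scheme proposed above (green for experiment-ON days, red for OFF), such summaries would show whether the median moved or only the 95-99% and tail did, which is the distinction being discussed.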