<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><br class=""><br class=""><blockquote type="cite" class="">On Jan 13, 2020, at 9:23 PM, Mike Perry <<a href="mailto:mikeperry@torproject.org" class="">mikeperry@torproject.org</a>> wrote:<br class=""><br class=""><blockquote type="cite" style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class="">- This graph shows just the data from a single OnionPerf source. If we<br class="">add multiple sources, it gets even more overloaded. But we can't really<br class="">mix numbers from different sources, as they have their very own<br class="">connection characteristics that would skew results.<br class=""><br class="">Happy to make more graphs. It might help to see a sketch or longer<br class="">description of what you expect to see. Thanks!<br class=""></blockquote><br class="">Hrm, I am not a data visualization expert, but what is most important<br class="">for us to understand is the nature of the variance of performance,<br class="">including the length of the long tail.<br class=""><br class="">From your above plots, it looks like the experiment primarily negatively<br class="">impacted the long tail of perf, and maybe even 95-99% perf, but not<br class="">average perf. But I agree, even this much is hard to tell due to the<br class="">scale needed to display the full tail in CDF form. 
Perhaps this means a<br class="">clip at like 5-10 seconds for all graphs, to keep the X axis the same<br class="">length, and then some additional way to quantify the length and quantity<br class="">of the tail beyond the clip.<br class=""><br class="">Basically, we want to be able to see if 0-99% CDF slope became wider or<br class="">got additional lumps, and we want to see if the 1% tail got longer or<br class="">shorter (and ideally also check if it has similar membership and data<br class="">points over time in terms of participant relays and time values, for<br class="">bug-hunting analysis).<br class=""><br class="">We should definitely play around with a few different graphing methods<br class="">though, to compare various ways of capturing this info.<br class=""></blockquote><br class=""><div class="">I suggest that you abandon the CDFs, and use boxplots instead!</div><div class=""><br class=""></div><div class="">The y-axis can show the download time, and the x-axis can have one box per time period (moving to the right one spot means you move to the next time period and a new box). 
Each box encodes the same distribution as the CDF for that time period, drawn as a boxplot: the box spans the 1st to 3rd quartiles, the whiskers can extend from the 0th to the 99th percentile, and you can mark the median and the mean. You could even show the outliers above the 99th percentile if you want.</div><div class=""><br class=""></div><div class="">The boxplots will give you a sense of both the range of the distribution and its skew.</div><div class=""><br class=""></div><div class="">I have not done these in R, but I've attached an example from python-matplotlib, which can also be found in Figure 5(c) in a recent paper:</div><div class=""><a href="https://www.robgjansen.com/publications/pointbreak-sec2019.pdf" class="">https://www.robgjansen.com/publications/pointbreak-sec2019.pdf</a></div><div class=""><br class=""></div><div class="">(In my case I was varying the number of attack circuits in each box, but I hope you get the idea.)</div><div class=""><br class=""></div><div class="">Peace, love, and positivity,</div><div class="">Rob</div><div class=""><img apple-inline="yes" id="411FE9E8-63B0-433A-9F7D-D52D774363EE" src="cid:8477F426-D84C-4F1F-8430-8C58C25410B1" class=""></div></body></html>