[tor-bugs] #33258 [Metrics/Onionperf]: Add CSV file export of graphed data

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed May 20 22:18:44 UTC 2020


#33258: Add CSV file export of graphed data
---------------------------------------+--------------------------------
 Reporter:  karsten                    |          Owner:  karsten
     Type:  enhancement                |         Status:  needs_review
 Priority:  Medium                     |      Milestone:
Component:  Metrics/Onionperf          |        Version:
 Severity:  Normal                     |     Resolution:
 Keywords:  metrics-team-roadmap-2020  |  Actual Points:  1
Parent ID:  #33327                     |         Points:  3
 Reviewer:  acute                      |        Sponsor:  Sponsor59-must
---------------------------------------+--------------------------------

Comment (by acute):

 I've reviewed this using Pandas version 0.25.3+dfsg-7 and Seaborn version
 0.10.0-1. I think the code is ready to be merged, more details below.

 Overall:

   * The plots look significantly better!

   * The code is more readable and maintainable.

   * Analyses now take less time; the difference can appear small for
 single files (~0.5 seconds), but becomes significant (in the order of
 seconds) when analysing multiple data sets (e.g, 14s vs 11s when plotting
 two days' worth of data in two data sets). This was measured with 'time'
 over the same data.

 Plots:
 Here I tried to determine the correspondence between the old graphs and
 the new ones, and compared the arrays of values produced by each of the
 versions of the code.

   * ECDFs "Time to download first byte" and "Time to download last of {
 51200, 1048576, 5242880 } bytes" are equivalent to the old versions. The
 output of values produced is the same between the versions, with the small
 exception Karsten mentions above (the lowest and highest values are added
 to extend the cdf lines)

   * Plots "Time to download { first, last } of { 51200, 1048576, 5242880 }
 bytes over time" are equivalent to "mean time to download {first, last} of
 { 51200, 1048576, 5242880 }". The output of values produced is the same
 between the two versions. A scatter plot is a much better choice for this
 data, there is no reason to have the dots connected.

   * Plots:
      - "Time to download last of { 51200, 1048576, 5242880 } bytes"
 (boxplot)
      - "Mean time to download last of { 51200, 1048576, 5242880 } bytes"
      - "Number of downloads of { 51200, 1048576, 5242880 } bytes
 completed"

    These look far better than their old counterparts, and the values
 correspond between the versions. I think they could do with slightly
 better resolution (perhaps more ticks?) in the y axis in some cases - for
 example, looking at "Mean time to download last of 1048576 bytes" in the
 last pdfs attached to this ticket, I can only tell the value was above 4
 for "2019-01-31-ab" and somewhere between 2 and 3 for "2019-01-30-nl". I
 get to 2.8 and 4.4 immediately looking at the old version of the graph. It
 would be nice to get the number at a glance without having to dig for it
 in the data (perhaps this was also the reason for the "Max time to
 download..." plots), but this might not scale well when plotting multiple
 datasets.
    * "Downloads failed over time" and "Number of downloads failed" - I
 agree with the choice of replacing the other error plots in the old
 versions, these are very readable and a definite improvement over the old
 ones!

 Other thoughts/TODOs:
   * We depend on two new libraries, both of which are well supported and
 documented. We should add those to our setup requirements, and update the
 installation instructions in our documentation.

   * We don't seem to have a style guide for python in metrics-team.
 Personally, I use yapf (tool I also maintain in Debian) to format code
 according to pep8 guidelines, and also pylint. Perhaps this is something
 to look into/document.

   * I do get the following warning with version 0.25.3 of pandas when
 running the code, which suggests a change for the future:
 {{{
 /usr/lib/python3/dist-
 packages/pandas/plotting/_matplotlib/converter.py:103: FutureWarning:
 Using an implicitly registered datetime converter for a matplotlib
 plotting method. The converter was registered by pandas on import. Future
 versions of pandas will require you to explicitly register matplotlib
 converters.

 To register the converters:
         >>> from pandas.plotting import register_matplotlib_converters
         >>> register_matplotlib_converters()
   warnings.warn(msg, FutureWarning)
 }}}

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33258#comment:17>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list