[metrics-bugs] #28799 [Metrics/Website]: Use R.cache to speed up drawing graphs

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Dec 14 13:24:23 UTC 2018


#28799: Use R.cache to speed up drawing graphs
-----------------------------+-----------------------------------
 Reporter:  karsten          |          Owner:  karsten
     Type:  enhancement      |         Status:  needs_information
 Priority:  Medium           |      Milestone:
Component:  Metrics/Website  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:                   |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+-----------------------------------

Comment (by karsten):

 Replying to [comment:2 notirl]:
 > This commit looks OK. I'm not sure about the approach though. We had
 > talked about using the same CSVs for these graphs as we make available for
 > download so that we don't have two different CSVs and it is easier to plot
 > custom graphs using our code as a starting point.
 >
 > For the graphs that I've been making for various requests I've been
 > using the readr library which works nicely with the tidyr universe of
 > packages. What would the performance impact be of reading the CSVs from a
 > ramdisk instead of caching them in R?

 That's an interesting idea. A couple of thoughts:

  - Where and when would we write the per-graph CSV files that would then
 become the starting point for graphs and partial CSV file exports?
    - If we use R for this, the code will be rather simple, but we'd still
 have an R part in our daily updater which we're currently trying to make
 Java-only.
    - We could execute some R code to write per-graph CSV files when
 starting Rserve, but we'd have to re-run it whenever the daily updater has
 finished. Sounds like it could get messy.
    - If we move this code to Java, we might want to look into statistics
 libraries that offer something similar to what tidyr/dplyr do. The current
 approach with Java Collections classes is a bit limited.
  - The ramdisk sounds like it would be just as fast as the cache I'm
 suggesting. But how would we make sure it always has the most recent data,
 including after reboots?
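
 On the freshness question, one possible approach, whichever side ends up
 holding the data, is to key the cached content on the CSV file's
 modification time and re-read the file only when that time changes. The
 following is a minimal sketch in Java under that assumption; it is not what
 the commit does, and the class and method names (CsvCache, lines, reads)
 are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: keeps CSV lines in memory until the file changes. */
class CsvCache {

  /** Cached lines together with the modification time they were read at. */
  private static final class Entry {
    final FileTime modified;
    final List<String> lines;

    Entry(FileTime modified, List<String> lines) {
      this.modified = modified;
      this.lines = lines;
    }
  }

  private final Map<Path, Entry> entries = new HashMap<>();

  /** Number of times a file was actually read from disk. */
  private int reads = 0;

  /** Returns the file's lines, re-reading only if its mtime has changed. */
  synchronized List<String> lines(Path csv) throws IOException {
    FileTime modified = Files.getLastModifiedTime(csv);
    Entry cached = entries.get(csv);
    if (cached == null || !cached.modified.equals(modified)) {
      // Cache miss or stale entry: read the file again and remember its mtime.
      cached = new Entry(modified, Files.readAllLines(csv));
      entries.put(csv, cached);
      reads++;
    }
    return cached.lines;
  }

  synchronized int reads() {
    return reads;
  }
}
```

 With something like this, a rewritten CSV file is picked up on the next
 request without restarting anything, and after a reboot the in-memory cache
 simply starts empty and fills itself again on first use. A ramdisk would
 additionally need some step that repopulates it at boot.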

 Happy to discuss this more!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28799#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

