[tor-dev] Scaling Tor Metrics, Round 2

Karsten Loesing karsten at torproject.org
Wed Dec 9 13:52:55 UTC 2015


On 07/12/15 17:31, David Goulet wrote:
> On 06 Dec (16:52:45), Karsten Loesing wrote:
>> Hi everyone,
> [snip]
>> One important and still somewhat low-hanging fruit is: #10 give 
>> external developers more support when developing visualizations
>> that could later be added to Metrics.  This requires better
>> documentation, but it also requires making it easier to install
>> Tor Metrics locally and test new additions before submitting
>> them.  The latter is a good goal, but we're not there yet.  The
>> documentation part doesn't seem crazy though.  David, if you
>> don't mind being the guinea pig yet once more, I'd want to try
>> this out with your latest visualizations.  This is pending on the
>> JavaScript decision though.
> The current viz I have are all generated by a Munin server, which
> every 5 minutes collects data points on the "munin node" and
> generates graphs (PNG). So as a client accessing the server, you
> only have to fetch a PNG; all the CPU work for the graph is done on
> the server side.
> It's indeed the JS vs. non-JS discussion, where you basically want
> to put the load on the client side instead of the server.
> Please expand on what would be required of me for this guinea pig
> experiment? :)

Hi David,

The following description only applies if you want your visualizations
to be part of Metrics.  We could also start by adding them as
"external" visualizations by linking to your server.  But let me
expand on the scenario where you want them to be part of Metrics.

The Munin model that you describe sounds very simple, but it lacks an
important property: users cannot modify graphs other than picking
graphs for different time periods.  The graphs on Metrics don't have
this limitation, but of course that doesn't come for free.

All graphs on Metrics consist of two parts: the first part aggregates,
every 24 hours, data that is sufficient to produce any graph you'd
want users to be able to create, and the second part draws new graphs
based on user input.

So, we'll have to split up your code into one part that produces a
.csv file (or related format) and another part that draws graphs.

There are very few requirements for writing the first part of the
code.  I'm calling that code a data-aggregating module in Metrics.
Maybe look at a module that I wrote quite recently:


Note that this code could also be written in Python using Stem for
descriptor parsing.  As long as cron can call it on the command line,
all is good.  Of course, it shouldn't require endless amounts of
memory, and it should ideally be done within minutes, but we could
talk about those requirements.  Another requirement is that it's
usable enough that somebody who didn't write the code can run it and
fix the most trivial problems.

Ideally, you'd be able to re-use 90% of your current code for this
first part.  The only difference is that you wouldn't have Munin
produce a graph; instead you'd output a .csv file containing what goes
into Munin's graphs.

There's less flexibility in the second part of the code, the part that
generates graphs.  If we want to use the current graphing engine,
we'll have to write some R/ggplot2 code and extend the Java
servlets/JSPs.  This is not crazy talk, but it will probably take half
a day or more of a Metrics person's time.

If we had already decided to switch to JavaScript, this would be
different: you could write the graphing code using D3.js, test it
locally, and once you like it, we'd copy it over to Metrics.  But
we're not there yet, nor do I know how fast we're moving forward
there.  That's why I'd suggest going with R/ggplot2.  That code can
probably be written quite quickly.

What do you think?  Want to give this a try, maybe starting with your
favorite visualization?

All the best,

