[metrics-team] Integrating Consensus Health graphs onto metrics?

Karsten Loesing karsten at torproject.org
Tue May 2 08:30:50 UTC 2017


Hi Tom,

On 01.05.17 20:31, Tom Ritter wrote:
> Over in https://trac.torproject.org/projects/tor/ticket/21882 and
> https://trac.torproject.org/projects/tor/ticket/21994 we're talking
> about adding (and have added) yet more graphs to Consensus Health.
> 
> With the graphs in 21994 (and to satisfy 21883) I want to switch from
> static graphs to interactive ones.  Specifically, Consensus Health
> will have the following (interactive, meaning you choose your
> timeframe) graphs:
> - Fallback Directories Running
> - Relays voted about (Running) by DirAuth
> - Relays voted about (Not Running) by DirAuth
> - Measured Relays by BWAuth
> - [Poorly Named, Suggestions Welcome]: Relay Measurement Distribution by BWAuth
> - [Poorly Named, Suggestions Welcome]: Relay Measurement Distribution
> by BWAuth by Relay Country (To be Developed)
> - [Poorly Named, Suggestions Welcome]: BWAuth to BWAuth Variance
> Comparison by Relay Country (To be Developed)
> 
> That's a lot of graphs. The ad-hoc graph page I made needs to be
> redesigned for sure. But should these live (only?) on Metrics?
> 
> Right now, Metrics doesn't have any per-dirauth graphs. Adding this
> type of graph (which is all-but-one of the above) may be confusing.
> It also doesn't really have graphs that are specifically designed to
> be 'Network Diagnostic' - they may fit better on a page designed to be
> network diagnostic (consensus-health).

I could see how some (or all?) of these graphs would fit under the
Servers page on Tor Metrics.

> Additionally, I don't have the time to devote to learning the metrics
> framework, so I would basically push for integrating the graphs via
> the use of cooperative iframes. This would also keep the existing data
> segregation between the backend for Metrics and the
> SQLite-generates-CSV-files-for-d3 setup that I use currently. (The SQLite
> backing file is available for download, though, at
> https://consensus-health.torproject.org/historical.db )

I understand that you're not too keen on rewriting this code for Tor
Metrics.  But I'm not sure I want us to take processed Tor network
data from an external source, so I'd rather add a static image and a
link instead.  However, maybe we can find a compromise. :)

How about we discuss what your data would have to look like in order to
be included on Tor Metrics, and even if we later decide not to take that
last step, you'll have benefited from our experience with making Tor
network data accessible to users?  (And of course, if we do decide to
take that last step, it'll be a smaller step.)

Some quick thoughts:

Do you have a data format for the graph data behind your graphs that
scales to months or even a decade?  It looks like none of your graphs
go back further than "past 90 days", though it's not clear to me whether
that's because you didn't want to make the page even longer or whether the data file would
become too big.  In the latter case, we should discuss how to keep the
file small enough even if it contains all the data back to 2007.  Or
we'd have to come up with a good reason why this data is only relevant
for the relatively recent past, unlike all other data.

What you could do is pick a higher data resolution for quite recent data
and a lower data resolution for data in the distant past.  For example,
you could keep one data point per hour for the past week and reduce that
step by step to 4 hours, 12 hours, 24 hours, 240 hours, etc. for the
past years.  This is similar to what Onionoo does, though I could see
how we'd adapt that approach for CSV files containing all data for a
graph.  Graphing this data requires some tricks: if we plot data from
two different data resolutions, we'll have to process all data to have
the lower of the two data resolutions; and if we plot a very short time
interval from the distant past, we'll have to interpolate from data
points possibly outside of the plotted time interval.  But I'd be happy
to help with this, unless you'd rather take on this adventure yourself.
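The tiered scheme above could look roughly like the following Python sketch.  Only the hourly step for the past week and the 4/12/24/240-hour steps come from the paragraph above; the tier cut-off ages (31, 92, 365 days), averaging as the aggregate, and all function names are assumptions for illustration, not how Onionoo actually does it.

```python
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Retention tiers as (maximum age, step). The steps follow the text; the
# cut-off ages after the first week are assumptions.
TIERS = [
    (timedelta(days=7), 1 * HOUR),
    (timedelta(days=31), 4 * HOUR),
    (timedelta(days=92), 12 * HOUR),
    (timedelta(days=365), 24 * HOUR),
    (None, 240 * HOUR),  # everything older than the last cut-off
]


def bucket_start(ts, step):
    """Floor ts to a multiple of step, counted from the Unix epoch."""
    return EPOCH + ((ts - EPOCH) // step) * step


def tier_step(ts, now):
    """Pick the step for a sample. Each cut-off is floored to the next
    coarser step, so no bucket straddles two tiers."""
    for i, (max_age, step) in enumerate(TIERS):
        if max_age is None:
            return step
        cutoff = bucket_start(now - max_age, TIERS[i + 1][1])
        if ts >= cutoff:
            return step


def downsample(points, now):
    """Average (timestamp, value) samples into (start, mean, step) rows."""
    acc = {}
    for ts, value in points:
        step = tier_step(ts, now)
        key = (bucket_start(ts, step), step)
        total, count = acc.get(key, (0.0, 0))
        acc[key] = (total + value, count + 1)
    return sorted((start, total / count, step)
                  for (start, step), (total, count) in acc.items())


def to_resolution(rows, step):
    """Re-aggregate rows of finer resolutions to one coarser step (the
    'lower of the two resolutions'), weighting each mean by duration."""
    acc = {}
    for start, mean, row_step in rows:
        if step % row_step:
            raise ValueError("row steps must divide the target step")
        key = bucket_start(start, step)
        weight = row_step / HOUR
        total, weights = acc.get(key, (0.0, 0.0))
        acc[key] = (total + mean * weight, weights + weight)
    return sorted((start, total / weights, step)
                  for start, (total, weights) in acc.items())


def interpolate(rows, t):
    """Linearly interpolate between bucket midpoints; the two neighbours
    used may lie outside the interval being plotted."""
    mids = [(start + step / 2, mean) for start, mean, step in sorted(rows)]
    for (t0, v0), (t1, v1) in zip(mids, mids[1:]):
        if t0 <= t <= t1:
            return v0 + (t - t0) / (t1 - t0) * (v1 - v0)
    raise ValueError("t lies outside the stored data")


if __name__ == "__main__":
    now = datetime(2017, 5, 2, tzinfo=timezone.utc)
    hourly = [(now - h * HOUR, 1.0) for h in range(1, 24 * 400)]
    rows = downsample(hourly, now)
    print(len(hourly), "hourly samples ->", len(rows), "rows")
```

Because each step divides the next coarser one (1 | 4 | 12 | 24 | 240 hours), every fine bucket falls entirely inside one coarse bucket, which is what lets `to_resolution` merge two resolutions without splitting buckets.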

Note that you should probably apply some smoothing to your 90-days
graphs anyway.  Your "Voted About Relays (Not Running)" graphs contain a
lot of volatility from relays joining and leaving over the day that
makes it hard to see trends or even differences between authorities.
Unless it's the outliers that you really care about.
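One simple form of smoothing is a trailing moving average over one day, which cancels out a pattern that repeats daily, such as relays joining and leaving.  The sketch below assumes one sample per hour, so the 24-sample window is an illustrative choice, not a requirement; other smoothing methods would work too.

```python
from collections import deque


def moving_average(values, window=24):
    """Trailing mean over the last `window` samples. With hourly samples,
    the default window of 24 (an assumption) spans one day. The first
    window - 1 outputs average over fewer samples."""
    recent = deque()
    total = 0.0
    smoothed = []
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            total -= recent.popleft()
        smoothed.append(total / len(recent))
    return smoothed


# Made-up example: a count that swings between 10 and 20 every day.
daily_swing = [10 if hour % 24 < 12 else 20 for hour in range(96)]
print(moving_average(daily_swing)[23:])  # a flat line at the daily mean
```

A trailing window keeps the most recent point usable; a centered window would avoid the half-day lag but needs future samples.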

By the way, if you expose your data in CSV files, you could quite easily
use a JavaScript graphing library to make your graphs more interactive
and avoid having several graphs for different intervals of the same
data.  D3.js comes to mind.
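Exporting a query result from an SQLite database to a CSV file with a header row, which D3's `d3.csv` loader can read directly, needs only the Python standard library.  The table and column names below (`vote_counts`, `valid_after`, `authority`, `running`) and the sample rows are made up, since the schema of historical.db isn't described here.

```python
import csv
import io
import sqlite3


def export_csv(conn, query, out):
    """Run `query` and write the result, with a header row taken from the
    column names, as CSV to the text stream `out`."""
    cursor = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical schema and made-up rows, standing in for historical.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vote_counts (valid_after TEXT, authority TEXT, running INTEGER)"
)
conn.executemany(
    "INSERT INTO vote_counts VALUES (?, ?, ?)",
    [
        ("2017-05-01 00:00", "auth-a", 100),
        ("2017-05-01 00:00", "auth-b", 98),
        ("2017-05-01 01:00", "auth-a", 101),
    ],
)
buf = io.StringIO()
export_csv(
    conn,
    "SELECT valid_after, authority, running FROM vote_counts "
    "ORDER BY valid_after, authority",
    buf,
)
print(buf.getvalue())
```

Writing one row per (time, authority) pair keeps a single file valid for any number of authorities and any time range, so one interactive graph can filter client-side instead of shipping separate graphs per interval.  To write a real file, pass `open(path, "w", newline="")` instead of the `StringIO` buffer.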

Another requirement for adding data to Tor Metrics is that it needs to
be documented in a similar way to the other data files:

https://metrics.torproject.org/stats.html

And the graphs would have to be documented in a way that the average Tor
Metrics user can understand what's going on, at least to the point where
they can decide that they don't care about this aspect of the Tor network.

Yet another requirement for moving code to Tor Metrics is that it should
use PostgreSQL rather than SQLite and Java rather than whatever else
it's written in right now.  But if we have a data format with
documentation and working code that uses SQLite and, say, Python, I'm
quite optimistic that we can find the time to do the rest of the work.

> "Thanks but no thanks" seems like a likely outcome but I wanted to at
> least raise it for discussion....  And give people the opportunity to
> comment on how the page should be redesigned.

Thanks for asking!  Maybe some of the ideas above are useful. :)

> -tom

All the best,
Karsten

