Better relay uptime visualisation

Philipp Winter
Mon Dec 7 19:51:23 UTC 2015

I spent some time improving the existing relay uptime visualisation [0].
Inspired by a research paper [1], the new algorithm uses single-linkage
clustering with a distance function based on Pearson's correlation
coefficient.  The idea is that relays are grouped next to each other if
their uptime sequences (basically binary sequences) are highly
correlated.  Check out the
following gallery.  It contains monthly relay uptime images, dating back
to 2007:

If you aren't familiar with this type of visualisation: every image
shows the uptime of all Tor relays that were online in a given month.
Every row is a consensus and every column is a relay.  White pixels mean
that a relay was offline and black pixels mean that a relay was
online.  Red pixels highlight suspiciously similar clusters.
A nice example is the Heartbleed incident:
The huge red block on the left shows all the relays that were removed by
the directory authorities because they didn't rotate their key pairs in
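A hypothetical sketch of how such an image could be produced, assuming the uptime data is already a matrix of 0/1 values with one row per consensus and one column per relay.  It writes a plain-text PPM image and paints the online pixels of a caller-supplied set of relays red.  The post does not say how suspicious clusters are detected, so the `highlighted` parameter stands in for that step; `render_ppm` and its parameters are made-up names for illustration.

```python
"""Render a relay uptime matrix as a plain-text PPM (P3) image.

Hypothetical sketch, not the actual rendering code: rows are
consensuses, columns are relays; white = offline, black = online,
red = online pixel of a relay in a flagged cluster (assumption).
"""

WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
RED = (255, 0, 0)


def render_ppm(uptime, column_order=None, highlighted=frozenset()):
    """Return the image as a P3 PPM string.

    uptime: one list of 0/1 values per consensus, one value per relay
        (1 = relay was online in that consensus).
    column_order: relay indices in display order, e.g. the result of
        the clustering step; defaults to the original order.
    highlighted: relay indices whose online pixels are drawn red.
    """
    if not uptime:
        raise ValueError("uptime matrix is empty")
    n_relays = len(uptime[0])
    order = list(range(n_relays)) if column_order is None else list(column_order)
    # Header: format, width (columns), height (rows), max colour value.
    lines = ["P3", "%d %d" % (len(order), len(uptime)), "255"]
    for row in uptime:
        for relay in order:
            if not row[relay]:
                colour = WHITE
            elif relay in highlighted:
                colour = RED
            else:
                colour = BLACK
            # One pixel per line keeps lines short, as plain PPM recommends.
            lines.append("%d %d %d" % colour)
    return "\n".join(lines) + "\n"
```

For example, `render_ppm([[1, 0, 1], [1, 1, 0]], column_order=[2, 0, 1], highlighted={0})` yields a 3x2 image whose first column is relay 2, drawn red wherever relay 0 is online.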

The downside of single-linkage clustering is that it takes longer to
compute than the previous approach.  On my laptop, creating an image
that covers one month takes under three minutes, which is tolerable.
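To make the cost concrete, here is a minimal pure-Python sketch of the ordering step, written for illustration and not taken from the actual implementation.  It assumes the distance between two relays is 1 - r, where r is Pearson's correlation between their 0/1 uptime sequences, and it treats relays with constant uptime (for which r is undefined) as distance 0 if identical and 1 otherwise; both choices are assumptions.  For n relays and m consensuses, filling the distance matrix alone takes on the order of n^2 * m operations, and this naive merge loop adds roughly n^3 more.

```python
"""Order relays by single-linkage clustering of their uptime sequences.

Sketch under assumptions: distance = 1 - Pearson r; constant
(zero-variance) sequences get distance 0 if identical, else 1.
"""
import math


def pearson_distance(x, y):
    """Return 1 - r for two equal-length 0/1 sequences (range 0..2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: correlation is undefined for constant sequences.
        return 0.0 if list(x) == list(y) else 1.0
    return 1.0 - cov / math.sqrt(var_x * var_y)


def single_linkage_order(columns):
    """Return relay indices ordered so that merged clusters are adjacent.

    columns: one 0/1 uptime sequence per relay.
    Naive agglomerative clustering: repeatedly merge the two clusters
    whose closest members are nearest (single linkage).  Each cluster
    keeps its members as an ordered list, so concatenating the lists on
    every merge yields a dendrogram leaf order.
    """
    n = len(columns)
    dist = [[pearson_distance(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        # Delete b first; since b > a, index a stays valid.
        del clusters[b]
        clusters[a] = merged
    return clusters[0] if clusters else []
```

Here `columns` holds one uptime sequence per relay, i.e. the transpose of the image matrix.  With NumPy and SciPy available, `scipy.cluster.hierarchy.linkage(X, method="single", metric="correlation")` followed by `leaves_list` computes a comparable ordering much faster (X having one row per relay), although relays with constant uptime would still need special handling because their correlation is undefined.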

Another practical problem is that it's cumbersome to find out the
fingerprint of the relay in a given column.  I'm looking into JavaScript/HTML tricks
that can show text when you hover over a region in the image.  Perhaps
somebody knows more?
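One possible approach, sketched here with hypothetical names: an HTML image map with one `<area>` per column, each carrying the relay's fingerprint in its `title` attribute, which browsers typically display as a tooltip on hover.  This assumes the image is shown at its natural size, so that one image pixel equals one column, and that the fingerprints are listed in the same left-to-right order as the columns.

```python
"""Generate an HTML image map that shows a relay's fingerprint on hover.

Sketch under assumptions: the image is displayed at its natural size,
each relay occupies exactly one pixel column, and `fingerprints` is in
the same left-to-right order as the image columns.
"""
from html import escape


def uptime_image_map(image_url, fingerprints, height, map_name="uptime"):
    """Return an HTML snippet: the image plus one <area> per column.

    Each <area> covers one column (x to x+1) over the full image height
    and puts the fingerprint in its title attribute.
    """
    parts = [
        '<img src="%s" usemap="#%s" alt="Relay uptime">'
        % (escape(image_url), escape(map_name)),
        '<map name="%s">' % escape(map_name),
    ]
    for x, fpr in enumerate(fingerprints):
        fpr = escape(fpr)
        parts.append(
            '  <area shape="rect" coords="%d,0,%d,%d" title="%s" alt="%s">'
            % (x, x + 1, height, fpr, fpr)
        )
    parts.append("</map>")
    return "\n".join(parts)
```

A month with thousands of relays produces thousands of `<area>` elements, which is verbose but needs no JavaScript at all.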

[0] <https://bugs.torproject.org/12813>
[1] <http://nms.csail.mit.edu/papers/clustering-imw2002.pdf>, Section 2


