[tor-dev] Better relay uptime visualisation

Mon Dec 7 19:51:23 UTC 2015

I spent some time improving the existing relay uptime visualisation [0].
Inspired by a research paper [1], the new algorithm uses single-linkage
clustering with Pearson's correlation coefficient as the distance
function.  The idea is that relays are placed next to each other if
their uptime (basically a binary sequence) is highly correlated.  Check
out the following gallery.  It contains monthly relay uptime images,
dating back to 2007:
<https://nymity.ch/sybilhunting/uptime-visualisation/>
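
To illustrate the clustering idea, here is a minimal sketch in plain
Python.  It is not the actual implementation, only a naive version under
a few assumptions: the distance is taken as 1 - r (where r is Pearson's
correlation coefficient), a relay whose sequence never changes is treated
as uncorrelated with everything, and the relay names and uptime
sequences are made up for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) * (x - mean_a) for x in a)
    var_b = sum((y - mean_b) * (y - mean_b) for y in b)
    if var_a == 0 or var_b == 0:
        # Assumption: a constant sequence (always on or always off)
        # is treated as uncorrelated with any other sequence.
        return 0.0
    return cov / sqrt(var_a * var_b)


def distance(a, b):
    """Assumed distance: 0 for perfectly correlated, 2 for anti-correlated."""
    return 1.0 - pearson(a, b)


def single_linkage_order(sequences):
    """Naive single-linkage clustering; returns the resulting column order."""
    n = len(sequences)
    dist = {(i, j): distance(sequences[i], sequences[j])
            for i in range(n) for j in range(i + 1, n)}

    def cluster_distance(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical relays; 1 = present in a consensus, 0 = absent.
names = ["relayA", "relayB", "relayC", "relayD"]
uptimes = [
    [1, 1, 0, 0, 1, 1],  # relayA
    [0, 0, 1, 1, 0, 0],  # relayB
    [1, 1, 0, 0, 1, 1],  # relayC, identical to relayA
    [0, 0, 1, 1, 0, 1],  # relayD, similar to relayB
]
order = single_linkage_order(uptimes)
print([names[i] for i in order])  # ['relayA', 'relayC', 'relayB', 'relayD']
```

The two identical relays end up in adjacent columns, followed by the
other correlated pair.  This version recomputes cluster distances in
every round, so it is only meant for small inputs.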

If you aren't familiar with this type of visualisation: Every image
shows the uptime of all Tor relays that were online in a given month.
Every row is a consensus and every column is a relay.  White pixels mean
that a relay was offline and black pixels mean that a relay was
online.  Red pixels are used to highlight suspiciously similar clusters.
A nice example is the Heartbleed incident:
<https://nymity.ch/sybilhunting/uptime-visualisation/slide_2014-04.html>
The huge red block on the left shows all the relays that were removed by
the directory authorities because they didn't rotate their key pairs in
time.
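
The image layout described above can be sketched as follows.  This is
only an illustration, not the code behind the gallery: the function name
render_ppm, the plain-text PPM output format, and the choice to colour
the online pixels of highlighted columns red are all assumptions.

```python
def render_ppm(uptime_by_relay, highlighted=()):
    """Return a plain-text PPM (P3) image as a string.

    uptime_by_relay: one 0/1 sequence per relay (one entry per consensus).
    highlighted: column indices to draw in red (assumed convention).
    """
    white, black, red = "255 255 255", "0 0 0", "255 0 0"
    n_relays = len(uptime_by_relay)
    n_consensuses = len(uptime_by_relay[0])
    # PPM header: magic number, width and height, maximum colour value.
    lines = ["P3", f"{n_relays} {n_consensuses}", "255"]
    for row in range(n_consensuses):      # every row is a consensus
        pixels = []
        for col in range(n_relays):       # every column is a relay
            if not uptime_by_relay[col][row]:
                pixels.append(white)      # relay was offline
            elif col in highlighted:
                pixels.append(red)        # online, part of a flagged cluster
            else:
                pixels.append(black)      # relay was online
        lines.append("  ".join(pixels))
    return "\n".join(lines) + "\n"


# Three hypothetical relays over two consensuses; column 1 is flagged.
image = render_ppm([[1, 0], [1, 1], [0, 0]], highlighted={1})
print(image)
```

Writing the returned string to a file with a .ppm extension should give
something most image viewers can open.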

The downside of single-linkage clustering is that it takes longer to
compute than the previous visualisation.  On my laptop, I can create an
image covering one month in under three minutes, so it's tolerable.

Another practical problem is that it's cumbersome to learn the relay
fingerprint of a given column.  I'm looking into JavaScript/HTML tricks
that can show text when you hover over a region in the image.  Perhaps
somebody knows more?

[0] <https://bugs.torproject.org/12813>
[1] <http://nms.csail.mit.edu/papers/clustering-imw2002.pdf>, Section 2

Cheers,
Philipp