On Mon, Dec 07, 2015 at 05:43:01PM -0600, Tom Ritter wrote:
On 7 December 2015 at 13:51, Philipp Winter <phw@nymity.ch> wrote:
I spent some time improving the existing relay uptime visualisation [0]. Inspired by a research paper [1], the new algorithm uses single-linkage clustering with Pearson's correlation coefficient as the distance function. The idea is that relays are grouped next to each other if their uptime (basically a binary sequence) is highly correlated. Check out the following gallery. It contains monthly relay uptime images, dating back to 2007: https://nymity.ch/sybilhunting/uptime-visualisation/
If you aren't familiar with this type of visualisation: Every image shows the uptime of all Tor relays that were online in a given month. Every row is a consensus and every column is a relay. White pixels mean that a relay was offline and black pixels mean that a relay was online. Red pixels are used to highlight suspiciously similar clusters.
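To make the grouping idea concrete, here is a minimal Python sketch. It is not sybilhunter's code: the function names, the choice of 1 - r as the distance, the handling of constant sequences, and the way the column order is read off the merges are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r for two equal-length 0/1 uptime sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant sequence counts as uncorrelated
    return cov / sqrt(var_x * var_y)

def single_linkage_order(seqs):
    """Naive single-linkage clustering; returns a column order.

    Assumes at least one sequence. Distance is 1 - r (an assumption).
    The order is obtained by concatenating clusters as they merge.
    """
    n = len(seqs)
    dist = {(i, j): 1.0 - pearson(seqs[i], seqs[j])
            for i in range(n) for j in range(i + 1, n)}

    def cluster_dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist[(min(i, j), max(i, j))] for i in a for j in b)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        _, p, q = min((cluster_dist(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters[0]

# Made-up uptime sequences: one entry per consensus, 1 = online.
uptimes = [
    [1, 1, 0, 0, 1, 1, 0, 0],  # relay 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # relay 1
    [1, 1, 0, 0, 1, 1, 0, 1],  # relay 2, similar to relay 0
    [0, 1, 0, 1, 0, 1, 1, 1],  # relay 3, similar to relay 1
]
order = single_linkage_order(uptimes)
print(order)  # relays 0 and 2 end up adjacent, as do relays 1 and 3
```

The pairwise loop is quadratic in memory and worse in time, so this only illustrates the idea; it would be far too slow for a full month of relays.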
That's really cool. It seems to imply that the majority of the tor network stops operating halfway through the month though... Do the other tor graphs take into account hibernating relays? For example, I would expect the time-to-download graph would be somewhat affected: https://metrics.torproject.org/torperf.html?graph=torperf&start=2015-10-...
What I forgot to mention: In all diagrams, I removed relays that were always online, because an all-online uptime sequence isn't useful for finding Sybils. In Nov 2015, for example, we had 10,984 unique relays by fingerprint, and 3,202 of them (29%) were always online and are therefore not shown in the visualisation.
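As a rough sketch of that filtering step (again in Python, and not taken from sybilhunter; the function name and the dict layout keyed by fingerprint are hypothetical), together with the arithmetic behind the 29% figure:

```python
def drop_always_online(uptimes):
    """Keep only relays that were offline in at least one consensus.

    `uptimes` maps a relay fingerprint to its 0/1 uptime sequence
    (hypothetical layout, chosen for illustration).
    """
    return {fp: seq for fp, seq in uptimes.items() if not all(seq)}

# Numbers quoted above for Nov 2015.
total_relays = 10984
always_online = 3202
share = round(100 * always_online / total_relays)  # percent, rounded
shown = total_relays - always_online               # relays left in the image
print(share, shown)
```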
Also, here are the steps to reproduce:
  wget https://collector.torproject.org/archive/relay-descriptors/consensuses/conse...
  tar xvJf consensuses-2015-11.tar.xz
  go get git.torproject.org/user/phw/sybilhunter.git
  sybilhunter -data consensuses-2015-11/ -uptime
Cheers, Philipp