[tor-dev] Visualising similarities between relay descriptors

Sun May 31 15:14:44 UTC 2015

Visualising the similarity between two Tor relay descriptors helps with
finding Sybil attacks.  I added code to sybilhunter [0] that takes relay
descriptors as input, determines all n(n-1)/2 pairwise similarities,
and outputs DOT code (part of Graphviz) that illustrates relay clusters
and what makes them similar.  For now, this functionality is just a set
of hard-coded rules that determine, e.g.:

- Do the relays have the same, non-default exit policy?
- Do the relays have a similar uptime?
- Do the relays run on the same platform?
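
A minimal sketch of what such rules might look like when applied to
every unordered pair of relays.  This is Python for illustration, not
sybilhunter's own code; the Relay fields, the one-hour uptime tolerance,
the DEFAULT_EXIT_POLICY placeholder, and the sample relays are all
assumptions.

```python
from dataclasses import dataclass
from itertools import combinations

# Placeholder standing in for Tor's default exit policy (assumption).
DEFAULT_EXIT_POLICY = "<default>"
# How close two uptimes must be to count as "similar" (assumption).
UPTIME_TOLERANCE = 3600  # seconds


@dataclass(frozen=True)
class Relay:
    """Hypothetical subset of the fields found in a relay descriptor."""
    nickname: str
    fingerprint: str
    exit_policy: str
    uptime: int  # seconds
    platform: str


def similarities(a, b):
    """Return the names of the hard-coded rules that relays a and b satisfy."""
    found = []
    if a.exit_policy == b.exit_policy and a.exit_policy != DEFAULT_EXIT_POLICY:
        found.append("exit policy")
    if abs(a.uptime - b.uptime) <= UPTIME_TOLERANCE:
        found.append("uptime")
    if a.platform == b.platform:
        found.append("platform")
    return found


def pairwise(relays):
    """Compare every unordered pair once: n(n-1)/2 comparisons for n relays."""
    return {
        (a.fingerprint, b.fingerprint): similarities(a, b)
        for a, b in combinations(relays, 2)
    }


# Made-up sample data.
relays = [
    Relay("relayA", "A" * 40, "reject *:25", 86400, "Tor 0.2.6.7 on Linux"),
    Relay("relayB", "B" * 40, "reject *:25", 86500, "Tor 0.2.6.7 on Linux"),
    Relay("relayC", "C" * 40, DEFAULT_EXIT_POLICY, 10, "Tor 0.2.5.12 on FreeBSD"),
]
pairs = pairwise(relays)
```

Here only the relayA/relayB pair matches any rule; the other two pairs
map to an empty list.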

To give you an idea of what this looks like, I took all relay
descriptors archived by CollecTor [1] for 2015-05-30 and calculated
similarities by running:

  $ sybilhunter -data 2015-05-30/ -cumulative -matrix -threshold 6 -visualise > sim.dot
  $ dot -o sim.svg -Tsvg sim.dot

The resulting graph is online [2].  Vertices are relays (nickname in the
first line, followed by the first eight hex digits of the fingerprint),
and the edge labels show the similarities.
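
To illustrate the shape of such a graph, here is a rough Python sketch
that emits DOT for an undirected graph with labelled vertices and
edges.  The function name, the input structures, and the exact label
layout are assumptions; sybilhunter's real output may well differ.

```python
def to_dot(relays, edges):
    """Build DOT source for an undirected similarity graph.

    relays: dict mapping fingerprint -> nickname
    edges:  dict mapping (fingerprint_a, fingerprint_b) -> list of labels
    """
    lines = ["graph similarities {"]
    for fpr, nickname in sorted(relays.items()):
        # "\\n" writes a literal backslash-n, which DOT renders as a line
        # break.  Nicknames are assumed to be alphanumeric, so no quoting
        # of special characters is attempted here.
        lines.append('  "%s" [label="%s\\n%s"];' % (fpr, nickname, fpr[:8]))
    for (a, b), labels in sorted(edges.items()):
        if labels:  # skip pairs that have nothing in common
            lines.append('  "%s" -- "%s" [label="%s"];'
                         % (a, b, ", ".join(labels)))
    lines.append("}")
    return "\n".join(lines)


# Made-up sample data.
sample_relays = {"A" * 40: "relayA", "B" * 40: "relayB", "C" * 40: "relayC"}
sample_edges = {
    ("A" * 40, "B" * 40): ["exit policy", "platform"],
    ("A" * 40, "C" * 40): [],
}
dot_source = to_dot(sample_relays, sample_edges)
```

Writing dot_source to a file and running "dot -Tsvg" on it should then
give an SVG like the one produced by the commands above.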

Unsurprisingly, there are several relay clusters that probably should be
in a family, but aren't, for example the "startor*", "torpids*",
"manningsnowden*", and "Montharkan*" relays.

There are, however, also several relay clusters, often named "default",
whose members share the first two hex digits of their fingerprint.  This
is unlikely to be a coincidence, so the operators might have wanted to
position their relays in the DHT.
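
As a back-of-the-envelope check, assuming fingerprints are independent
and uniformly distributed, the chance that k given relays all share the
same first two hex digits is (1/256)^(k-1).  The Python sketch below
groups fingerprints by prefix and computes that figure; the function
names and sample fingerprints are made up for illustration.

```python
from collections import defaultdict


def group_by_prefix(fingerprints, digits=2):
    """Group hex fingerprints by their first `digits` hex digits."""
    groups = defaultdict(list)
    for fpr in fingerprints:
        groups[fpr[:digits].upper()].append(fpr)
    return dict(groups)


def chance_same_prefix(k, digits=2):
    """Probability that k independent, uniform fingerprints share a prefix."""
    return (1 / 16 ** digits) ** (k - 1)


# Made-up sample fingerprints.
sample = ["AB" + "0" * 38, "ab" + "1" * 38, "CD" + "2" * 38]
groups = group_by_prefix(sample)
```

For a cluster of five relays, chance_same_prefix(5) is 1/256^4, or
roughly 2.3e-10.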

Please let me know if you have any suggestions on how to improve the
tool or its visualisation.

[0] <https://gitweb.torproject.org/user/phw/sybilhunter.git/>
[1] <https://collector.torproject.org/recent/relay-descriptors/server-descriptors/>
[2] <https://www.nymity.ch/sybilhunting/svg/2015-05-30_similarities.svg>

Cheers,
Philipp