Re: [tor-dev] Sybil attack detection (was: Karsten's July 2014)

6 Aug 2014


      On Tue, Aug 05, 2014 at 11:24:32AM -0400, Philipp Winter wrote:
...
On Tue, Aug 05, 2014 at 11:37:45AM +0200, Karsten Loesing wrote:
...
Started looking into better algorithms to detect Sybil attacks on the
Tor network.  Current thinking is that we should define relay similarity
metrics like common IP address prefix length or time between first seen
in a consensus, go throw the consensus archives, and see which relays
look similar but are not defined to be in the same family.
Do you already have some code or more details on that?  I'm quite
interested in this topic and I'm wondering if it wouldn't be best to
start with something simple like cosine similarity [0].  We would have
to transform a relay descriptor into a numerical vector and then
calculate the cosine similarity to other relay vectors.  However, this
might scale poorly as we would need (n^2)/2 comparisons.
We might also want to weigh the vector's elements differently as some of
them are easy to control for an attacker (advertised bandwidth, uptime,
flags, exit policy) and others require more effort (IP address, ASN,
observed bandwidth).  Like you mentioned, the key thing to look at might
be time, i.e., uptime and derived features such as "total online time in
last month" etc.
[0] https://en.wikipedia.org/wiki/Cosine_similarity
You only need O(n*log(n)) if you can define any similarity metric that
respects the triangle inequality.  There's a lot of research on data
structures for this:

https://en.wikipedia.org/wiki/Metric_tree

-- 
Andrea Shepard
<andrea@torproject.org>
PGP fingerprint (ECC): BDF5 F867 8A52 4E4A BECF  DE79 A4FF BC34 F01D D536
PGP fingerprint (RSA): 3611 95A4 0740 ED1B 7EA5  DF7E 4191 13D9 D0CF BDA5