[tor-reports] Philipp's March 2015
phw at torproject.org
Tue Mar 31 13:41:14 UTC 2015
- I filed bug 15178 for Atlas.
- I put more code and thought into the Sybil detector. One aspect
of it is finding a distance metric that quantifies the similarity
between two given relay descriptors. Once we have that, we can use
nearest neighbour search algorithms to find the k relays that are
most similar to a given relay descriptor.
So far, I have experimented with the Levenshtein distance as the
distance metric and a vantage point tree as the nearest neighbour
search algorithm. My current application of the Levenshtein distance
is not very smart: it simply takes two raw relay descriptors as input
and determines their distance. The distance is the minimum number of
single-character edits (insertions, deletions, and substitutions)
necessary to turn descriptor A into descriptor B.
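
To make the idea concrete, here is a minimal sketch of the raw-string
approach. It is not the Sybil detector's actual code. The descriptor
strings and the names `levenshtein`, `k_nearest`, and `relayA` to
`relayC` are made up for illustration, and a plain linear scan stands
in for the vantage point tree, which would return the same neighbours
with fewer distance computations.

```python
import heapq


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def k_nearest(query: str, descriptors: dict, k: int) -> list:
    """Return the k closest descriptors as (distance, name) pairs.

    Linear scan over all candidates; this is the stand-in for the
    vantage point tree mentioned above."""
    scored = ((levenshtein(query, text), name)
              for name, text in descriptors.items())
    return heapq.nsmallest(k, scored)


# Hypothetical, heavily shortened descriptors (not the real format).
descriptors = {
    "relayA": "router sybil01 10.0.0.1 9001\nplatform Tor 0.2.5.10 on Linux",
    "relayB": "router sybil02 10.0.0.2 9001\nplatform Tor 0.2.5.10 on Linux",
    "relayC": "router honest 192.0.2.7 443\nplatform Tor 0.2.6.6 on FreeBSD",
}

query = descriptors["relayA"]
others = {name: text for name, text in descriptors.items() if name != "relayA"}
nearest = k_nearest(query, others, k=1)
print(nearest)  # [(2, 'relayB')]
```

The two toy Sybils differ only in one nickname character and one
address character, so their distance is 2, while the unrelated relay
is much further away.
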
I'm currently looking at ways to preprocess the relay descriptors so
we can incorporate our experience with past Sybils instead of just
treating them as opaque strings. The challenging part is that the
result still has to be a metric in the mathematical sense, i.e., it
has to satisfy non-negativity, identity of indiscernibles, symmetry,
and the triangle inequality. The vantage point tree relies on the
triangle inequality to prune its search.
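
As a rough illustration of what "metric in the mathematical sense"
demands, the following sketch spot-checks the four metric axioms
(non-negativity, identity of indiscernibles, symmetry, triangle
inequality) on a finite sample. The helper `metric_violations` and the
toy distance functions are hypothetical and only serve the example. An
empty result means no counterexample was found on the sample; it does
not prove that a function is a metric.

```python
from itertools import product


def metric_violations(dist, samples):
    """Spot-check the metric axioms for dist on a finite sample.

    Returns a list of (axiom, witnesses) tuples, one per violation."""
    violations = []
    for x, y in product(samples, repeat=2):
        d = dist(x, y)
        if d < 0:
            violations.append(("non-negativity", (x, y)))
        if (d == 0) != (x == y):
            violations.append(("identity", (x, y)))
        if d != dist(y, x):
            violations.append(("symmetry", (x, y)))
    for x, y, z in product(samples, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z):
            violations.append(("triangle inequality", (x, y, z)))
    return violations


# Toy distances, chosen only to show how an axiom can break.
def absolute(a, b):
    return abs(a - b)              # a proper metric on numbers


def squared(a, b):
    return (a - b) ** 2            # breaks the triangle inequality


def length_only(a, b):
    return abs(len(a) - len(b))    # distinct strings can be at distance 0


print(metric_violations(absolute, [0, 1, 2]))
print(metric_violations(squared, [0, 1, 2]))
print(metric_violations(length_only, ["tor", "TOR", "relay"]))
```

The last toy function hints at one pitfall of preprocessing: if it
discards so much information that two different descriptors end up at
distance zero, the result is no longer a metric in the strict sense.
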