[tor-dev] Can we stop sanitizing nicknames in bridge descriptors?

Wed Jun 6 16:53:22 UTC 2012

Karsten Loesing:
> On 6/4/12 7:43 PM, Sebastian G. <bastik.tor> wrote:
>> Karsten Loesing:
>>> On 5/16/12 8:47 AM, Karsten Loesing wrote:
>>>> On 5/2/12 2:30 PM, Karsten Loesing wrote:
>>>>> If nobody objects within the next, say, two weeks, I'm going to make an
>>>>> old tarball from 2008 available with original nicknames.  And if nobody
>>>>> screams, I'll provide the remaining tarballs containing original
>>>>> nicknames another two weeks later.
>>>>
>>>> Here we go.  These are the sanitized bridge descriptors from May 2008
>>>> including original bridge nicknames:
>>>>
>>>> http://freehaven.net/~karsten/volatile/bridges-2008-05-nicknames.tar.bz2
>>>
>>> And now, two weeks later, here are the sanitized bridge descriptors
>>> containing nicknames:
>>>
>>> https://metrics.torproject.org/data.html#bridgedesc
>>>
>>> Best,
>>> Karsten
>>
>> Here are my findings for the tarballs of March 2012. I could pick freely
>> from any 2012 tarball. I picked March 2012 because it contained the
>> "bridge peak" and the relays seemed stable.
> 
> Results are that 205 of your 308 guesses (66%) were correct in the sense
> that a bridge was at least once running in the same /24 as the relay
> with similar nickname.  At any time in March 2012, you'd have located
> between 26 and 46 bridges (1.7% to 3.3%) with 37 bridges (2.5%) in the
> mean via nickname similarity.
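
A minimal sketch of the matching criterion described above, in Python. It assumes IPv4 addresses. The nickname test is hypothetical: it uses `difflib.SequenceMatcher` with an arbitrary 0.8 threshold, because the thread does not say how "similar nickname" was decided. All function names are illustrative only, not part of the actual analysis.

```python
import ipaddress
from difflib import SequenceMatcher


def same_slash24(ip_a: str, ip_b: str) -> bool:
    """True if both IPv4 addresses fall into the same /24 network."""
    net_a = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in net_a


def similar_nickname(a: str, b: str, threshold: float = 0.8) -> bool:
    # Hypothetical similarity test (case-insensitive ratio >= threshold);
    # the thread does not define what counts as "similar".
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def guess_is_correct(relay_ip: str, bridge_ips: list[str]) -> bool:
    # A guess counts as correct if the bridge was seen *at least once*
    # in the same /24 as the relay with the similar nickname.
    return any(same_slash24(relay_ip, ip) for ip in bridge_ips)


def accuracy(correct: int, total: int) -> float:
    # Fraction of guesses that turned out to be correct.
    return correct / total
```

For example, accuracy(205, 308) is about 0.666, the 66% figure quoted above.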

That sounds good from my point of view as an attacker. It's not too bad.

> Your accuracy went up from 30% in your May 2008 analysis to 66%, but the
> overall fraction of bridges you'd have located went down from 10% to
> 2.5% in the mean.

From my point of view as a user, it's good that the overall fraction
decreased.

> I think we can live with an adversary being able to locate 1 out of 40
> bridges if the operator assigns a similar nickname and runs it on a
> nearby IP address.

You should get more people to run bridges named after existing relays
that are not their own. That would raise the false-positive rate.
(True, it would, but I'm just kidding.)

More bridges overall would be wonderful.

> If you think you can come up with a vastly improved rate of located
> bridges of, say, 5% or more, I can look at another findings.txt of yours
> for a different month than March 2012.

Unless I can come up with a way to exclude false positives, that's
unlikely. I did go up from 30% to 66%, but I don't know why that was
the case.

> If not, let's conclude this analysis and assume that publishing bridge
> nicknames is safe enough---until somebody shows us that we're wrong.

I consider publishing bridge nicknames safe enough to achieve the goals
(counting EC2 bridges, searching for them via Atlas), unless somebody
(myself included) shows us that we're wrong.

> Again, thanks for running this analysis!

Thank you for your work. I did this because:
a) I had the idea to look at the data,
b) you told me it would be useful, and
c) I wanted to know how many bridges can be located.

Finally, I can say that it was a fine experience and that I learned
something (at least about processing the data).

> Thanks,
> Karsten
> 

Once again, thank you.

Best,
Sebastian