[tor-dev] Fwd: Re: Can we stop sanitizing nicknames in bridge descriptors?

Karsten Loesing karsten at torproject.org
Tue May 22 07:24:41 UTC 2012

Forwarding my original answer to Sebastian here.

-------- Original Message --------
Subject: Re: [tor-dev] Can we stop sanitizing nicknames in bridge descriptors?
Date: Mon, 21 May 2012 19:56:34 +0200
From: Karsten Loesing <karsten at torproject.org>
To: Sebastian G. <bastik.tor> <bastik.tor at googlemail.com>

Hi Sebastian,

On 5/21/12 7:08 PM, Sebastian G. <bastik.tor> wrote:

(Did you intend to send this mail only to me, not to tor-dev?  Feel free
to move the discussion back to tor-dev if you want.)

> Karsten Loesing, 21.05.2012 11:05:
>>> Here we go with the similarities of bridge and relay nicknames.
>> Thanks for spending this much time on the analysis!
> I could have done far worse, but also a lot better in terms of time
> spent on extracting the data that I wanted, or at least considered
> might be useful.
> Sometimes I'm just slow at things, e.g. writing this reply.
>> Here's what I did with your findings.txt:
>> - extract unique fingerprint pairs of relays and bridges that you found
>> as having similar nicknames,
>> - look through descriptor archives to see if relay and bridge were
>> running in the same /24 at any time in May 2008, and
>> - determine the absolute and relative number of bridges in a given
>> network status that could have been located via nickname similarity.
>> Results are that 24 of your 81 guesses (30%) were correct in the sense
>> that a bridge was at least once running in the same /24 as the relay
>> with a similar nickname.  At any time in May 2008, you'd have located
>> between 1 and 6 bridges (2.5% to 18%), with 3 bridges (10%) on average,
>> via nickname similarity.
> Not too bad.

I agree. :)
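The three steps quoted above could be sketched roughly as follows. This is a minimal illustration, not the script that produced these numbers: it assumes the descriptor archives have already been parsed into hypothetical (fingerprint, IPv4 address) tuples and each bridge network status into a set of fingerprints, and it simplifies "same /24 at any time" to "ever seen in a common /24", ignoring whether relay and bridge ran simultaneously.

```python
import ipaddress
from collections import defaultdict


def slash24(address):
    """Return the IPv4 /24 network containing address."""
    return ipaddress.ip_network(f"{address}/24", strict=False)


def prefixes_by_fingerprint(descriptors):
    """Map each fingerprint to the set of /24s it was seen in.

    descriptors: iterable of (fingerprint, ipv4_address) tuples, a
    hypothetical pre-parsed form of the descriptor archives.
    """
    seen = defaultdict(set)
    for fingerprint, address in descriptors:
        seen[fingerprint].add(slash24(address))
    return seen


def confirmed_pairs(pairs, relay_descs, bridge_descs):
    """Step 1 and 2: deduplicate (relay, bridge) pairs and keep those
    that shared a /24 at least once.

    Simplification: does not check that both ran at the same time.
    """
    relays = prefixes_by_fingerprint(relay_descs)
    bridges = prefixes_by_fingerprint(bridge_descs)
    return {
        (r, b)
        for r, b in set(pairs)
        if relays.get(r, set()) & bridges.get(b, set())
    }


def located_per_status(statuses, confirmed):
    """Step 3: per bridge network status, count bridges located via
    nickname similarity.

    statuses: dict of status name -> set of bridge fingerprints listed.
    Returns dict of status name -> (absolute count, fraction of bridges).
    """
    located = {b for _, b in confirmed}
    result = {}
    for name, listed in statuses.items():
        hits = len(listed & located)
        result[name] = (hits, hits / len(listed) if listed else 0.0)
    return result


if __name__ == "__main__":
    # Toy data with documentation-range addresses; not real descriptors.
    pairs = [("R1", "B1"), ("R2", "B2"), ("R1", "B1")]
    relay_descs = [("R1", "192.0.2.10"), ("R2", "198.51.100.7")]
    bridge_descs = [("B1", "192.0.2.200"), ("B2", "203.0.113.5")]
    confirmed = confirmed_pairs(pairs, relay_descs, bridge_descs)
    print(confirmed)  # {('R1', 'B1')}
    statuses = {"2008-05-01": {"B1", "B2", "B3", "B4"}}
    print(located_per_status(statuses, confirmed))  # {'2008-05-01': (1, 0.25)}
```

Only R1/B1 share a /24 (192.0.2.0/24) here, so one of the four bridges in the toy status counts as located.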

>> I think it's acceptable to publish more recent bridge descriptors with
>> nicknames starting a week from now.  Results may look quite different
>> with 1000 bridges instead of 30.
> May 2008 was the first month with bridges. I expected lots of relay
> operators to have tested a bridge with the same name. Things may have
> changed over time. I assume that further comparisons won't have such a
> "high" hit ratio.

That would be my guess, too.  In May 2008, only a few early adopters
were running bridges, and most of those probably ran relays at the same
time, too.  Plus, they were enthusiastic and put some energy into finding
cool nicknames.  It might be that this has changed since then.  To be
honest, I haven't looked at the 2012 tarballs yet.

>> Again, thanks for running this analysis!  Maybe you're interested in
>> automating your comparison and re-running it for a 2012 tarball?
> My point was that you have the data, so you can check (though not with
> May 2008).
> To be honest, my first impression was that I wouldn't do anything useful,
> and I did not intend to do it. I guessed it would turn out that it
> doesn't hurt, at least since 2011, so I wouldn't find anything good.
> Then you asked and I agreed, but I was already thinking "I couldn't keep
> my mouth shut!". I mean, I replied to this topic. I surely could have
> said no then. I didn't.
> While and after I was doing it, I would have said no if asked whether
> I'd do this again. That held until Sunday night. Today I'm agreeing
> again.
> That's a pretty long way to say: Yes!

Hah, great! :)

I'm going to make the 2012 tarballs available next Wednesday (May 30),
assuming that my poor Linux box doesn't run out of $resource.  I'll let
you know.
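An automated version of the nickname comparison might look like the sketch below. The actual similarity criterion used for findings.txt isn't described in this thread, so everything here is a hypothetical stand-in: the normalization (lowercase, drop the substring "bridge", drop digits), the use of Python's `difflib.SequenceMatcher` ratio as the similarity measure, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import difflib
import re


def normalize(nickname):
    """Hypothetical normalization: lowercase, remove 'bridge', remove digits."""
    name = nickname.lower().replace("bridge", "")
    return re.sub(r"\d+", "", name)


def similar_pairs(relay_nicks, bridge_nicks, threshold=0.8):
    """Return (relay_fp, bridge_fp, score) for nickname pairs at or above
    threshold.

    relay_nicks, bridge_nicks: dicts of fingerprint -> nickname.
    The threshold and the SequenceMatcher metric are assumptions.
    """
    results = []
    for rfp, rnick in relay_nicks.items():
        rnorm = normalize(rnick)
        for bfp, bnick in bridge_nicks.items():
            bnorm = normalize(bnick)
            # Skip names that normalize to nothing, e.g. "Bridge1".
            if not rnorm or not bnorm:
                continue
            score = difflib.SequenceMatcher(None, rnorm, bnorm).ratio()
            if score >= threshold:
                results.append((rfp, bfp, score))
    return results


if __name__ == "__main__":
    relays = {"R1": "moria1", "R2": "gabelmoo"}
    bridges = {"B1": "moriaBridge", "B2": "unnamed"}
    print(similar_pairs(relays, bridges))  # [('R1', 'B1', 1.0)]
```

The output pairs could then feed a /24 check against the descriptor archives. The nested loop is quadratic, which is fine for 30 bridges but would likely need blocking or indexing for thousands of bridges in a 2012 tarball.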

> Thank you, it's a 2012 tarball. The number of bridges is scary.
> I'm going to upload some files somewhere and explain what I did, more or
> less step by step, so anyone can check and reproduce it. It would be
> nice to hear feedback and ways to improve my approach.
> Maybe you can tell me if the findings.txt was alright.

Yes, the file format was fine.

> Unless someone objects or you disagree, I'm going to upload the files I
> created and explain how, and maybe even why.

No objections at all.  Open discussion is good.

> I created a blog, just because I had wanted one at some point in the
> past but found it silly. That's the channel I planned to use. Maybe it's
> OK to post it to a Tor list as well, but maybe that would be considered
> noise.

I wonder if the Tor wiki would be a better place to collect ideas for
reversing the bridge descriptor sanitizing process.  Feel free to grab a
new page in doc/ and start describing what you did.

