[tor-dev] Can we stop sanitizing nicknames in bridge descriptors?
karsten at torproject.org
Thu May 3 11:32:55 UTC 2012
[Cc'ing tor-relays, because this discussion might be relevant for
relay/bridge operators, too. Please keep the discussion on tor-dev,
where the whole thread is archived.]
On 5/2/12 9:35 PM, Sebastian G. <bastik.tor> wrote:
> Do similar names actually mean that bridges are located where the relay
> is? (Apparently you've got the data to see these correlations)
A fine question.
How do we define "similar" and "located where the relay is?" I can see
how a relay "bastik1" and a bridge "bastik2" have similar nicknames, but
would we also teach a program that "bastikrelay" and "bastikbridge" are
similar? And are two IP addresses in the same, say, /30 located nearby,
or is the same /28 or even /24 okay, too?
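To make the two open questions concrete, here is a minimal sketch of one
possible way to score nickname similarity and to test whether two IPv4
addresses share a prefix. Both helper names are hypothetical, the
addresses are documentation ranges, and the use of Python's difflib ratio
is only an assumption for illustration; it is exactly the kind of naive
measure that somebody else could easily beat.

```python
import ipaddress
from difflib import SequenceMatcher


def nickname_similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical (case-insensitive).

    Hypothetical helper: difflib's ratio is just one of many possible
    similarity measures, not a recommended one.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_prefix(ip_a, ip_b, prefix_len):
    """Return True if both IPv4 addresses fall into the same /prefix_len."""
    # strict=False lets us pass a host address and get its enclosing network.
    net = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net


if __name__ == "__main__":
    # The numbered pair scores higher than the relay/bridge suffix pair.
    print(nickname_similarity("bastik1", "bastik2"))
    print(nickname_similarity("bastikrelay", "bastikbridge"))
    # The answer depends entirely on which prefix length we call "nearby".
    for prefix_len in (30, 28, 24):
        print(prefix_len, same_prefix("192.0.2.1", "192.0.2.5", prefix_len))
```

Note how the result flips between /30 and /28 for the same pair of
addresses, which is the point of the question above: the threshold is a
choice, not a fact.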
So, while we have the data to see these correlations, I think that
whatever similarity algorithm we come up with, somebody else might come
up with something smarter. If we do the analysis you suggest and learn
that it's safe to include nicknames, that doesn't say very much. Just
because we have the data to confirm how well our attack would work
doesn't automatically mean we're in a good position to design the attack.
If you want to run this analysis with the 2008 tarball (assuming there
won't be general objections within the next two weeks), I'm happy to
take your list of likely bridge IP addresses and tell you how accurate
your algorithm is.
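"Accurate" is not defined any further here. One plausible reading, and it
is only an assumption, is to compare a submitted list of guessed bridge
addresses against the real ones and report precision and recall. The
function name and the sample addresses below are made up for illustration.

```python
def score_guesses(guessed, actual):
    """Return (precision, recall) for a set of guessed bridge IP addresses.

    precision: fraction of guesses that are real bridges
    recall:    fraction of real bridges that were guessed
    """
    guessed, actual = set(guessed), set(actual)
    hits = len(guessed & actual)
    precision = hits / len(guessed) if guessed else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example data using documentation address ranges.
    guessed = {"192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"}
    actual = {"192.0.2.1", "198.51.100.7", "203.0.113.50"}
    print(score_guesses(guessed, actual))
```

Reporting only these two numbers would tell the submitter how well the
algorithm did without revealing which individual guesses were correct.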
> "We don't need it, so better remove it." I really like that.
I think we're really conservative with giving out bridge data. At the
same time, there's value in giving out information about bridges, so
"remove everything" is not a good answer. For example, I think if we
give bridge operators better feedback on how their bridge is doing,
we'll suddenly have a lot more bridges. Making it easy for bridge
operators to use Atlas would be a good step in that direction.
The same applies to funders who realize from our statistics how
successful the Tor Cloud project is and who then want to fund it more to
make it more usable, support more cloud providers, etc.
>> And are we giving away anything else with the nicknames?
> Maybe its location ;)
> As I read "hints on the location" for the first time, I thought it would
> mean that "TowerBridge" or "BridgeofLondon" would be bad since it could
> hint to London.
Well, in that case you'd learn that there's a (Tor) bridge in London.
But that wouldn't help you very much, would it?
> Could it make sense to ask the same question on the tor-relays list? Here
> you (the Tor people) have more data again and know who subscribed to
> both lists. I myself assume that relay and bridge operators, who
> could object because it's their naming scheme that could reveal
> something, are more likely to be subscribed to tor-relays.
Good idea. I added tor-relays to the Cc to let relay/bridge operators
know. Let's keep this discussion on tor-dev though.
>> And if nobody
>> screams, I'll provide the remaining tarballs containing original
>> nicknames another two weeks later.
> Probably two weeks later, since unpacking, processing and re-packing
> takes some time :) I know the sanitized ones are large when they are
> unpacked. Windows needs some time to delete the extracted files.
Right. :) I'll probably start sanitizing all bridge descriptors at once
in two weeks, starting with the 2008 ones, and provide only the 2008
tarball then. It's going to keep my CPU and disks busy for a while.
Thanks for your input!