[tor-dev] Can we stop sanitizing nicknames in bridge descriptors?
Ondrej Mikle
ondrej.mikle at gmail.com
Fri May 4 23:47:45 UTC 2012
On 05/04/2012 01:04 PM, Karsten Loesing wrote:
> On 5/4/12 2:21 AM, Ondrej Mikle wrote:
>> On 05/03/2012 01:32 PM, Karsten Loesing wrote:
>>> On 5/2/12 9:35 PM, Sebastian G. <bastik.tor> wrote:
>>>> [...]
>>>> "We don't need it, so better remove it." I really like that.
>>>
>>> I think we're really conservative with giving out bridge data, and
>>> that's good.
>>>
>>> At the same time there's a value in giving out information about
>>> bridges, so that "remove everything" is not a good answer. For example,
>>> I think if we give bridge operators better feedback how their bridge is
>>> doing, we'll suddenly have a lot more bridges. Making it easy for
>>> bridge operators to use Atlas would be a good step into that direction.
>>> The same applies to funders who realize from our statistics how
>>> successful the Tor Cloud project is and who then want to fund it more to
>>> make it more usable, support more cloud providers, etc.
>>
>> I would suggest looking at homomorphic hash [1] and Shamir's discrete logarithm
>> hash function [2]. (Those also work well with linear network coding [3], but not
>> sure if it could be useful here.)
>>
>> For example, encoding FQDN, IP or nick can be done by splitting the
>> argument-to-encode by fields or characters. The parts can be then used as input
>> to the hash function.
>>
>> The function allows checking whether a nick/FQDN/IP has specific part, or two
>> have identical part, but does not disclose "plaintext" of the part.
>>
>> Obviously, there are statistical attacks possible: e.g. for FQDNs, the attacker
>> could guess which component maps to 'com', as it is the most common TLD.
>> Similarly, splitting up into characters can be attacked by using frequency
>> tables. There are other things that could apply here (thinking about attacks on
>> "plaintext RSA" without padding).
>>
>> Nevertheless, I think it's still better than publishing plaintext data if we are
>> not sure what they might give away. Implementation using gmp/gmpy/numpy should
>> be fairly easy.
>
> Interesting approach. So, the idea would be to split a nickname like
> "ec2bridgeb268f2ae6" into its characters (or pairs of 2 or more
> characters?), run it through the hash function, and then be able to
> check if the nickname starts with "ec2bridge"? Plus, the approach would
> still work if we later decide we want to find all bridges with nicknames
> starting with "rackspacebridge"?
Basically, yes (you'd need to check for "rackspacebridg", since it's length is
multiple of two; it's possible to check in middle as well as at end). Though as
you write below, the low entropy make it not so much usable for groups of 2-3
characters.
Thus the idea is probably over-engineering in this case. I just note below some
design facets I had in mind:
> My first concern is that there's not enough entropy in nicknames for the
> hash function to provide sufficient protection. I could imagine it's
> not hard to throw variants of all known relay nicknames into that hash
> function and learn 50%, if not 75%, of all used bridge nicknames.
One trick I had in mind was create "secret hash function" (take the following
with a grain of salt, algebra is not my "primary thing"):
- you keep generators g_i secret, which turns the problem from discrete-log into
a problem of finding n-th root in finite field (n is the value of the digraph,
trigraph or few characters, e.g. encoded value of 'ec2bridge', possibly
"blinded" by another multiplication with secret constant c_i)
- in general, computing n-th root is quite slow [1], but there are many special
cases like square root (quadratic residue)
- the above properties would make it slow for attacker to brute-force all
possible values - i.e. attacker can't just compute the result values of such
homomorphic hash, because he doesn't know the function parameters (e.g. without
computing the generators), but everyone can use the "homomorphic property" for
testing parts
> My second concern is that this approach would only solve the problem of
> counting EC2 bridges, but wouldn't make sites like Atlas more usable for
> bridge operators.
Yes, the scheme severely restricts what you can do with the data.
Ondrej
[1] http://www.ma.utexas.edu/users/voloch/Preprints/roots.pdf
More information about the tor-dev
mailing list