Bridge stability

Mon Feb 15 20:05:54 UTC 2010

Hey Roger,

On Feb 15, 2010, at 8:14 PM, Roger Dingledine wrote:
> On Mon, Feb 15, 2010 at 09:29:19AM +0100, Karsten Loesing wrote:
>> I should state that there is one major inaccuracy in the analysis:
>> Bridges that change their IP address are not reachable by their
>> clients anymore. In theory, clients are able to download updated bridge
>> descriptors from the bridge authority to learn about new IP addresses,
>> but this functionality is not implemented yet. However, I cannot take
>> changing IP addresses into account for this analysis, because I removed
>> the IP addresses when sanitizing the bridge descriptors. Hah! Maybe we
>> should just fix this problem by implementing the missing functionality.
> 
> This part is worrisome. That means your analysis is assuming the bridges
> always stick to the same IP address, right? It's worth trying to do
> the analysis when we take into account that some bridges are on highly
> dynamic IPs (e.g. daily).
> 
> What's the process by which we sanitize them? It seems that a fine
> solution would be to hash the IP addresses keyed with a secret that
> remains constant across the hashes. So you could tell if the IP addresses
> are the same without being able to tell what they are. The main challenge
> there is keeping the secret somewhere secret in between batches (and
> maybe rotating the secret monthly, for some level of forward secrecy).

Yes, we can do something like that. I assume that it'll keep my server busy for a day or two to parse all the descriptors once more. But I can do that.

Instead of a secret input to the hash function, how about we concatenate bridge identity and IP address as input? Note that we don't put the bridge identity in the sanitized descriptor, but only its hash. That way we'd avoid relying on a secret that we'll lose or forget anyway, and we'd have something reproducible. To be precise, this is what I have in mind:

  sanitized bridge identity = H(bridge identity)

  sanitized IP address = H(bridge identity + IP address)[:4]

Note that only the first 4 bytes of the result are used, because the result is written in place of the bridge's IP address and so has to fit the range between 0.0.0.0 and 255.255.255.255. Of course, truncating to 4 bytes leaves a nonzero chance of collisions for a bridge identity with two different IP addresses. But I want the network status to contain all relevant information, rather than having to re-assemble network status entries and bridge descriptors (which could contain more information in their contact line). Are there better ways to add 20 bytes to the network status? We might still add the full hash to the descriptor's contact line.
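
To make the two formulas above concrete, here is a minimal sketch in Python. It rests on several assumptions that are not settled in this thread: H is taken to be SHA-1 (which matches the 20 bytes mentioned above), "+" is taken to mean byte concatenation of the raw identity digest and the 4-byte packed IPv4 address, and the function names sanitize_identity and sanitize_ip are made up for illustration.

```python
import hashlib
import ipaddress


def sanitize_identity(identity: bytes) -> str:
    """Return H(bridge identity) as uppercase hex (H assumed to be SHA-1)."""
    return hashlib.sha1(identity).hexdigest().upper()


def sanitize_ip(identity: bytes, ip: str) -> str:
    """Return H(bridge identity + IP address)[:4], written as a dotted quad."""
    # Assumption: the IP address enters the hash in its 4-byte packed form.
    packed_ip = ipaddress.IPv4Address(ip).packed
    digest = hashlib.sha1(identity + packed_ip).digest()
    # Keep only the first 4 bytes so the result fits into an IPv4 field.
    return str(ipaddress.IPv4Address(digest[:4]))


if __name__ == "__main__":
    # Hypothetical 20-byte bridge identity, for demonstration only.
    identity = bytes(range(20))
    print(sanitize_identity(identity))
    print(sanitize_ip(identity, "192.0.2.1"))
    print(sanitize_ip(identity, "192.0.2.2"))
```

The same bridge on the same address always maps to the same sanitized address, so address changes stay visible in the sanitized data without a secret having to be stored between batches.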

Thanks,
--Karsten