[tor-bugs] #2435 [Metrics]: Preserving hashed IP addresses in sanitized bridge descriptors

Tue Jan 25 18:41:29 UTC 2011

#2435: Preserving hashed IP addresses in sanitized bridge descriptors
-------------------------+--------------------------------------------------
 Reporter:  karsten      |       Owner:  karsten
     Type:  enhancement  |      Status:  new    
 Priority:  normal       |   Milestone:         
Component:  Metrics      |     Version:         
 Keywords:               |      Parent:         
-------------------------+--------------------------------------------------
 Roger mentioned in a comment on #2372:

 > One issue that comes to mind that we might want to research is how often
 a given bridge moves IP address. The method you describe above would lose
 that info, yes? Whereas if we do a keyed hash of the IP address (and never
 disclose the key), we could distinguish "same" from "different". I
 remember we had the keyed hash design in some other sanitization context,
 but I don't remember which one -- how is the idea working out in that
 other context?
 >
 > (It's possible that we already do the keyed hash for the regular bridge
 descriptors, so we would just need to match up the sha1(fingerprint) in
 this file with the sha1(fingerprint) in that file and we could look up the
 IP address. In which case maybe there's merit in doing the same keyed hash
 in both places, to ease the job of future researchers.)

 When we discussed this topic the last time, I suggested replacing bridge
 IP addresses with something very similar to this:

 {{{
   H(IP address + bridge identity + secret)[:3]
 }}}

 The input IP address is the 4-byte long binary representation of the
 bridge's current IP address.  The bridge identity is the 20-byte long
 binary representation of the bridge's long-term identity fingerprint.  The
 secret is an arbitrary, sufficiently long (say, 20 bytes), secure random
 string that does not change over time and that is only known to the
 machine running the bridge descriptor sanitizer plus backups.  H is SHA-1.
 The [:x] operator means that we pick the x most significant bytes of the
 result.
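
 As a minimal sketch of the transformation described above, the following
 Python reads "+" as byte concatenation and "most significant bytes" as the
 leading bytes of the SHA-1 digest; both readings are assumptions.  The
 function name sanitize_ip_bytes, the example address, and the example
 fingerprint are made up for illustration and are not taken from the actual
 sanitizer.

```python
import hashlib
import ipaddress
import os


def sanitize_ip_bytes(ip: str, fingerprint_hex: str, secret: bytes,
                      length: int = 3) -> bytes:
    """Return the first `length` bytes of SHA-1(IP || identity || secret)."""
    # 4-byte binary representation of the bridge's current IPv4 address.
    ip_bytes = ipaddress.IPv4Address(ip).packed
    # 20-byte binary representation of the long-term identity fingerprint.
    identity = bytes.fromhex(fingerprint_hex)
    if len(identity) != 20:
        raise ValueError("fingerprint must be 20 bytes (40 hex characters)")
    digest = hashlib.sha1(ip_bytes + identity + secret).digest()
    # [:x] keeps the x leading bytes of the 20-byte digest.
    return digest[:length]


# Hypothetical inputs: a fresh 20-byte random secret, a documentation-range
# address, and a dummy fingerprint.
secret = os.urandom(20)
out = sanitize_ip_bytes("198.51.100.7", "AB" * 20, secret)
print(out.hex())
```

 In a real deployment the secret would be generated once and stored, not
 regenerated on every run, since the outputs are only comparable over time
 if the secret stays the same.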

 The original transformation used 4 bytes of the output, but I changed this
 to use only 3 bytes here.  The idea is to write the resulting "IP
 addresses" as 10.x.x.x in the sanitized descriptors to make it clear that
 these are not public IP addresses.  I want to avoid confusion with the
 non-sanitized IP addresses in exit policies.  I'm aware of the higher
 collision probability: with 3 bytes, two different addresses of the same
 bridge map to the same output with a probability of about 2^-24.  The
 probability and impact of missing an IP address change are still
 sufficiently low.

 The resulting "IP address" helps us detect whether a specific bridge has
 changed its IP address.  It does not tell us if two bridges run on the
 same IP address.  It also does not tell us when a bridge changes its
 fingerprint but keeps its IP address.
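
 The properties above can be illustrated with a short Python sketch.  It
 assumes the hash input is the plain concatenation of the three values and
 that the 3 output bytes are written as the last three octets of a 10.x.x.x
 address.  The helper name sanitized_address, the secret, the fingerprints,
 and the addresses are placeholders for illustration only.

```python
import hashlib
import ipaddress


def sanitized_address(ip: str, fingerprint_hex: str, secret: bytes) -> str:
    """Map (IP, identity, secret) to a 10.x.x.x placeholder address."""
    data = (ipaddress.IPv4Address(ip).packed
            + bytes.fromhex(fingerprint_hex)
            + secret)
    a, b, c = hashlib.sha1(data).digest()[:3]
    return f"10.{a}.{b}.{c}"


secret = b"example-secret-20byt"  # placeholder; a real secret would be random
bridge_a = "A" * 40               # dummy 20-byte fingerprints in hex
bridge_b = "B" * 40

same = sanitized_address("198.51.100.7", bridge_a, secret)
again = sanitized_address("198.51.100.7", bridge_a, secret)
moved = sanitized_address("203.0.113.9", bridge_a, secret)
other = sanitized_address("198.51.100.7", bridge_b, secret)

# Same bridge, same address: identical output, so "unchanged" is visible.
print(same == again)
# Same bridge, new address: output differs except for a rare collision.
print(same != moved)
# Different bridge, same address: outputs are unrelated, so shared
# addresses (or a fingerprint change on one address) are not revealed.
print(same != other)
```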

 The two important properties of this transformation are that a) someone
 who learns a bridge's identity cannot guess the bridge's previous IP
 addresses (which would have been possible without using the secret), and
 b) someone who guesses the secret cannot guess the IP addresses of all
 bridges (which would have been possible without using the bridge
 identity).

 There are more details about preserving hashed IP addresses in
 [http://archives.seul.org/or/dev/Apr-2010/msg00000.html this thread].

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2435>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online