Sanitizing IPv6 addresses in bridge descriptors

10 Jan 2012

Hi Linus, Ian, list,

now that we have bridges running on IPv6 addresses, some bridge
operators enabled that feature on their public bridges and published
descriptors to the bridge authority.

I wonder how to sanitize these addresses for metrics data.  (Currently,
lines containing IPv6 addresses are simply discarded in the sanitized
output.)

Here's how we sanitize IPv4 addresses (from
https://metrics.torproject.org/formats.html#bridgedesc):
...
Replace IP address with IP address hash: Of course, the IP address
needs to be removed, too. It is replaced with 10.x.x.x with x.x.x
being the 3 byte output of H(IP address | bridge identity |
secret)[:3]. The input IP address is the 4-byte long binary
representation of the bridge's current IP address. The bridge
identity is the 20-byte long binary representation of the bridge's
long-term identity fingerprint. The secret is a 31-byte long secure
random string that changes once per month for all descriptors and
statuses published in that month. H() is SHA-256. The [:3] operator
means that we pick the 3 most significant bytes of the result.

The idea is that it should be hard to derive the original IP address
from the sanitized address.  At the same time it should be easy to
notice whether the address of a given bridge has changed within the same
month.
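
To make the quoted IPv4 rule concrete, here is a minimal Python sketch of it.  The function name sanitize_ipv4 and its argument names are made up for illustration, and this is not the code the metrics tooling actually runs; it only follows the description above (4-byte address, 20-byte identity, 31-byte secret, SHA-256, first 3 bytes).

```python
import hashlib
import ipaddress


def sanitize_ipv4(address: str, fingerprint_hex: str, secret: bytes) -> str:
    """Return 10.x.x.x with x.x.x = SHA-256(IP | identity | secret)[:3].

    Hypothetical helper: name and signature are assumptions, only the
    byte layout comes from the format description quoted above.
    """
    ip_bytes = ipaddress.IPv4Address(address).packed  # 4-byte binary form
    identity = bytes.fromhex(fingerprint_hex)         # 20-byte fingerprint
    if len(identity) != 20:
        raise ValueError("fingerprint must be 20 bytes (40 hex characters)")
    if len(secret) != 31:
        raise ValueError("monthly secret must be 31 bytes")
    digest = hashlib.sha256(ip_bytes + identity + secret).digest()
    # [:3] keeps the first (most significant) three bytes of the digest.
    return "10." + ".".join(str(b) for b in digest[:3])
```

With the same secret, the same bridge and address always map to the same 10.x.x.x value, so an address change within the month shows up as a changed sanitized address.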

Here's my plan for IPv6 addresses:

- Shorter secret: For the hash function input, use the 16 byte long
binary representation of the bridge's current IP address, the 20 bytes
of the fingerprint, but only a 19 byte long secure random string that
changes once per month.  The idea is to keep the input within one
SHA-256 block (at most 447 bits of input, leaving room for padding), as
suggested by Ian on January 2, 2011 on this list: (16 + 20 + 19) * 8 =
440.

- Alternative to shorter secret: Use the same 31 byte long secret and
live with the fact that the hash input now spans two SHA blocks.  Maybe
use a 75 byte long secret to have an input of two SHA blocks.

- Write 3 bytes of the sanitized IPv6 address in [::] notation.  We're
writing sanitized IPv4 addresses as 10.x.x.x.  Is there a counterpart
for IPv6 addresses?  It should be obvious that these are "private"
addresses, but I'd like to keep the notation unchanged to keep parsing
tools simple.

- Alternative to using 3 bytes: Should we use fewer or more bytes from
the SHA-256 output for IPv6 addresses?
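
The block arithmetic behind the first two options can be checked with a short Python sketch.  The helper names sha256_blocks and hash_ipv6 are hypothetical, and the sketch assumes the IPv6 input is laid out like the IPv4 one (address | identity | secret).  It stops at the truncated hash and does not pick a textual notation, since that question is still open.

```python
import hashlib
import ipaddress


def sha256_blocks(message_len_bytes: int) -> int:
    """Number of 512-bit blocks SHA-256 processes for a message this long.

    Padding adds a single 1 bit and a 64-bit length field; for whole-byte
    messages that is at least 9 extra bytes before rounding up to 64.
    """
    return (message_len_bytes + 9 + 63) // 64


def hash_ipv6(address: str, fingerprint_hex: str, secret: bytes,
              n: int = 3) -> bytes:
    """First n bytes of SHA-256(IPv6 address | identity | secret).

    Hypothetical helper; the input layout is assumed to mirror IPv4.
    """
    ip_bytes = ipaddress.IPv6Address(address).packed  # 16-byte binary form
    identity = bytes.fromhex(fingerprint_hex)         # 20-byte fingerprint
    return hashlib.sha256(ip_bytes + identity + secret).digest()[:n]


if __name__ == "__main__":
    # Secret lengths mentioned in the options above.
    for secret_len in (19, 31, 75):
        total = 16 + 20 + secret_len
        print(secret_len, total * 8, sha256_blocks(total))
```

The n parameter is there only so that the "fewer or more bytes" option can be tried without changing anything else.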

Thanks,
Karsten

Karsten Loesing
