Hi Linus, Ian, list,
now that we have bridges running on IPv6 addresses, some bridge operators enabled that feature on their public bridges and published descriptors to the bridge authority.
I wonder how to sanitize these addresses for metrics data. (Currently, lines containing IPv6 addresses are simply discarded in the sanitized output.)
Here's how we sanitize IPv4 addresses (from https://metrics.torproject.org/formats.html#bridgedesc):
Replace IP address with IP address hash: Of course, the IP address needs to be removed, too. It is replaced with 10.x.x.x with x.x.x being the 3 byte output of H(IP address | bridge identity | secret)[:3]. The input IP address is the 4-byte long binary representation of the bridge's current IP address. The bridge identity is the 20-byte long binary representation of the bridge's long-term identity fingerprint. The secret is a 31-byte long secure random string that changes once per month for all descriptors and statuses published in that month. H() is SHA-256. The [:3] operator means that we pick the 3 most significant bytes of the result.
The idea is that it should be hard to derive the original IP address from the sanitized address. At the same time it should be easy to notice whether the address of a given bridge has changed within the same month.
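For readers who want to see the IPv4 rule spelled out, here is a minimal Python sketch of the description quoted above. The function name `sanitize_ipv4` and its parameter names are made up for this illustration; this is not the code that metrics-db actually runs.

```python
import hashlib
import ipaddress


def sanitize_ipv4(address: str, fingerprint: bytes, secret: bytes) -> str:
    """Return 10.x.x.x with x.x.x = H(IP | identity | secret)[:3], H = SHA-256.

    Hypothetical helper: name and signature are assumptions for this sketch.
    """
    ip_bytes = ipaddress.IPv4Address(address).packed  # 4-byte binary form
    if len(fingerprint) != 20:
        raise ValueError("bridge identity must be 20 bytes")
    if len(secret) != 31:
        raise ValueError("monthly secret must be 31 bytes")
    digest = hashlib.sha256(ip_bytes + fingerprint + secret).digest()
    # Prefix the 3 most significant digest bytes with 10 (0x0a).
    return str(ipaddress.IPv4Address(b"\x0a" + digest[:3]))
```

With the same fingerprint and the same monthly secret, an unchanged bridge address maps to the same 10.x.x.x value, which is what makes address changes within a month visible.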
Here's my plan for IPv6 addresses:
- Shorter secret: For the hash function input, use the 16 byte long binary representation of the bridge's current IP address, the 20 bytes of the fingerprint, but only a 19 byte long secure random string that changes once per month. The idea is to keep the input within one SHA-256 block (at most 447 bits of input, because padding takes up the rest of the 512 bits) as suggested by Ian on January 2, 2011 on this list: (16 + 20 + 19) * 8 = 440.
- Alternative to shorter secret: Use the same 31 byte long secret and live with the fact that the hash input now spans two SHA blocks. Maybe use a 75 byte long secret to have an input of two SHA blocks.
- Write 3 bytes of the sanitized IPv6 address in [::] notation. We're writing sanitized IPv4 addresses as 10.x.x.x. Is there a counterpart for IPv6 addresses? It should be obvious that these are "private" addresses, but I'd like to keep the notation unchanged to keep parsing tools simple.
- Alternative to using 3 bytes: Should we use fewer or more bytes from the SHA-256 output for IPv6 addresses?
Thanks, Karsten
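The block arithmetic behind the two secret lengths above can be checked with a short sketch. It assumes byte-aligned input and standard SHA-256 padding (one 0x80 byte, zero fill, then an 8-byte length field); `sha256_blocks` is a name invented for this example.

```python
def sha256_blocks(message_len: int) -> int:
    """Number of 64-byte blocks SHA-256 processes for message_len input bytes.

    Padding adds at least 1 byte (0x80) plus an 8-byte length field, and the
    total is rounded up to a multiple of 64 bytes.
    """
    return (message_len + 1 + 8 + 63) // 64


# Input lengths discussed in the mail above: address + fingerprint + secret.
ipv4_input = 4 + 20 + 31          # current IPv4 scheme
ipv6_short_secret = 16 + 20 + 19  # proposed 19-byte secret
ipv6_same_secret = 16 + 20 + 31   # keep the 31-byte secret
ipv6_long_secret = 16 + 20 + 75   # proposed 75-byte secret
```

Under these assumptions, the first two inputs are 55 bytes (440 bits) and fit in one block, while the last two need two blocks.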
Karsten Loesing karsten.loesing@gmx.net wrote Tue, 10 Jan 2012 14:45:03 +0100:
| - Write 3 bytes of the sanitized IPv6 address in [::] notation. We're
| writing sanitized IPv4 addresses as 10.x.x.x. Is there a counterpart
| for IPv6 addresses? It should be obvious that these are "private"
| addresses, but I'd like to keep the notation unchanged to keep parsing
| tools simple.
RFC 3849 defines the prefix 2001:DB8::/32 as being reserved for documentation. That should be fine for this.
Hi,
On Jan 10, 2012, at 22:36, Linus Nordberg wrote:
> Karsten Loesing karsten.loesing@gmx.net wrote Tue, 10 Jan 2012 14:45:03 +0100:
>
> | - Write 3 bytes of the sanitized IPv6 address in [::] notation. We're
> | writing sanitized IPv4 addresses as 10.x.x.x. Is there a counterpart
> | for IPv6 addresses? It should be obvious that these are "private"
> | addresses, but I'd like to keep the notation unchanged to keep parsing
> | tools simple.
>
> RFC 3849 defines the prefix 2001:DB8::/32 as being reserved for documentation. That should be fine for this.
The documentation prefix is for just that, use in documentation :)
ULA (RFC4193) is actually closer to the 10/8 (RFC1918) addresses that you use for IPv4.
Alex
Alex Le Heux alexlh@funk.org wrote Wed, 11 Jan 2012 09:57:00 +0100:
| > RFC 3849 defines the prefix 2001:DB8::/32 as being reserved for
| > documentation. That should be fine for this.
|
| The documentation prefix is for just that, use in documentation :)
|
| ULA (RFC4193) is actually closer to the 10/8 (RFC1918) addresses that you use for IPv4.
Oh, right. *blush*
Thanks.
On 1/11/12 10:34 AM, Linus Nordberg wrote:
> Alex Le Heux alexlh@funk.org wrote Wed, 11 Jan 2012 09:57:00 +0100:
>
> | > RFC 3849 defines the prefix 2001:DB8::/32 as being reserved for
> | > documentation. That should be fine for this.
> |
> | The documentation prefix is for just that, use in documentation :)
> |
> | ULA (RFC4193) is actually closer to the 10/8 (RFC1918) addresses that you use for IPv4.
>
> Oh, right. *blush*
So, just to get that right: how would we apply RFC4193 here?
- We start with FC00::/7 as the prefix for Local IPv6 unicast addresses.
- We set the 8th bit, the L bit, to 1, because we're generating the subsequent Global ID locally.
- We generate a random 40-bit Global ID for "Tor sanitized bridge IPv6 addresses." We don't change it, ever.
- We set the 16-bit Subnet ID to all zeros.
- We use the least significant 24 bits of the 64-bit Interface ID for the actual sanitized bridge address that was formerly encoded in 10.x.x.x.
As an example, a sanitized IPv6 bridge address would be:
[fc01:0123:4567:89ab::fedc:ba98:7654]
Does that make sense?
As for using a 19-byte or 75-byte long secret key for the SHA-256 input (see my original mail in this thread), I think I'll go with 19 bytes.
Whoever wants to break the secret needs to brute-force these 19 bytes using a known input IPv6 address and known sanitized address from us (which can easily be acquired by running your own bridge). The brute-forcing will take them a while, and it'll only tell them the secret key for one month. And once they have it they still need to brute-force a 16-byte input IPv6 address that matches a given 3-byte sanitized address. They'll need to repeat the last step for every bridge address there is.
There are vastly easier ways to learn bridge addresses. Heck, we run a service that tells you those. Something tells me I'm overthinking this. ;)
Best, Karsten
On 1/16/12 8:46 AM, Karsten Loesing wrote:
> On 1/11/12 10:34 AM, Linus Nordberg wrote:
>> Alex Le Heux alexlh@funk.org wrote Wed, 11 Jan 2012 09:57:00 +0100:
>>
>> | > RFC 3849 defines the prefix 2001:DB8::/32 as being reserved for
>> | > documentation. That should be fine for this.
>> |
>> | The documentation prefix is for just that, use in documentation :)
>> |
>> | ULA (RFC4193) is actually closer to the 10/8 (RFC1918) addresses that you use for IPv4.
>>
>> Oh, right. *blush*
>
> So, just to get that right: how would we apply RFC4193 here?
>
> - We start with FC00::/7 as the prefix for Local IPv6 unicast addresses.
> - We set the 8th bit, the L bit, to 1, because we're generating the
>   subsequent Global ID locally.
> - We generate a random 40-bit Global ID for "Tor sanitized bridge IPv6
>   addresses." We don't change it, ever.
> - We set the 16-bit Subnet ID to all zeros.
> - We use the least significant 24 bits of the 64-bit Interface ID for
>   the actual sanitized bridge address that was formerly encoded in 10.x.x.x.
>
> As an example, a sanitized IPv6 bridge address would be:
>
> [fc01:0123:4567:89ab::fedc:ba98:7654]
Err... What I meant was something like this:
[fd9f:2e19:3bcf::f8:2444]
Does that make sense?
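To make the address layout concrete, here is a minimal Python sketch of the RFC 4193 construction described in this thread. It is an illustration only: the helper names are invented, the Global ID is simply read off the corrected example above, and the 19-byte secret follows the plan from earlier in the thread. The deployed metrics-db code may well differ.

```python
import hashlib
import ipaddress

# Global ID taken from the example [fd9f:2e19:3bcf::f8:2444]; treating it as
# the fixed 40-bit Global ID is an assumption of this sketch.
GLOBAL_ID = bytes.fromhex("9f2e193bcf")


def ula_from_suffix(suffix: bytes, global_id: bytes = GLOBAL_ID) -> str:
    """Build fd + Global ID + zero Subnet ID + Interface ID ending in suffix."""
    if len(suffix) != 3 or len(global_id) != 5:
        raise ValueError("need a 3-byte suffix and a 5-byte Global ID")
    packed = (
        b"\xfd"          # FC00::/7 prefix with the L bit set to 1
        + global_id      # 40-bit Global ID
        + b"\x00\x00"    # 16-bit Subnet ID, all zeros
        + b"\x00" * 5    # upper 40 bits of the 64-bit Interface ID
        + suffix         # least significant 24 bits carry the hash output
    )
    return "[" + str(ipaddress.IPv6Address(packed)) + "]"


def sanitize_ipv6(address: str, fingerprint: bytes, secret: bytes) -> str:
    """Hash the 16-byte address, 20-byte identity and 19-byte monthly secret."""
    ip_bytes = ipaddress.IPv6Address(address).packed
    if len(fingerprint) != 20 or len(secret) != 19:
        raise ValueError("expected 20-byte fingerprint and 19-byte secret")
    digest = hashlib.sha256(ip_bytes + fingerprint + secret).digest()
    return ula_from_suffix(digest[:3])
```

Calling `ula_from_suffix(bytes.fromhex("f82444"))` reproduces the example address above, so the bit layout in the sketch matches what was intended.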
The approach discussed above is now implemented:
https://gitweb.torproject.org/metrics-db.git/commitdiff/70a3d998
Unless somebody shouts at me within the next 48 hours and tells me the approach is stupid, I'm going to deploy it.
Best, Karsten