Publishing sanitized bridge pool assignments
karsten.loesing at gmx.net
Mon Jan 31 19:37:00 UTC 2011
On Tue, Jan 25, 2011 at 12:12:45PM +0100, Karsten Loesing wrote:
> Hi everyone,
> we're considering publishing which distribution pool each bridge is
> assigned to. The distribution pool defines whether we're giving
> out bridges via HTTP, via email, or not at all (reserved pool). The plan
> is to remove all sensitive information from bridge pool assignments before
> making them available on https://metrics.torproject.org/data.html.
> For the long version see task 2372 and comments:
> For the summary version read on:
> We want to make sanitized bridge pool assignments available, so that we
> can answer questions like these:
> - What's the correlation between which pool the bridge is in and whether
> that bridge sees a lot of use from a given country?
> - Is bridge uptime affected by the pool assignment, because operators of
> bridges in the reserved pool decide that their bridge is not useful?
> Here's a proposed data format for bridge pool assignments:
> bridge-pool-assignment 2011-01-10 01:41:14
> b 127.0.0.1:443 abcdef0123456789abcdef0123456789abcdef01
> b 127.0.0.1:443 0123456789abcdef0123456789abcdef01234567
> s IP ring 1 (port-443 subring)
> s IP ring 1 (stable subring)
> s IP ring 1
> The timestamp in the bridge-pool-assignment line is the time when the
> assignment is written to disk (twice an hour). Lines starting with b
> contain IP address, port, and fingerprint of a bridge. For sanitizing
> purposes, we replace bridge IP addresses with 127.0.0.1 and bridge
> identities with their SHA-1 hashes. That's the same approach that we take
> for sanitizing bridge descriptors. Lines starting with s contain the
> rings or subrings that a bridge is allocated to. If a bridge is not
> assigned to any pool, it doesn't have an s line.
> While this information is useful for analysis, we need to be aware that
> these lists can be misused by a censor to learn what fraction of bridges
> is contained in which pool and what percentage of bridges of a given pool
> they can block. So far, they can only tell how many bridges there are in
> total and what fraction of these bridges they know. We'll have to decide
> if the questions we expect to answer using these data are worth it.
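
To make the sanitizing step quoted above more concrete, here is a minimal
Python sketch. It is an illustration under assumptions, not the code we
actually run: the function name sanitize_bridge_line is made up, and I'm
assuming that the SHA-1 hash is taken over the 20 raw identity bytes
(decoded from hex) rather than over the hex string.

```python
import hashlib

SANITIZED_IP = "127.0.0.1"


def sanitize_bridge_line(line: str) -> str:
    """Sanitize one 'b <ip>:<port> <fingerprint>' line (hypothetical helper)."""
    marker, address, fingerprint = line.split()
    if marker != "b":
        raise ValueError(f"not a bridge line: {line!r}")
    # Keep the port, drop the real IP address.
    _ip, _sep, port = address.rpartition(":")
    # Assumption: hash the raw identity bytes, not the hex-encoded string.
    hashed = hashlib.sha1(bytes.fromhex(fingerprint)).hexdigest()
    return f"b {SANITIZED_IP}:{port} {hashed}"


# 192.0.2.7 is a documentation-only address; the fingerprint is made up.
print(sanitize_bridge_line("b 192.0.2.7:443 " + "ab" * 20))
```

The port is left untouched because the proposed format keeps it (the
port-443 subring depends on it); only the address and the identity are
replaced.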
Here's a sample bridge pool assignment from September 2010 that is
sanitized as described above (all IP addresses set to 127.0.0.1, contained
fingerprints are SHA-1 hashes of the original fingerprints):
The sample is meant to give everyone a better idea of what a bridge pool
assignment looks like. Does anyone object to publishing tarballs of these
sanitized bridge pool assignments on the metrics website, so that we (and
anyone else) can analyze them?
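
As a rough idea of what such an analysis could start from, here is a small
Python sketch that reads the proposed format and counts bridges per ring or
subring. The function name parse_assignment and the sample input are made
up for illustration, and the sketch assumes that every s line belongs to
the most recent b line before it, which the proposal does not state
explicitly.

```python
from collections import Counter


def parse_assignment(text):
    """Return (timestamp, {fingerprint: [ring, ...]}) for one assignment."""
    timestamp = None
    bridges = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("bridge-pool-assignment "):
            timestamp = line.split(" ", 1)[1]
        elif line.startswith("b "):
            _marker, _address, fingerprint = line.split()
            current = bridges.setdefault(fingerprint, [])
        elif line.startswith("s ") and current is not None:
            # Assumption: an s line refers to the preceding b line.
            current.append(line[2:])
    return timestamp, bridges


# Made-up input; the second bridge has no s line, i.e. it is unassigned.
SAMPLE = """\
bridge-pool-assignment 2011-01-10 01:41:14
b 127.0.0.1:443 abcdef0123456789abcdef0123456789abcdef01
s IP ring 1 (port-443 subring)
s IP ring 1
b 127.0.0.1:443 0123456789abcdef0123456789abcdef01234567
"""

timestamp, bridges = parse_assignment(SAMPLE)
ring_counts = Counter(ring for rings in bridges.values() for ring in rings)
unassigned = sum(1 for rings in bridges.values() if not rings)
print(timestamp, dict(ring_counts), unassigned)
```

Joining these per-ring counts with the sanitized bridge descriptors by
hashed fingerprint would be the next step toward the usage and uptime
questions quoted above.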