Safely collecting data to estimate the number of Tor users

Robert Ransom rransom.8774 at gmail.com
Sun Aug 29 17:30:18 UTC 2010


On Sun, 29 Aug 2010 19:06:13 +0200
Björn Scheuermann <scheuermann at cs.uni-duesseldorf.de> wrote:

> Variant #2a would be to craft a data structure in such a way that we can
> be absolutely sure that it is no longer sensitive. In that case, the
> operators could publish the data structure. I admit that this is most
> likely impossible (even though I believe that sketches with additional
> "noise" bits, as suggested in my second mail, come at least pretty
> close).
> 
> Variant #2b is to craft the data structure in such a way that we are
> *reasonably* sure that the data are no longer sensitive. In that case,
> the data structures should be encrypted by the operators so that only
> Karsten Loesing can decrypt them, and all traces of them should be
> eliminated once Karsten has finished the merging and evaluation process,
> just as you suggest.

I'm writing up a cryptosystem that will help with this right now; it
allows untrusted parties to merge encrypted Bloom-filter-like objects,
but one trusted party will be required to decrypt the merged encrypted
blob and compute the estimated number of distinct IP addresses.  The
kind of homomorphic encryption we need is feasible, with about 160
ciphertext bits per plaintext-Bloom-filter bit.


> There is one potential issue/limitation, though, in what you suggest. If
> sketches are collected over N-minute intervals, and are conveyed to
> Karsten without any indication of timing and order, then all Karsten can
> do is to calculate the total number of distinct user IPs over the whole
> period during which all these sketches have been generated. If this is
> what is desired, it will be important a) that all operators collect
> information during the same time interval, and only during that time
> interval, and b) that this time interval is not too long, otherwise we
> will end up with the number of distinct IP addresses during a whole week
> or so, which might not be very helpful (due to dynamic IPs etc.).
> Furthermore, if all sketches from this time period are combined by
> Karsten anyway, then we might consider to let each operator OR all his
> sketches locally and to send only the result to Karsten, so as to reveal
> even less information to him.

You're right about the limitation; with my approach in the message
you're replying to, it would be best to include a sanitized time in the
encrypted blob.  With what I'm working on, we're stuck with (a) in your
paragraph here (but *everyone* can participate in collecting data).


Robert Ransom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20100829/b17ae098/attachment.pgp>


More information about the tor-dev mailing list