Safely collecting data to estimate the number of Tor users

Björn Scheuermann scheuermann at
Sun Aug 29 17:51:18 UTC 2010


On Sun, 2010-08-29 at 10:30 -0700, Robert Ransom wrote:
> I'm writing up a cryptosystem that will help with this right now; it
> allows untrusted parties to merge encrypted Bloom-filter-like objects,
> but one trusted party will be required to decrypt the merged encrypted
> blob and compute the estimated number of distinct IP addresses.  The
> kind of homomorphic encryption we need is feasible, with about 160
> ciphertext bits per plaintext-Bloom-filter bit.

That sounds cool; I'd love to read more about it (I see potential
applications for something like this in other projects I'm working on,
too). Do you happen to have a paper on it?
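
For a sense of scale, the quoted figure of about 160 ciphertext bits per
plaintext filter bit can be turned into a rough size estimate. The sketch
below is only arithmetic on that one number; the filter size of 4096 bits
and the function name are assumptions chosen for illustration, not values
from this thread.

```python
# Rough size of an encrypted sketch, using the "about 160 ciphertext bits
# per plaintext-Bloom-filter bit" figure quoted above.
CIPHERTEXT_BITS_PER_FILTER_BIT = 160  # approximate figure from the quoted message


def encrypted_size_bytes(filter_bits: int) -> int:
    """Approximate ciphertext size in bytes for a filter of `filter_bits` bits."""
    return filter_bits * CIPHERTEXT_BITS_PER_FILTER_BIT // 8


# Hypothetical example: a 4096-bit filter (512 bytes in the clear).
example_filter_bits = 4096
example_size = encrypted_size_bytes(example_filter_bits)
```

With these assumed numbers, a 512-byte plaintext filter grows to roughly
80 KiB of ciphertext per sketch.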

> > There is one potential issue/limitation, though, in what you suggest. If
> > sketches are collected over N-minute intervals, and are conveyed to
> > Karsten without any indication of timing and order, then all Karsten can
> > do is to calculate the total number of distinct user IPs over the whole
> > period during which all these sketches have been generated. If this is
> > what is desired, it will be important a) that all operators collect
> > information during the same time interval, and only during that time
> > interval, and b) that this time interval is not too long, otherwise we
> > will end up with the number of distinct IP addresses during a whole week
> > or so, which might not be very helpful (due to dynamic IPs etc.).
> > Furthermore, if all sketches from this time period are combined by
> > Karsten anyway, then we might consider letting each operator OR all his
> > sketches locally and send only the result to Karsten, so as to reveal
> > even less information to him.
> You're right about the limitation; with my approach in the message
> you're replying to, it would be best to include a sanitized time in the
> encrypted blob.  With what I'm working on, we're stuck with (a) in your
> paragraph here (but *everyone* can participate in collecting data).
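
The local OR-merge suggested in the quoted paragraph can be illustrated
with a small, unencrypted example. This is a minimal sketch under stated
assumptions, not the scheme discussed in this thread: it uses a plain
single-hash bitmap with a linear-counting estimator, and the names
make_sketch, merge, estimate and the size M are hypothetical.

```python
import hashlib
import math

M = 4096  # sketch size in bits (assumed value, for illustration only)


def make_sketch(ips, m=M):
    """Build a bitmap sketch: each IP sets one bit chosen by a hash."""
    bits = 0
    for ip in ips:
        digest = hashlib.sha256(ip.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % m
        bits |= 1 << index
    return bits


def merge(sketches):
    """Combine sketches with bitwise OR; order and grouping do not matter."""
    merged = 0
    for sketch in sketches:
        merged |= sketch
    return merged


def estimate(bits, m=M):
    """Linear-counting estimate of the number of distinct items."""
    zeros = m - bin(bits).count("1")
    if zeros == 0:
        raise ValueError("sketch is saturated; use a larger m")
    return -m * math.log(zeros / m)
```

Because OR is associative and commutative, an operator who merges all of
his interval sketches locally and sends one result contributes exactly the
same bits to the final merged sketch as if he had sent them separately.
The merged sketch can only yield the distinct count over the whole
collection period, which is the limitation described above.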

If properly co-ordinated, this seems feasible. It's just something
Karsten should probably keep an eye on, if this is the direction he

Best regards


Jun.-Prof. Dr. Björn Scheuermann
Mobile and Decentralized Networks Group
Heinrich Heine University
Universitätsstr. 1, D-40225 Düsseldorf, Germany

Building 25.12, Room 02.42
Tel: +49 211 81 11692
Fax: +49 211 81 11638 
scheuermann at
