[tor-dev] Proposal xyz : Count Unique IP addresses in an anonymous way

Andreas Krey a.krey at gmx.de
Tue Mar 21 14:48:19 UTC 2017


On Tue, 21 Mar 2017 16:06:59 +0000, Jaskaran Singh wrote:
...
> > On the other hand side you can indeed keep the filter rather small
> > because one bridge doesn't get that many collisions, and you don't
> > need to make it anywhere as big as to avoid collision with 2^32 entries.
> > Could also be dynamically sized depending on the number of clients seen
> > - you need aging anyway, so the next table can have a different size.
> > 
> I feel that this isn't a nice solution. Suppose you have 10 cells and 3
> hash functions at the beginning. Later when inputs exceed a threshold,
> you increase the size of bloom filter to make it to 20 cells.

10 is way too low for any population (if 'cell' means 'bit');
I played with a bit of code for that.

> 3 hash functions would map to the whole range which means the inputs
> that were mapped to 10 cells would now map to something completely
> different. Hence, error rate would be, I guess exactly 100%.

No, basically you need to retain the old bloom filter while seeding the
new one for the amount of time of your entry timeout.

(And given the other discussion regarding brute-forcing the 2^32 space,
either you need a really time-consuming hash, or you count on the
pre-seeding with random IP addresses as the only means to cover the
real ones.)

I wonder what would happen if you implement the aging by just
randomly clearing bits in the bloom filter at an appropriate rate.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800


More information about the tor-dev mailing list