<div dir="ltr">Ah, I see - PCSA can actually keep track of unique IP's without actually revealing them. Your first link cleared it up a lot for me. PCSA is a really cool technique!<div><br></div><div>I'd love to work on this as a GSoC project. I'll write up a proposal and send it out soon.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 28, 2017 at 2:55 PM, Jaskaran Singh <span dir="ltr"><<a href="mailto:jvsg1303@gmail.com" target="_blank">jvsg1303@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Samir,<br>

<br>

Brute force does affect Bloom filter/hashed-values as you rightly<br>

mentioned, but not Probabilistic Counting by Stochastic Averaging (PCSA).<br>

<br>

PCSA works on the principle that in an input the probability of n<br>

consecutive bits having value '0' from the left side(could be right as<br>

well, but for now assume it left) is 2^(-(n+1)). Bit 'i' of the<br>

Bitmap(which is our main data structure) is set if a the number of<br>

consecutive zeros (from left) is 'i'.<br>

<br>

We keep repeating it for every input(IP address). We then end up with a<br>

Bitmap whose  most significant '1' can be computed to give us an<br>

approximate number of inputs that must have been gone into the algorithm.<br>

<br>

In simple words, if I tell you that I have seen the value 1010000 out of<br>

a total of 'x' values I examined. You could guess that I had examined a<br>

total of 2^5 values before I saw that particular value.<br>

<br>

We would tweak the algorithm to store only the significant most '1' in<br>

bitmap instead of storing '1' at every iteration. This would mean that<br>

all that adversary could get hold of is a bitmap whose just one of the<br>

bit is '1'.<br>

<br>

Example, the adversary might get a data structure that looks like:<br>

                000001000000<br>

and would have no way tell what IP addresses were used as an input.<br>

<br>

This was just the basic idea behind PCSA. The actual PCSA makes use of<br>

complicated looking formula to get the approximate number of unique IP<br>

addresses in order to keep error rate low.<br>

<br>

I hope this makes sense.<br>

<br>

For some more information and simulation, please check<br>

<br>

[0] <a href="https://research.neustar.biz/2013/04/02/sketch-of-the-" rel="noreferrer" target="_blank">https://research.neustar.biz/<wbr>2013/04/02/sketch-of-the-</a><br>

day-probabilistic-counting-<wbr>with-stochastic-averaging-<wbr>pcsa/<br>

[1] <a href="http://content.research.neustar.biz/blog/runs.html" rel="noreferrer" target="_blank">http://content.research.<wbr>neustar.biz/blog/runs.html</a><br>

<br>

Regards,<br>

Jaskaran<br>

<div><div class="h5"><br>

On Wednesday 29 March 2017 01:54 AM, samir menon wrote:<br>

> This ticket [1] was suggested as a GSoC project, but I think there might<br>

> be an issue with the security model/perceived threat.<br>

><br>

> To summarize the ticket and its child [1], basically, we currently store<br>

> all the IP's seen by a node so that we can count unique IP's. The idea<br>

> is that this is dangerous; if a node is compromised, then all of those<br>

> IP addresses can be retrieved from memory. Therefore, a variety of<br>

> mitigation methods have been proposed (most prominently, the<br>

> 'Probabilistic Counting Algorithm' from [2])<br>

><br>

> Here's my issue: what about brute force?<br>

><br>

> No matter what method we use, we will arrive at a data structure that<br>

> should be able to, given an IP address, tell us whether it is new (and<br>

> we should increment the unique counter) or old (and we should leave the<br>

> unique counter the same), with some reasonably small false positive<br>

> rate. Basically, we're supposed to use some kind of Bloom filter like<br>

> structure.<br>

><br>

> Then can't that structure then be brute-forced, offline, by an attacker?<br>

> IPv4 addresses are 32-bits (~4.3 billion of them), so an attacker could<br>

> just run whatever method we use to check membership over and over, and<br>

> then recover the set of IP's. The same happens if we hash the IP's<br>

> beforehand.<br>

><br>

> So, is this attack acceptable? The only mitigation I've seen is the one<br>

> referenced by 'Aaron' in the ticket, which is the system that git uses,<br>

> cryptolog; there, they have a random salt that changes daily. Then, an<br>

> attacker can only learn the IP's for one day. This sounds like a<br>

> reasonable compromise to me, but then the implementation becomes rather<br>

> simple; just hash the IP's with a random salt that changes daily before<br>

> putting them in the set.<br>

><br>

> IPv6 also solves this (128 bits), but there again, the solution is just<br>

> to hash the IP's before storing them - the Bloom filter/'Probabilistic<br>

> Counting Algorithm' is unnecessary.<br>

><br>

> I think I must be missing something about how the 'Probabilistic<br>

> Counting Algorithm' works - somehow, it needs to keep track of the # of<br>

> unique IP's without knowing (with a high probability) whether any 1<br>

> individual IP has been seen.<br>

><br>

> Any help/pointing out of errors in my reasoning would be useful.<br>

><br>

> Thanks,<br>

> Samir Menon<br>

</div></div>> <a href="mailto:menon.samir@gmail.com">menon.samir@gmail.com</a> <mailto:<a href="mailto:menon.samir@gmail.com">menon.samir@gmail.com</a>><br>

> <a href="mailto:samir2@stanford.edu">samir2@stanford.edu</a> <mailto:<a href="mailto:samir2@stanford.edu">samir2@stanford.edu</a>><br>

<span class="im HOEnZb">><br>

> [1] <a href="https://trac.torproject.org/projects/tor/ticket/7532" rel="noreferrer" target="_blank">https://trac.torproject.org/<wbr>projects/tor/ticket/7532</a><br>

> [2] <a href="http://www.mathcs.emory.edu/~cheung/papers/StreamDB/Probab/1985-Flajolet-Probabilistic-counting.pdf" rel="noreferrer" target="_blank">http://www.mathcs.emory.edu/~<wbr>cheung/papers/StreamDB/Probab/<wbr>1985-Flajolet-Probabilistic-<wbr>counting.pdf</a><br>

><br>

><br>

</span><span class="im HOEnZb">> ______________________________<wbr>_________________<br>

> tor-dev mailing list<br>

> <a href="mailto:tor-dev@lists.torproject.org">tor-dev@lists.torproject.org</a><br>

> <a href="https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev" rel="noreferrer" target="_blank">https://lists.torproject.org/<wbr>cgi-bin/mailman/listinfo/tor-<wbr>dev</a><br>

><br>

<br>

</span><span class="HOEnZb"><font color="#888888">--<br>

Jaskaran Veer Singh (jvsg)<br>

jvsg1303 at gmail dot com<br>

PGP 2814 3FB7 A32D 429B 092E 27F0 8AA3 C532 9E1A 6AD8<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

______________________________<wbr>_________________<br>

tor-dev mailing list<br>

<a href="mailto:tor-dev@lists.torproject.org">tor-dev@lists.torproject.org</a><br>

<a href="https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev" rel="noreferrer" target="_blank">https://lists.torproject.org/<wbr>cgi-bin/mailman/listinfo/tor-<wbr>dev</a><br>

</div></div></blockquote></div><br></div>