[tor-dev] Anonymous Local Count Statistics Using PCSA - GSoC

Aaron Johnson aaron.m.johnson at nrl.navy.mil
Sat Apr 1 17:29:27 UTC 2017


Hi Samir,

It is my understanding that the Tor metrics team plans to handle this problem in a different way. IPs are kept in memory to provide statistics about users’ countries, and so they will instead just keep the country statistics directly. That is, a counter will be kept for each country; upon the establishment of a new "OR connection" (a Tor term that I believe translates to a TLS connection), the IP address will be mapped to a country, and that country’s counter will be incremented. As is done currently, further privacy-preserving techniques would be applied to these counters before publishing them, such as rounding, adding random noise, or removing some of the counters. These counters could even potentially be stored locally in a differentially-private way, which would make the local counters even less interesting to a possible attacker. The suitability of adding such local noise depends on how inaccurate this would make the results.
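
A minimal sketch of the counter idea described above, in Python for illustration only. It is not Tor's implementation. The names `lookup_country`, `CountryCounters`, and `publishable`, the toy prefix table standing in for a GeoIP database, and the noise scale and bin size are all assumptions made for this example.

```python
import math
import random
from collections import Counter

# Hypothetical stand-in for a GeoIP database (documentation-only prefixes).
_TOY_GEOIP = {"192.0.2.": "us", "198.51.100.": "de", "203.0.113.": "fr"}


def lookup_country(ip):
    """Map an IP address to a country code, or "??" if unknown."""
    for prefix, country in _TOY_GEOIP.items():
        if ip.startswith(prefix):
            return country
    return "??"


class CountryCounters:
    """Keeps one counter per country; the IP itself is never stored."""

    def __init__(self):
        self._counts = Counter()

    def on_new_connection(self, ip):
        # Only the country's counter changes; the address is discarded.
        self._counts[lookup_country(ip)] += 1

    def raw(self):
        return dict(self._counts)


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def round_up(value, multiple):
    return int(math.ceil(value / multiple) * multiple)


def publishable(counts, scale=8.0, bin_size=8, rng=None):
    """Add noise, clamp at zero, and round up to a multiple of bin_size.

    `scale` and `bin_size` are illustrative values, not Tor's parameters.
    """
    rng = rng or random.Random()
    return {
        country: round_up(max(0.0, n + laplace_noise(scale, rng)), bin_size)
        for country, n in counts.items()
    }
```

Calling `on_new_connection(ip)` once per new connection and publishing only `publishable(counters.raw())` keeps addresses out of memory. How much noise is tolerable is exactly the accuracy question raised above.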

You may wish to contact Karsten Loesing of the Tor Metrics team to verify my understanding.

Best,
Aaron

> On Apr 1, 2017, at 7:19 AM, Florian Tschorsch <tschorsch at informatik.hu-berlin.de> wrote:
> 
> 
> Hi Samir,
> 
> this sounds like an interesting summer project.
> 
> Since you are interested in using PCSA, our work on privacy-preserving statistics, which actually develops a privacy-enhanced version of PCSA, might be helpful. We also propose it as a way to collect distributed statistics.
> 
> In our HotPETs paper [1], we sketch the basic idea. In our journal paper [2], we provide additional details on the algorithm. If you have any questions, just let me know.
> 
> Cheers,
> Florian.
> 
> [1] https://petsymposium.org/2011/papers/hotpets11-final5Tschorsch.pdf
> [2] https://www.sciencedirect.com/science/article/pii/S1389128613001941
> 
> 
> 
>> On 30. Mar 2017, at 03:45, samir menon <menon.samir at gmail.com> wrote:
>> 
>> Hi there!
>> 
>> I'm Samir, a Computer Science student at Stanford University, with a
>> focus in applied cryptography and computer security. This summer, I
>> want to work (through GSoC) on computing usage statistics without
>> keeping IP addresses in memory (see tickets #7532 and #15469) [1] [2].
>> 
>> Currently, we keep sets of IPs (or hashed IPs) in memory so that we
>> can compute the number of unique client connections. This has been
>> pointed out as a pretty serious concern, because the IPs themselves
>> are sensitive info that we don't want an attacker to acquire, but the
>> statistics are relatively valuable.
>> 
>> As Nick first pointed out in #15469, we can use proven techniques to
>> compute these statistics without actually explicitly storing any IPs
>> (or IP hashes) in memory. The technique I want to use, "Probabilistic
>> Counting with Stochastic Averaging", or PCSA, is relatively
>> well-studied, and can provide good estimates (<5% error) of the number
>> of unique elements in a time series.
>> 
>> The basic idea is to count the number of 0's before the least
>> significant 1 in every (Jenkins hashed) IP, and then recognize that
>> the more unique IPs we encounter, the more likely it is that we see a
>> hashed IP with a large number of 0's before the least significant 1.
>> (Shoutout to Jaskaran and [3] for helping me understand this). A more
>> detailed explanation and more resources for understanding PCSA are in
>> the proposal.
>> 
>> Here is my draft proposal (also attached, but links don't work):
>> http://stanford.edu/~samir2/TorGSoCApplication.html
>> 
>> I'd love to hear feedback on it - what's feasible, what's most useful,
>> and what I should focus on, etc. You can also chat with me about it on
>> IRC at `samir2`!
>> 
>> Thanks,
>> ~Samir Menon
>> menon.samir at gmail.com
>> Stanford University, B.S. Computer Science, 2019
>> 
>> [1] https://trac.torproject.org/projects/tor/ticket/7532
>> [2] https://trac.torproject.org/projects/tor/ticket/15469
>> [3] https://www.cs.princeton.edu/~rs/talks/AC11-Cardinality.pdf
>> <TorGSoCAnonymousLocalStats.pdf>
>> _______________________________________________
>> tor-dev mailing list
>> tor-dev at lists.torproject.org
>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> 
> 
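
For readers following the PCSA description in Samir's quoted message, here is a rough sketch of the basic (not privacy-enhanced) algorithm in Python. It is an illustration under stated assumptions, not the code from the proposal. SHA-256 is swapped in for the Jenkins hash mentioned there, and the class name `PCSA` and the parameters `num_buckets` and `bitmap_bits` are made up for this example. The constant 0.77351 is the correction factor from Flajolet and Martin's analysis.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin bias-correction constant


class PCSA:
    """Probabilistic Counting with Stochastic Averaging (basic form)."""

    def __init__(self, num_buckets=64, bitmap_bits=32):
        self.m = num_buckets
        self.bits = bitmap_bits
        self.bitmaps = [0] * num_buckets  # one small bitmap per bucket

    def add(self, item):
        # SHA-256 is a stand-in here for the Jenkins hash in the proposal.
        digest = hashlib.sha256(item.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        bucket = h % self.m   # stochastic averaging: choose one bitmap
        rest = h // self.m    # remaining hash bits
        if rest == 0:
            rho = self.bits - 1
        else:
            # Number of 0 bits before the least significant 1 bit.
            rho = min((rest & -rest).bit_length() - 1, self.bits - 1)
        self.bitmaps[bucket] |= 1 << rho  # the item itself is not stored

    def estimate(self):
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            # Index of the lowest bit that is still 0 in this bitmap.
            while r < self.bits and (bitmap >> r) & 1:
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)
```

Adding the same address twice leaves the bitmaps unchanged, which is what makes this a count of unique elements. The standard error is roughly 0.78/sqrt(m) for m buckets, so an error under 5% needs on the order of 256 buckets, and the estimate is known to be biased when the true count is small relative to m.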


