[tor-dev] A suggestion for differential private statistics gathering

Eyal Ronen eyal.ronen at weizmann.ac.il
Thu Jan 11 13:45:09 UTC 2018


Hi Tim,

First of all thanks for your detailed and quick response. I am sorry for my delayed response as I am attending Real World Crypto (BTW if there are any devs here it will be nice to try and meet).  I will try to answer your questions and remarks below.

>Hi Eyal,
>
>Thank you for letting us know about your research.
>
>You've emailed an internal discussion list, so I'm going to direct all
>responses to the public tor-dev@ list. That is the list where we discuss
>changes to Tor.
>
>Please see my response below:
>
>> On 7 Jan 2018, at 20:31, Eyal Ronen <eyal.ronen at weizmann.ac.il> wrote:
>>
>> I am a PHD student, and have just published online a paper, that shows a protocol that I think might be relevant to the TOR network.
>> The protocol allows a server to privately learn information from a client, and is resilient to a situation where a malicious adversary wants to temper with the statistics.
>> The main use case in our paper is learning popular passwords,
>
>Are you aware of this password list?
>(It has no privacy apart from hashing the passwords, because the
>passwords are already publicly available in data breaches.)
>
> https://www.troyhunt.com/introducing-306-million-freely-downloadable-<https://www.troyhunt.com/introducing-306-million-freely-downloadable-pwned-passwords/> pwned<https://www.troyhunt.com/introducing-306-million-freely-downloadable-pwned-passwords/> -passwords/<https://www.troyhunt.com/introducing-306-million-freely-downloadable-pwned-passwords/>

One of the assumptions of our paper is that if we blacklist known vulnerable passwords, we will change the passwords distribution and new passwords will become "popular". We enable the server to continually learn the changing password distribution. Another example is that the previously popular password "superman" might be replaced today with the password "wonderwoman", and we want to be able to learn this kind of changes.

>
>> but we believe that it might also be usable for other cases in the TOR network. As we do not know the needs and challenges in the  TOR network, we would greatly appreciate any feedback from the TOR metrics community.
>
>I would be interested in an explanation of the tradeoffs between the
>16 and 32 bit versions of the protocol. In particular, we may not be
>able to provide servers with 4 GB of RAM for the 32 bit protocol.
>
>What would we lose with a 16 bit version? Is a 24 bit version possible?


Any bit number is possible , and without any impact on security. The number of bit is the size of the domain or histogram of items you want to measure. If you use a low number of bits, you can have collision between different measured  values, that may change the statistics. For example, lets assume we want to find out popular sites in the TOR network. We encode the sites IP. If you use 32 bit version, you can measure each IP individually. Otherwise you use a truncated a hash of the IP address. Now when you find a heavy hitter, you know what possible IP contributed to this measurement. But you can't know which one or more IP addresses are the popular ones.

However I believe that this might not be a problem. One of the main differences to other solutions is that we work in the "Local model" and do no require any "Tally Reporters" to protect the data or for answers to be aggregate on the relay servers. The only place that needs to store the whole histogram is the one server that aggregates all of the information.  I think that it greatly reduce the communication and implementation overheads of the solution.

I think that the main differences are the one above  and our provable malicious resilient. There are other differences but I think that this is the most relevant one.

If we don't care about malicious users or relays we can use the semi honest version, that takes have almost zero computation requirements, and can be done either by the user, or by the relay. If we want to use the malicious resilient, it is a bit more costly, and we need to think of the right way and place to incorporate it.

If it sounds interesting I will like to find a possible use case that I can try and design the best tailored solution for you. And we can try to prove the security and complexity of the solution, and verify this with safety board.

If needed I have a POC written in Python to be easily understandable, but I can easily implement it using TOR's cryptographic package of choice.

I will also try to go over the current proposal on the other papers in more details.

Thanks again,
Eyal Ronen

>
>> The paper is titles "How to (not) share a password: Privacy preserving protocols for finding heavy hitters with adversarial behavior" and can be found at https://eprint.iacr.org/2018/003
>>
>> The abstract is:
>> "Bad choices of passwords were and are a pervasive problem. Most password alternatives (such as two-factor authentication) may increase cost and arguably hurt the usability of the system. This is of special significance for low cost IoT devices.
>>
>> Users choosing weak passwords do not only compromise themselves, but the whole eco system. For example, common and default passwords in IoT devices were exploited by hackers to create botnets and mount severe attacks on large Internet services, such as the Mirai botnet DDoS attack.
>>
>> We present a method to help protect the Internet from such large scale attacks. We enable a server to identify popular passwords (heavy hitters), and publish a list of over-popular passwords that must be avoided. This filter ensures that no single password can be used to comprise a large percentage of the users. The list is dynamic and can be changed as new users are added or when current users change their passwords. We apply maliciously secure two-party computation and differential privacy to protect the users' password privacy. Our solution does not require extra hardware or cost, and is transparent to the user.
>>
>> The construction is secure even against a malicious coalition of devices which tries to manipulate the protocol in order to hide the popularity of some password that the attacker is exploiting. We show a proof-of-concept implementation and analyze its performance.
>>
>> Our construction can also be used in any setting where one would desire to privately learn heavy hitters in the presence of an active malicious adversary. For example, learning the most popular sites accessed by the TOR network."
>
>Tor Proposals
>
>There is a current proposal to add privacy-preserving statistics to Tor
>using a scheme that's based on PrivCount:
>
> https://gitweb.torproject.org/torspec.git/tree/proposals/288-<https://gitweb.torproject.org/torspec.git/tree/proposals/288-privcount-with-shamir.txt> privcount<https://gitweb.torproject.org/torspec.git/tree/proposals/288-privcount-with-shamir.txt> -with-shamir.txt<https://gitweb.torproject.org/torspec.git/tree/proposals/288-privcount-with-shamir.txt>
>
>PrivCount is limited to counters with integer increments, so we can't
>easily find the most popular site (the mode), or calculate the median.
>
>We discussed the proposal on tor-dev@, and decided that we could add
>more sophisticated statistics later. We would appreciate your feedback
>on the proposal. In particular, please let us know if there is anything
>we should change in the proposal to make it easier to extend with schemes
>like yours in future.
>
>Here is the proposal thread:
>
> https://lists.torproject.org/<https://lists.torproject.org/pipermail/tor-dev/2017-December/012699.html>pipermail<https://lists.torproject.org/pipermail/tor-dev/2017-December/012699.html>/tor-dev/2017-December/012699.html<https://lists.torproject.org/pipermail/tor-dev/2017-December/012699.html>
>
>
>Anyone is welcome to submit Tor proposals, the process is described here:
>
> https://gitweb.torproject.org/torspec.git/tree/proposals/001-process.txt
>
>
>Related Research
>
>There was a paper at CCS 2017 that sounds very similar to your scheme:
>
>Ellis Fenske, Akshaya Mani, Aaron Johnson, and Micah Sherr. Distributed
>Measurement with Private Set-Union Cardinality. In ACM Conference on
>Computer and Communications Security (CCS), November 2017.
>
>Source: http://safecounting.com/
>
>Can you explain how your scheme differs from PSC?
>
>
>There's a forthcoming paper at NDSS 2018 that measures a single onion
>site's popularity with differential privacy using PrivCount:
>
>Inside Job: Applying Traffic Analysis to Measure Tor from Within
>25th Symposium on Network and Distributed System Security (NDSS 2018)
>Rob Jansen, Marc Juarez, Rafael Galvez, Tariq Elahi, and Claudia Diaz
>
>Source: https://onionpop.github.io/
>
>But the measurement was limited to one site, because PrivCount can't
>calculate the modal value from  set of samples. So I can see the
>advantages of a scheme like yours over PrivCount.
>
>
>Your Own Research
>
>And finally, if you do want to do research on the public Tor network, I
>would recommend contacting the Tor Research Safety Board for advice:
>
> https://research.torproject.org/safetyboard.html
>
>T
>
>--
>Tim Wilson-Brown (teor)

Best regards,

Eyal




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20180111/d74ed4fb/attachment-0001.html>


More information about the tor-dev mailing list