On 2 Dec 2017, at 07:26, Nick Mathewson nickm@torproject.org wrote:
Filename: 288-privcount-with-shamir.txt Title: Privacy-Preserving Statistics with Privcount in Tor (Shamir version) Author: Nick Mathewson, Tim Wilson-Brown, Aaron Johnson Created: 1-Dec-2017 Supercedes: 280 Status: Draft
- Acknowledgments
Tariq Elahi, George Danezis, and Ian Goldberg designed and implemented the PrivEx blinding scheme. Rob Jansen and Aaron Johnson extended PrivEx's differential privacy guarantees to multiple counters in PrivCount:
https://github.com/privcount/privcount/blob/master/README.markdown#research-...
Rob Jansen and Tim Wilson-Brown wrote the majority of the experimental PrivCount code, based on the PrivEx secret-sharing variant. This implementation includes contributions from the PrivEx authors, and others:
https://github.com/privcount/privcount/blob/master/CONTRIBUTORS.markdown
This research was supported in part by NSF grants CNS-1111539, CNS-1314637, CNS-1526306, CNS-1619454, and CNS-1640548.
The use of a Shamir secret-sharing-based approach is due to a suggestion by Aaron Johnson (iirc); Carolin Zöbelein did some helpful analysis here.
Aaron Johnson and Tim Wilson-Brown made improvements to the draft proposal.
- Introduction and scope
PrivCount is a privacy-preserving way to collect aggregate statistics about the Tor network without exposing the statistics from any single Tor relay.
This document describes the behavior of the in-Tor portion of the PrivCount system. It DOES NOT describe the counter configurations, or any other parts of the system. (These will be covered in separate proposals.)
- PrivCount overview
Here follows an oversimplified summary of PrivCount, with enough information to explain the Tor side of things. The actual operation of the non-Tor components is trickier than described below.
In PrivCount, a Data Collector (DC, in this case a Tor relay) shares numeric data with N different Tally Reporters (TRs). (A Tally Reporter performs the summing and unblinding roles of the Tally Server and Share Keeper from experimental PrivCount.)
All N Tally Reporters together can reconstruct the original data, but no (N-1)-sized subset of the Tally Reporters can learn anything about the data.
(In reality, the Tally Reporters don't reconstruct the original data at all! Instead, they will reconstruct a _sum_ of the original data across all participating relays.)
In brief, the system works as follow:
To share data, for each counter value V to be shared, the Data Collector first adds Gaussian noise to V in order to produce V', uses (K,N) Shamir secret-sharing to generate N shares of V' (K<=N, K being the reconstruction threshold), encrypts each share to a different Tally Reporter, and sends each encrypted share to the Tally Reporter it is encrypted for.
The Tally Reporters then agree on the set S of Data Collectors that sent data to all of them, and each Tally Reporter forms a share of the aggregate value by decrypting the shares it received from the Data Collectors in S and adding them together. The Tally Reporters then, collectively, perform secret reconstruction, thereby learning the sum of all the different values V'.
The use of Shamir secret sharing lets us survive up to N-K crashing TRs. Waiting until the end to agree on a set S of surviving relays lets us survive an arbitrary number of crashing DCs. In order to prevent bogus data from corrupting the tally, the Tally Reporters can perform the aggregation step multiple times, each time proceeding with a different subset of S and taking the median of the resulting values.
Relay subsets should be chosen at random to avoid relays manipulating their subset membership(s). If an shared random value is required, all relays must submit their results, and then the next revealed shared random value can be used to select relay subsets. (Tor's shared random value can be calculated as soon as all commits have been revealed. So all relay results must be received *before* any votes are cast in the reveal phase for that shared random value.)
Below we describe the algorithm in more detail, and describe the data format to use.
...
Chelsea and I had a conversation on another list about Prio.
Prio is similar to PrivCount, but would allow the Data Collectors to prove that their responses are in range, using a zero-knowledge proof technique.
I'm posting this here with her permission:
From: teor teor2345@gmail.com Date: 13 December 2017 at 09:26:45 AEDT
On 13 Dec 2017, at 03:02, chelsea komlo me@chelseakomlo.com wrote:
Hi All,
This came up in a slack channel I am a part of, and it seems relevant to PrivCount work- I wanted to pass it along.
https://crypto.stanford.edu/prio/
"Prio is a privacy-preserving system for the collection of aggregate statistics. Each Prio client holds a private data value (e.g., its current location), and a small set of servers compute statistical functions over the values of all clients (e.g., the most popular location). As long as at least one server is honest, the Prio servers learn nearly nothing about the clients’ private data, except what they can infer from the aggregate statistics that the system computes"
I read the paper. I'm sure I didn't understand it all. It's very new, but it shows promise. And there's no real-world (even experimental) implementation yet.
Maybe we can architect PrivCount in Tor so it's possible to upgrade to a system like this in future. In particular, we seem to be using similar primitives to Prio, so that's useful.
Are there things we can change in the current proposal to make it easier to upgrade to a system like Prio in future?
Perhaps it's enough to version all our data exchange formats. I'm not sure if we should lock into one particular scheme, when something better might come along.
T
-- Tim / teor
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n ------------------------------------------------------------------------