[tor-dev] Proposal 288: Privacy-Preserving Statistics with Privcount in Tor (Shamir version)

teor teor2345 at gmail.com
Wed Dec 13 02:04:02 UTC 2017

> On 2 Dec 2017, at 07:26, Nick Mathewson <nickm at torproject.org> wrote:
> Filename: 288-privcount-with-shamir.txt Title: Privacy-Preserving Statistics
> with Privcount in Tor (Shamir version) Author: Nick Mathewson, Tim
> Wilson-Brown, Aaron Johnson Created: 1-Dec-2017 Supercedes: 280 Status: Draft
> 0. Acknowledgments
>  Tariq Elahi, George Danezis, and Ian Goldberg designed and implemented the
>  PrivEx blinding scheme. Rob Jansen and Aaron Johnson extended PrivEx's
>  differential privacy guarantees to multiple counters in PrivCount:
>  https://github.com/privcount/privcount/blob/master/README.markdown#research-background
>  Rob Jansen and Tim Wilson-Brown wrote the majority of the experimental
>  PrivCount code, based on the PrivEx secret-sharing variant. This
>  implementation includes contributions from the PrivEx authors, and others:
>  https://github.com/privcount/privcount/blob/master/CONTRIBUTORS.markdown
>  This research was supported in part by NSF grants CNS-1111539, CNS-1314637,
>  CNS-1526306, CNS-1619454, and CNS-1640548.
>  The use of a Shamir secret-sharing-based approach is due to a suggestion by
>  Aaron Johnson (iirc); Carolin Zöbelein did some helpful analysis here.
>  Aaron Johnson and Tim Wilson-Brown made improvements to the draft proposal.
> 1. Introduction and scope
>  PrivCount is a privacy-preserving way to collect aggregate statistics about
>  the Tor network without exposing the statistics from any single Tor relay.
>  This document describes the behavior of the in-Tor portion of the PrivCount
>  system.  It DOES NOT describe the counter configurations, or any other
>  parts of the system. (These will be covered in separate proposals.)
> 2. PrivCount overview
>  Here follows an oversimplified summary of PrivCount, with enough
>  information to explain the Tor side of things.  The actual operation of the
>  non-Tor components is trickier than described below.
>  In PrivCount, a Data Collector (DC, in this case a Tor relay) shares
>  numeric data with N different Tally Reporters (TRs). (A Tally Reporter
>  performs the summing and unblinding roles of the Tally Server and Share
>  Keeper from experimental PrivCount.)
>  All N Tally Reporters together can reconstruct the original data, but no
>  (N-1)-sized subset of the Tally Reporters can learn anything about the
>  data.
>  (In reality, the Tally Reporters don't reconstruct the original data at
>  all! Instead, they will reconstruct a _sum_ of the original data across all
>  participating relays.)
>  In brief, the system works as follow:
>  To share data, for each counter value V to be shared, the Data Collector
>  first adds Gaussian noise to V in order to produce V', uses (K,N) Shamir
>  secret-sharing to generate N shares of V' (K<=N, K being the reconstruction
>  threshold), encrypts each share to a different Tally Reporter, and sends
>  each encrypted share to the Tally Reporter it is encrypted for.
>  The Tally Reporters then agree on the set S of Data Collectors that sent
>  data to all of them, and each Tally Reporter forms a share of the aggregate
>  value by decrypting the shares it received from the Data Collectors in S
>  and adding them together. The Tally Reporters then, collectively, perform
>  secret reconstruction, thereby learning the sum of all the different values
>  V'.
>  The use of Shamir secret sharing lets us survive up to N-K crashing TRs.
>  Waiting until the end to agree on a set S of surviving relays lets us
>  survive an arbitrary number of crashing DCs. In order to prevent bogus data
>  from corrupting the tally, the Tally Reporters can perform the aggregation
>  step multiple times, each time proceeding with a different subset of S and
>  taking the median of the resulting values.
>  Relay subsets should be chosen at random to avoid relays manipulating their
>  subset membership(s). If an shared random value is required, all relays
>  must submit their results, and then the next revealed shared random value
>  can be used to select relay subsets. (Tor's shared random value can be
>  calculated as soon as all commits have been revealed. So all relay results
>  must be received *before* any votes are cast in the reveal phase for that
>  shared random value.)
>  Below we describe the algorithm in more detail, and describe the data
>  format to use.
> ...

Chelsea and I had a conversation on another list about Prio.

Prio is similar to PrivCount, but would allow the Data Collectors to
prove that their responses are in range, using a zero-knowledge proof

I'm posting this here with her permission:

> From: teor <teor2345 at gmail.com>
> Date: 13 December 2017 at 09:26:45 AEDT
>> On 13 Dec 2017, at 03:02, chelsea komlo <me at chelseakomlo.com> wrote:
>> Hi All,
>> This came up in a slack channel I am a part of, and it seems relevant to
>> PrivCount work- I wanted to pass it along.
>> https://crypto.stanford.edu/prio/
>> "Prio is a privacy-preserving system for the collection of aggregate
>> statistics. Each Prio client holds a private data value (e.g., its
>> current location), and a small set of servers compute statistical
>> functions over the values of all clients (e.g., the most popular
>> location). As long as at least one server is honest, the Prio servers
>> learn nearly nothing about the clients’ private data, except what they
>> can infer from the aggregate statistics that the system computes"
> I read the paper. I'm sure I didn't understand it all.
> It's very new, but it shows promise.
> And there's no real-world (even experimental) implementation yet.
> Maybe we can architect PrivCount in Tor so it's possible to upgrade
> to a system like this in future. In particular, we seem to be using
> similar primitives to Prio, so that's useful.

Are there things we can change in the current proposal to make it easier
to upgrade to a system like Prio in future?

Perhaps it's enough to version all our data exchange formats.
I'm not sure if we should lock into one particular scheme, when
something better might come along.


Tim / teor

PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20171213/15ed219a/attachment.sig>

More information about the tor-dev mailing list