[tor-dev] Proposal 280: Privacy-Preserving Statistics with Privcount in Tor

Mon Aug 7 17:50:37 UTC 2017

[reposting this message with permission.  It is a reply that I sent to
Aaron, where I quoted an email from him about this proposal. Tim and
Aaron had additional responses, which I'll let them quote here or not
as they think best.]

On Sat, Aug 5, 2017 at 1:38 PM, Aaron Johnson
<aaron.m.johnson at nrl.navy.mil> wrote:
 [...]
> - There are a couple of documents in PrivCount that are missing: the deployment document and the configuration document. These set up things like the identities/public keys of the parties, the planned time of the measurements, the statistics to be computed, the noise levels to use. In PrivCount, these values must be agreed upon by all parties (in some cases, such as disagreement about noise, the security/privacy guarantees could otherwise fail). How do you plan to replace these?

So, I hadn't planned to remove these documents, so much as to leave
them out of scope for this proposal.  Right now, in the code, there's
no actual way to configure any of these things.

Thinking aloud:

I think we should engineer that piece by piece.  We already have the
consensus directory system as a way to communicate information that
needs to be securely updated, and where everybody needs to update at
once, so I'd like to reuse that to the extent that it's appropriate.

For some parts of it, I think we can use versions and named sets.  For
other parts, we want to be flexible, so that we can rotate keys
frequently, react to tally reporters going offline, and so on.  There
may need to be more than one distribution mechanism for this metainfo.

These decisions will also be application-dependent: I've been thinking
mainly of "always-on" applications, like network metrics, performance
measurement, anomaly detection [*], and so on.  But I am probably
under-engineering for "time-limited" applications like short-term
research experiments.

> - I believe that instead of dealing with Tally Reporter (TR) failures using multiple subsets, you could instead simply use (t,n) secret sharing, which would survive any t-1 failures (but also allow any subset of size t to determine the individual DC counts). The DC would create one blinding value B and then use Shamir secret sharing to send a share of B to each TR. To aggregate, each TR would first add together its shares, which would yield a share of the sum of the blinding values from all DCs. Then the TRs could simply reconstruct that sum publicly, which, when subtracted from the public, blinded, noisy counts, would reveal the final noisy sum. This would be more efficient than having each TR publish multiple potential inputs to different subsets of TRs.

So, I might have misunderstood the purpose here: I thought that the
instances were to handle misbehaving DCs as well as malfunctioning
TRs.
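
For concreteness, here is a minimal Python sketch of the (t,n) scheme
described in the quoted paragraph.  It is an illustration only, not
code from PrivCount or from the proposal: the prime P, the function
names, and the in-process "round" are all assumptions of mine, and it
ignores encryption, authentication, and misbehaving parties entirely.

```python
import secrets

# Hypothetical field modulus (a Mersenne prime); not taken from the proposal.
P = 2**61 - 1


def share(secret, t, n):
    """Split `secret` into n Shamir shares; any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % P
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def run_round(true_counts, noise, t, n):
    """Each DC blinds its noisy count with one value B and shares B to the TRs."""
    published = []     # public, blinded, noisy counts (one per DC)
    tr_sums = [0] * n  # each TR's running sum of the shares it received
    for count, z in zip(true_counts, noise):
        b = secrets.randbelow(P)
        published.append((count + z + b) % P)
        for idx, (_, y) in enumerate(share(b, t, n)):
            tr_sums[idx] = (tr_sums[idx] + y) % P
    # A TR's summed shares form one share of the sum of all blinding values.
    return published, [(i + 1, s) for i, s in enumerate(tr_sums)]


def tally(published, tr_shares):
    """Subtract the reconstructed blinding sum from the published counts."""
    return (sum(published) - reconstruct(tr_shares)) % P
```

With t=3 and n=5, any three of the five summed shares passed to
tally() give the same noisy total (mod P).  The sketch only covers TRs
dropping out; it does nothing about a DC that sends inconsistent shares.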

> - Storing at the DC the blinded values encrypted to the TRs seems to violate forward privacy in that if during the measurement the adversary compromises a DC and then later (even after the final release) compromises the key of a TR, the adversary could determine the state of the DC’s counter at the time of compromise. This also applies to the optimization in Sec. 6, where a shared secret is hashed to produce the blinding values.

Well, the adversary would need to compromise the key of _every_ TR in
at least one instance, or they couldn't recover the actual counters.

I guess we could, as in the original design (IIUC), send the encrypted
blinding values (or public DH key in sec 6) immediately from the DC
when it generates them, and then throw them away client-side.  Now the
adversary would need to break into all the TRs while they were holding
these encrypted blinding values.
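
A rough Python sketch of that "send immediately, then forget" option,
to show what state the DC would and wouldn't keep.  Everything named
here is hypothetical (the DataCollector class, the modulus Q, the
method names), and the encryption of each blinding value to its TR is
left out; this is not how the proposal or PrivCount specifies it.

```python
import secrets

# Hypothetical counter modulus; not taken from the proposal.
Q = 2**64


class DataCollector:
    """Toy DC that keeps only a blinded counter after setup."""

    def __init__(self, num_trs, noise=0):
        # One fresh blinding value per TR.
        blinds = [secrets.randbelow(Q) for _ in range(num_trs)]
        # The counter starts at noise plus all blinding values, so its
        # stored value alone says nothing about the true count.
        self.counter = (noise + sum(blinds)) % Q
        # Held only until handed off (in reality: encrypted to each TR).
        self.outbox = blinds

    def flush_blinds(self):
        """Hand the blinding values off and drop the DC's own copy."""
        out, self.outbox = self.outbox, None
        return out

    def increment(self, k=1):
        self.counter = (self.counter + k) % Q


def unblind(counter, blinds):
    """What the TRs can jointly compute once they pool their values."""
    return (counter - sum(blinds)) % Q
```

After flush_blinds(), someone who later compromises the DC sees only
the blinded counter; recovering the count requires the values that
the TRs are holding.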

Or, almost equivalently, I think we could make the TR public
encryption keys only get used for one round. That's good practice in
general, and it's a direction I generally like.

And of course, DCs should use forward-secure TLS for talking to the
TRs, so that an eavesdropper doesn't learn anything.

[*] One anomaly detection mechanism I've been thinking of is to look
at different "protocol-warn" log messages.  These log messages
indicate that some third party is not complying with the protocol.
They're usually logged at info, since there's nothing an operator can
do about them, but it would be good for us to get notification if some
of them spike all of a sudden.