[tor-dev] Two protocols to measure relay-sensitive hidden-service statistics

Nick Mathewson nickm at alum.mit.edu
Tue Jan 6 18:42:15 UTC 2015

On Tue, Jan 6, 2015 at 12:14 PM, A. Johnson
<aaron.m.johnson at nrl.navy.mil> wrote:
> Hello tor-dev,
> While helping design ways to publish statistics about hidden services in a privacy-preserving
> manner, it has become clear to me that certain statistics cannot be safely reported using the
> current method of having each relay collect and report measurements. I am going to describe a
> couple of simple protocols to handle this problem that I think should be implementable without much
> effort. I'd be happy to get feedback in particular about the security or ease-of-implementation of
> these protocols.
> Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
>   1. The number of descriptor fetches received by a hidden-service directory (HSDir)
>   2. The number of client introduction requests at introduction points (IPs)
> The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so
> the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a
> measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely)
> unique to an HS, and so the number of client introductions at its IPs could reveal the number of
> client connections it received.
> One approach to solving this problem would be to anonymize the reported statistics. Doing so raises
> a couple of challenges, however:
>   1. Anonymous statistics should be authenticated as coming from some relay. Otherwise, statistics
>   could be polluted by any malicious actor.
>   2. Statistical inference should be made robust to outliers. Without the relay identities, it will
>   be difficult to detect and remove values that are incorrect, whether due to faulty measurement or
>   malicious action by a relay.

You know, when I got to the above paragraph, I asked myself, "Well
clearly *I'd* use blind signatures, but I wonder what Aaron is going
to suggest?"

And then I saw:

> I propose some simple cryptographic techniques to privately collect the above statistics while
> handling the above challenges.


I think that there are some details to work out, but the general
approach you describe sounds reasonable.  IMO it doesn't need to be
directory authorities who are StatsAuths, and we could use a "blinded
token once per relay per period" scheme for other stuff too down the
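
For readers less familiar with the "blinded token once per relay per period" idea, here is a minimal sketch of one way it could work, using textbook RSA blind signatures (Chaum's construction). It is not Aaron's protocol, whose details are not quoted here. The toy modulus, the hash-into-Z_N step, and the names `StatsAuth`, `Collector`, `blind`, and `unblind` are hypothetical illustrations; it also assumes a fresh StatsAuth (and signing key) per measurement period. The StatsAuth sees the relay identity but only a blinded token. The collector sees the unblinded token and signature, so it can check that a report came from *some* relay and reject replays without learning which relay.

```python
import hashlib
import secrets
from math import gcd

# Toy parameters (two Mersenne primes, ~150-bit modulus): illustration only,
# far too small and too structured for any real deployment.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537


def h(token: bytes) -> int:
    """Toy hash of a token into Z_N."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


class StatsAuth:
    """Hypothetical signer: at most one blind signature per relay per period.

    Assumes a fresh instance (and, in practice, a fresh key) for each period.
    """

    def __init__(self) -> None:
        self._d = pow(E, -1, (P - 1) * (Q - 1))  # RSA private exponent
        self._issued: set[str] = set()

    def sign_blinded(self, relay_id: str, blinded: int) -> int:
        if relay_id in self._issued:
            raise PermissionError(f"{relay_id} already has a token this period")
        self._issued.add(relay_id)
        return pow(blinded, self._d, N)  # signer never sees h(token) itself


def blind(token: bytes) -> tuple[int, int]:
    """Relay side: multiply h(token) by r^E for a random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            return (h(token) * pow(r, E, N)) % N, r


def unblind(blind_sig: int, r: int) -> int:
    """Relay side: divide out r, leaving an ordinary signature on h(token)."""
    return (blind_sig * pow(r, -1, N)) % N


class Collector:
    """Hypothetical tally server: accepts one report per valid, unused token."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()
        self.reports: list[int] = []

    def submit(self, token: bytes, sig: int, value: int) -> bool:
        if pow(sig, E, N) != h(token) or token in self._seen:
            return False  # forged signature or replayed token
        self._seen.add(token)
        self.reports.append(value)
        return True


auth, collector = StatsAuth(), Collector()
token = secrets.token_bytes(32)  # relay picks a random token for this period
blinded, r = blind(token)
sig = unblind(auth.sign_blinded("relayA", blinded), r)
print(collector.submit(token, sig, 42))  # True: valid signature, first use
print(collector.submit(token, sig, 42))  # False: same token replayed
```

A second `sign_blinded` call for the same relay in the same period raises `PermissionError`, which is what caps each relay at one anonymous report per period.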

Median-and-quartiles or median-and-deciles sounds smarter than
just-average to me.
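
A quick illustration of why the median and quartiles hold up better than the mean once reports are anonymous and bad values cannot be traced back to a relay. The report values below are made up for illustration, and `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
import statistics


def summarize(reports: list[float]) -> dict[str, object]:
    """Summarize anonymous per-relay reports by mean, median, quartiles, deciles."""
    return {
        "mean": statistics.fmean(reports),
        "median": statistics.median(reports),
        "quartiles": statistics.quantiles(reports, n=4),
        "deciles": statistics.quantiles(reports, n=10),
    }


# Made-up data: nine honest relays report roughly 100 fetches each,
# and one faulty or malicious relay reports 100000.
reports = [95, 98, 100, 101, 102, 99, 97, 103, 100, 100000]
s = summarize(reports)
print(s["mean"])       # 10089.5: dragged far off by the single outlier
print(s["median"])     # 100.0: unaffected
print(s["quartiles"])  # all three cut points stay inside the honest range
```

With only ten reports the top decile cut point does move toward the outlier, so deciles give a finer picture of the distribution at the cost of exposing the tails.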


