Re: [tor-dev] Two protocols to measure relay-sensitive hidden-service statistics

6 Jan 2015

On Tue, Jan 6, 2015 at 12:14 PM, A. Johnson <aaron.m.johnson@nrl.navy.mil> wrote:
...
Hello tor-dev,
While helping design ways to publish statistics about hidden services in a privacy-preserving
manner, it has become clear to me that certain statistics cannot be safely reported using the
current method of having each relay collect and report measurements. I am going to describe a
couple of simple protocols to handle this problem that I think should be implementable without much
effort. I'd be happy to get feedback in particular about the security or ease-of-implementation of
these protocols.
Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:
  1. The number of descriptor fetches received by a hidden-service directory (HSDir)
  2. The number of client introduction requests at an introduction point (IP)
The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so
the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a
measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely)
unique to an HS, and so the number of client introductions at its IPs could reveal the number of
client connections it received.
One approach to solve this problem would be to anonymize the reported statistics. Doing so raises
a couple of challenges, however:
  1. Anonymous statistics should be authenticated as coming from some relay. Otherwise, statistics
  could be polluted by any malicious actor.
  2. Statistical inference should be made robust to outliers. Without the relay identities, it will
  be difficult to detect and remove values that are incorrect, whether due to faulty measurement or
  malicious action by a relay.

You know, when I got to the above paragraph, I asked myself, "Well
clearly *I'd* use blind signatures, but I wonder what Aaron is going
to suggest?"

And then I saw:
...
I propose some simple cryptographic techniques to privately collect the above statistics while
handling the above challenges.

:)

I think that there are some details to work out, but the general
approach you describe sounds reasonable.  IMO it doesn't need to be
directory authorities who are StatsAuths, and we could use a "blinded
token once per relay per period" scheme for other stuff too down the
line.
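
To make the "blinded token once per relay per period" idea concrete, here is
a minimal sketch using textbook RSA blind signatures. Everything in it is an
assumption for illustration: the StatsAuth class, the toy key, the report
format, and the function names are hypothetical, and unpadded RSA with a key
this small is not safe for real use.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA key for a hypothetical StatsAuth. Both factors are Mersenne primes
# (2**31 - 1 and 2**61 - 1), which is far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def hash_to_int(msg):
    """Map a report (bytes) to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N


def blind(m):
    """Relay side: hide m behind a random factor r before sending it."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            return (m * pow(r, E, N)) % N, r


def unblind(blinded_sig, r):
    """Relay side: strip r to obtain an ordinary signature on m."""
    return (blinded_sig * pow(r, -1, N)) % N


def verify(m, sig):
    """Anyone: check the signature against the public key (N, E)."""
    return pow(sig, E, N) == m % N


class StatsAuth:
    """Signs at most one blinded token per (relay, period)."""

    def __init__(self):
        self.issued = set()

    def sign_blinded(self, relay_id, period, blinded):
        if (relay_id, period) in self.issued:
            raise ValueError("token already issued for this relay and period")
        self.issued.add((relay_id, period))
        return pow(blinded, D, N)


auth = StatsAuth()
report = hash_to_int(b"period=42;hsdir_fetches=1337")  # made-up report format
blinded, r = blind(report)
sig = unblind(auth.sign_blinded("relayA", 42, blinded), r)
assert verify(report, sig)
```

The StatsAuth only ever sees the blinded value, so it cannot later link the
signed report to the relay that requested the token, while the per-period
bookkeeping limits each relay to one authenticated report.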

Median-and-quartiles or median-and-deciles sounds smarter than
just-average to me.
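
A quick sketch of why, with made-up numbers: one bogus anonymous report
drags the mean far away, while the median and quartiles barely notice it.
The report values and the summarize helper are hypothetical.

```python
import statistics


def summarize(reports):
    """Return the mean plus median-and-quartiles for a list of counts."""
    q1, q2, q3 = statistics.quantiles(reports, n=4)  # Python 3.8+
    return {
        "mean": statistics.mean(reports),
        "median": statistics.median(reports),
        "q1": q1,
        "q3": q3,
    }


# Eight plausible per-relay counts and one polluted value.
reports = [10, 12, 11, 13, 9, 12, 10, 11, 1_000_000]
summary = summarize(reports)

print("mean:  ", summary["mean"])
print("median:", summary["median"])
print("q1, q3:", summary["q1"], summary["q3"])
```

Since anonymized reports cannot be traced back to a relay for outlier
removal, summary statistics that tolerate a few bad values look like the
safer default.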

cheers,
-- 
Nick

Nick Mathewson