Re: [tor-dev] Feedback on obfuscating hidden-service statistics

25 Nov 2014

      "A. Johnson" <aaron.m.johnson@nrl.navy.mil> writes:
...
...
Roger's branch was a PoC that wrote stats on the log file. I don't
think we have newer data than what is in #13192. It's unclear whether
the relays stopped collecting statistics, or they just haven't updated
the trac ticket.
If we could check on that and get that data, that would be really helpful. Then we could do analysis in parallel with the better extra-info implementation.
I asked Moritz but he told me that he stopped collecting those stats...
...
...
Also, Roger's stats were counting cells from both RP and IP
circuits. It's unclear whether we will take the same approach; atm I
find it more reasonable to only count RP cells/circuits.
IP stats are also interesting, but I agree less so than RP stats alone.
Yes.

Also, IP stats can be linked to specific HSes, RP stats shouldn't be
linkable to specific HSes.
...
...
BTW, did Roger do the "How many HSes are there?" HSDir stats? Is there
a ticket for that?
I am fairly sure he at least counted descriptor updates at an HSDir. I
have a slide bullet saying “We estimate about 30 to 50K hidden
services are updating their descriptors each day” from the kickoff
meeting, and I recall Roger talking about that. Because the question
is what the “dark matter” of Hidden Services consists of, that is, the
30-50K HSes less the ~1500 that are publicly available and were
responding at that time.
Yep, found it at #13195.
...
...
In any case, Roger told us that answering the questions "Approx. how
many HSes are there?" and "How much bw is HS bw?" are the important
parts of what we need to have by January. Our plan was to have that,
plus a document with various future statistics we might or might not
do. Do you think that's not sufficient?
I’m not sure about “not sufficient”, but as I said, Roger already reported estimates for those last time. But I’d go with his opinion on this - it is Tor’s part of the project.
...
Security review is indeed a big part. I'm not persuaded that just
collecting all kinds of statistics from the Tor network is always good
or helpful [0]. I personally prefer to do this methodically and with
sufficient time for thinking and feedback, instead of starting to
collect various statistics in a short time. I feel that getting
pressured about *moar statistics* is a slippery slope that leads to badness.
OK, makes sense. So let’s start tackling the hard question: what exactly do hidden services want to protect? Some questions:
To all the questions below, my answer would probably be:
"Yes, to the extent possible".

Even though those properties are not really related to hiding the
location of the HS, I believe that the less information the adversary
has about a specific HS, the harder it is to deanonymize it.

This philosophy can be seen in various places of the HS spec, for
example by the fact that ephemeral keys are used for introduction
points so that they don't know which HS they are serving, or by the
fact that HS descriptors will soon become encrypted so that the HSDirs
cannot read them.
...
1. Should HSes be able to hide that they even exist at all in the
system? If so, counting the number of hidden services reduces this
somewhat (up to the added noise/inaccuracy). And by the way, random
noise doesn’t necessarily hide this, because over time, if you choose
new noise every measurement period and the number of HSes is constant,
then the average will eventually reveal the exact number. Ideas to
handle this: reuse randomness (except now that reveals exactly when
HSes are added or removed), round to the nearest multiple of some
bucket size (although what about the one HS that puts you into the
next bucket...). Doing this against an active adversary (not one who just
looks at your reported stats) is much harder, of course, because you
need to prevent HSDirs from knowing how many real descriptors they
have.
The fact that noise is not very effective here is true, but I also
acknowledge that this could be a useful stat, so we need to find the
right balance.

I'm hoping that the noise, together with the fact that the number of
HSes is not really constant, will obfuscate the exact number of HSes,
so that an adversary who wanted to enumerate all of them in the
current network would be off by a hundred or so.
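
As a rough illustration of the averaging problem and the bucket edge
case raised in question 1, here is a minimal sketch. Everything in it is
an assumption made for illustration: the Laplace noise distribution, the
noise scale, the bucket size of 100, the constant count of 1234 and the
helper names are hypothetical and not taken from any Tor implementation.

```python
import random
import statistics


def laplace_noise(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale (assumed noise model).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def report_fresh_noise(true_count, scale, periods, seed=0):
    # One noisy report per measurement period, with new noise each time.
    rng = random.Random(seed)
    return [true_count + laplace_noise(rng, scale) for _ in range(periods)]


def round_to_bucket(count, bucket_size):
    # Round a non-negative integer count to the nearest multiple of
    # bucket_size (halves round up).
    return ((count + bucket_size // 2) // bucket_size) * bucket_size


true_count = 1234  # hypothetical, constant number of HSes
reports = report_fresh_noise(true_count, scale=100.0, periods=10000)

single_error = abs(reports[0] - true_count)
averaged_error = abs(statistics.mean(reports) - true_count)
print(f"error of one report:        {single_error:.1f}")
print(f"error of the averaged data: {averaged_error:.1f}")

# Bucket edge case: one extra HS moves the reported value by a whole bucket.
print(round_to_bucket(1249, 100), round_to_bucket(1250, 100))
```

If the real count drifts between periods, the average tracks the mean
count rather than any single exact value, which is the effect hoped for
above.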
...
2. Should HSes be able to hide their (pseudonymous) popularity (i.e. number of users, connections per user)? If so, collecting RP cell counts already leaks averages and puts a lower bound on the max.
That's also true. Hopefully RP cell counts won't be able to reveal the
popularity of a specific HS though, which is the important part IMO.
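
To make the "averages and lower bound on the max" point concrete, a
small sketch with made-up numbers. The per-circuit cell counts and the
function name are hypothetical; it only assumes that a relay publishes a
total RP cell count and a total RP circuit count.

```python
import math


def leaked_from_totals(total_cells, num_circuits):
    # From the two published totals an observer can compute the average,
    # and since cell counts are integers, at least one circuit must have
    # carried ceil(average) cells or more.
    if num_circuits <= 0:
        raise ValueError("need at least one circuit")
    average = total_cells / num_circuits
    min_possible_max = math.ceil(average)
    return average, min_possible_max


# Hypothetical per-circuit counts, known only to the relay itself.
per_circuit = [10, 20, 30, 4000]
average, bound = leaked_from_totals(sum(per_circuit), len(per_circuit))
print(f"average cells per circuit: {average}")
print(f"busiest circuit carried at least {bound} cells")
```

Note that the totals say nothing about which HS the busiest circuit
belonged to.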
...
3. Should client HS lookups be hidden so that nobody knows what’s being queried or how often? If so, collecting descriptor requests could reveal a very active set of clients.
Personally, I don't think we should count HSDir descriptor requests.
HSDir descriptor requests can be linked back to specific HSes, which I
think is bad.
...
These are hard questions because HSes are only designed to hide
location, but there also appears to be a strong desire to make it hard
to learn anything else about them. But there are good reasons
(e.g. designing protocol improvements, troubleshooting problems,
watching for malicious behavior) to learn *something* about HSes.
<snip>
...
[0]: Did you know that relays (and bridges) report bandwidth
    statistics every *15* minutes? I have no idea if this is a good
    idea to do, especially for relays that see very few clients.
I did know this. It does seem potentially revealing of, say, the guard
used by a hidden service because you can easily modulate the HS's
traffic in 15 minute intervals. Somebody should think about what
statistics gathering might reveal and if that’s cool.
Hm. I made #13838 for this.

George Kadianakis