[tor-dev] Feedback on obfuscating hidden-service statistics
karsten at torproject.org
Thu Nov 20 12:49:35 UTC 2014
On 20/11/14 13:42, George Kadianakis wrote:
> "A. Johnson" <aaron.m.johnson at nrl.navy.mil> writes:
>>>>> George and I have been working on a small proposal to add two
>>>>> hidden-service related statistics: number of hidden services and
>>>>> total hidden-service traffic.
>>>> Great, I’m starting to focus more on this project now. Well,
>>>> actually I’m going on a trip for a week today, but *then* I’m
>>>> focusing more on this project :-)
>>> Sounds great! We're meeting every Tuesday at 16:00 UTC in #tor-dev.
>>> Feel free to drop by.
>> Excellent. I won’t be there this coming Tuesday, but I’ll be there the next Tuesday.
>>> Replicas mean that each descriptor is stored under two identifiers, so
>>> that's two places. Further, descriptor identifiers change once per
>>> day, so during a 24-hour period, there are up to four descriptor
>>> identifiers for a hidden service.
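
The "up to four" count above can be sketched as follows. This is a simplified model, not Tor's actual identifier derivation: it assumes a relay's 24-hour reporting window need not line up with the moment a service's identifiers rotate, and `rotation_offset` is a hypothetical per-service shift introduced only for illustration.

```python
DAY = 86400      # seconds in one rotation period
REPLICAS = 2     # each descriptor is stored under two identifiers

def descriptor_slots(window_start, window_len=DAY, rotation_offset=0):
    """Return the (time_period, replica) pairs in use during a window.

    rotation_offset is a hypothetical per-service shift of the daily
    rotation time; it is an assumption of this sketch.
    """
    first = (window_start + rotation_offset) // DAY
    last = (window_start + window_len - 1 + rotation_offset) // DAY
    return [(period, replica)
            for period in range(first, last + 1)
            for replica in range(REPLICAS)]

if __name__ == "__main__":
    # Window aligned with the rotation: one period, two replicas.
    print(len(descriptor_slots(0)))
    # Window starting one hour after a rotation: two periods, two replicas.
    print(len(descriptor_slots(3600)))
```

Under this model a window that straddles a rotation sees two periods times two replicas, which is where four identifiers per service would come from.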
>> That makes sense. It would be nice if the statistics would allow you
>> to identify how long (i.e. how many hour periods) each descriptor was
>> observed being published. That would allow us to figure out if there
>> are lots of short-lived services or fewer long-lived
>> services. Publishing statistics every hour would pretty much take care
>> of this. If you are really set on 24 hours, then perhaps you could add
>> the total number of published descriptors in addition to the number of
>> *unique* published descriptors.
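
A minimal sketch of the difference between total and unique published descriptors, and of how hourly observations would expose descriptor lifetimes. The event format `(hour, descriptor_id)` and the function name `summarize` are hypothetical; this only illustrates the counting, not anything a relay would report per descriptor.

```python
from collections import Counter

def summarize(publishes):
    """Summarize publish events given as (hour, descriptor_id) pairs."""
    events = list(publishes)
    hours_seen = Counter()
    # Count each descriptor at most once per hour, however often it
    # was re-published within that hour.
    for _hour, desc_id in set(events):
        hours_seen[desc_id] += 1
    return {
        "total": len(events),          # every publish counts
        "unique": len(hours_seen),     # distinct descriptors only
        "hours_seen": dict(hours_seen),
    }

if __name__ == "__main__":
    events = [(0, "a"), (1, "a"), (1, "a"), (2, "b")]
    print(summarize(events))
```

A large gap between `total` and `unique` would hint at long-lived descriptors being re-published, while similar values would hint at many short-lived ones.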
>> Also, my suggestion about using additive noise applies equally well to
>> the descriptor statistics. And multiplicative noise is a *bad idea* if
>> you don’t have some adjustment for small values (e.g. 10% noise of a 0
>> value is 0, and 10% of 1 is only 0.1).
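
The small-value problem with multiplicative noise can be illustrated with a short sketch. The function names, the 10% fraction, and the additive scale of 5.0 are assumptions chosen for the example, not values from the proposal.

```python
import random

def multiplicative_noise(value, fraction=0.1, rng=random):
    """Scale value by a random factor in [1 - fraction, 1 + fraction]."""
    return value * (1 + rng.uniform(-fraction, fraction))

def additive_noise(value, scale=5.0, rng=random):
    """Add Laplace(0, scale) noise; the spread does not depend on value."""
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

if __name__ == "__main__":
    rng = random.Random(2014)
    for true_value in (0, 1, 1000):
        print(true_value,
              multiplicative_noise(true_value, rng=rng),
              additive_noise(true_value, rng=rng))
```

With multiplicative noise a true value of 0 is always reported as 0 and a true value of 1 never moves by more than 0.1, so small counts are effectively published in the clear. Additive noise perturbs every value by the same distribution.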
>>> We have been thinking about many more hidden-service related
>>> statistics in a separate document. We're currently discussing whether
>>> we should turn it into a tech report, because we'll probably not want
>>> to implement most of those statistics. If you have remarks or more
>>> ideas, please feel free to edit the document. We're going to have a
>>> public review round for this, too, but that might not happen in the
>>> next week or two.
>> Great! I think we should go for at least a little more data in the
>> current proposal (what is the timeline for this, btw?). I think we
>> should come up with a list of statistics we might imagine gathering
>> and identify the subset of those that we’re comfortable gathering at
>> this point. For example, I think failure statistics are much more
>> innocuous than other data, and those would be very useful. They
>> would help us understand how the protocol is failing, and they
>> might help us identify misuse of hidden services
>> (e.g. by botnet clients stupidly looking for non-existent descriptors
>> or by malicious crawlers attempting to brute force descriptors). So
>> here are some ideas:
>> 1. Number of fetch requests for descriptors that don’t exist (number of fetch requests that do succeed would of course be very useful as well)
>> 2. Number of descriptor publishes to the wrong HSDir (actually I suspect that the HSDir doesn’t check this and wants to be accepting of any publish)
>> 3. Number of rendezvous circuits that never connect (from the RP perspective)
>> 4. Number of rendezvous circuits on which no data cells are ever sent
> (CC'ed [tor-dev])
Thanks, George, for moving the discussion here.
Here's the latest proposal draft where I incorporated Aaron's suggestions:
If people on this list have more feedback, please reply here. Thanks!
All the best,
> Thanks for the input Aaron!
> The timeline here is that we are hoping to have the proposal _and_ the
> implementation ready by mid-December. Then we are hoping that we
> can deploy the code to a few relays so that we have some data by January.
> So, time is tight.
> I'm currently OK with the two statistics in:
> I feel that any other statistics will need to be carefully analyzed.
> We should add the ideas you mentioned in the etherpad, and get them
> included in the tech report (which we are also hoping to have ready in
> some form by mid-January).
> The tech report is supposed to contain and analyze most of the HS
> statistics we can think of. It will likely contain many stats that we
> will never do, but also some stats that might be a good idea. The good
> ones we should eventually integrate into the Tor proposal and write code
>>> Thanks for the very valuable input! Let me know if the following
>>> draft looks okay, and I'll start another thread on tor-dev at .
>> "Lab(\epsilon/C)" -> "Lap(\epsilon/C)" (that was my mistake). I think
>> having the added noise both parameterized and included in the reported
>> statistics is an idea worth thinking about. Making it a parameter
>> allows you to easily change it without upgrading. Including it in the
>> statistics would allow us to correct better for noise if different
>> relays might be adding different amounts of noise due to inconsistent
>> opinions of the noise parameter (if this should never happen, then I
>> guess this wouldn’t be necessary).
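
A minimal sketch of reporting the noise parameter alongside the obfuscated value, so that an aggregator can account for relays that used different amounts of noise. It assumes the Laplace scale `b` is handed in directly (how `b` relates to \epsilon and C is left to the proposal), and the field names `value` and `noise_scale` are hypothetical.

```python
import math
import random

def laplace_sample(b, rng=random):
    """Draw from Laplace(0, b) as a difference of two exponentials."""
    return rng.expovariate(1 / b) - rng.expovariate(1 / b)

def report(true_count, b, rng=random):
    """Obfuscate a count and publish the noise scale that was used."""
    return {"value": true_count + laplace_sample(b, rng), "noise_scale": b}

def aggregate(reports):
    """Sum reported values and return the standard deviation of the
    combined noise, using Var[Laplace(0, b)] = 2 * b**2 per report."""
    total = sum(r["value"] for r in reports)
    variance = sum(2 * r["noise_scale"] ** 2 for r in reports)
    return total, math.sqrt(variance)

if __name__ == "__main__":
    rng = random.Random(238)
    # Two relays with inconsistent opinions of the noise parameter.
    reports = [report(120, 8.0, rng), report(95, 16.0, rng)]
    print(aggregate(reports))
```

If every relay is guaranteed to use the same parameter, the `noise_scale` field carries no extra information and could be dropped.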
>> So again, sorry that I’m not going to be very responsive on this for the next week. I’m really happy that you’re working on it!