[tor-dev] Revisiting prop224 client authorization
s7r at sky-ip.org
Thu Nov 3 00:00:29 UTC 2016
>> On 3 Nov. 2016, at 10:37, s7r <s7r at sky-ip.org> wrote:
>> I am very happy with the torspec patch.
>> Not quoting entirely, only want to add something wrt randomizing the
>> value for fake clients based on David's and teor's comments:
>> David Goulet wrote:
>>> - I think "superencrypted" -> "super-encrypted" would be nicer as everything
>>> in the descriptor has that separation of words. Or even "client-encrypted" if
>>> we want to add extra semantics. No strong opinion apart from the "-" :).
>> I prefer super-encrypted vs. client-encrypted.
>>> - [XXX consider randomization of the value 16]
>>> If it's fixed, we basically create bucket so a client can know that there
>>> are 0-16 clients or 16-32 clients and so on.
>>> If we randomize that value and let's say it's 7 then we have buckets of 7. If
>>> that value is randomized for _every_ new descriptor, we create multiple sizes of
>>> buckets but over time someone could deduce (maybe) the low bound of clients
>>> by observing all random values and thus assume there are 0-<low bound>.
>>> I'm uncertain here what's best but seems that in any case, bucketing is
>>> happening as we pad with fake "auth-client". So I would assume here, out of
>>> my head to be safe, that we might want _all_ services to kind of look the
>>> same thus a fixed value would make sense following that train of thought.
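To make the bucketing concrete, here is a small Python sketch. It assumes the service pads its real "auth-client" entries with fake ones up to the next multiple of the bucket size, and that there is at least one real client. The function names are made up for illustration and do not come from the spec.

```python
def padded_count(real_clients: int, bucket: int = 16) -> int:
    """Number of "auth-client" entries visible in the descriptor.

    Assumption: real entries are padded with fake ones up to the next
    multiple of `bucket`, and real_clients >= 1.
    """
    if real_clients < 1 or bucket < 1:
        raise ValueError("real_clients and bucket must be >= 1")
    return -(-real_clients // bucket) * bucket  # ceiling division


def inferred_range(visible: int, bucket: int = 16) -> tuple:
    """Range of real client counts an observer can infer from one descriptor."""
    return (visible - bucket + 1, visible)


print(padded_count(5))        # 16
print(padded_count(17))       # 32
print(inferred_range(32))     # (17, 32)
print(padded_count(5, 7))     # 7, i.e. a bucket of 7
```

Whatever the bucket size is, a single descriptor narrows the real count down to one bucket. Changing the size only changes how wide that bucket is.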
>>> I'm liking the rest here! We'll have to think also on some padding in the
>>> INTRODUCE1 cell to avoid leaking client auth is being used.
>> This is true, we create buckets no matter what, but I think it's better
>> if one has to watch a hidden service for a lot more time to determine
>> the probable number rather than being able to tell from the first
>> descriptor that there are 0-16 clients, 16-32 clients and so on.
>> I fully agree that randomizing _every_ new descriptor does not help and
>> probably in short time someone could deduce a possible number, but I am
>> slightly uncomfortable with a global fixed value for this. One more
>> idea, if it's not helpful we can just go ahead with a fixed value of 16.
>> I think it's better if we pick a random number between 8 and 32 fake
>> clients and remember the picked value so it will be used for every new
>> descriptor until something in our setup changes or enough time has
>> passed. In order to know when to reset it, we save it (in our state)
>> along with:
>> 1. The number of real authorized clients when the random value was picked.
>> 2. Timestamp when the random value was picked + an end of life for the
>> random value.
>> We reset the random value of fake authorized clients and also its end of
>> life when:
>> a) number of real authorized clients in torrc changes from what we have
>> in our state.
>> b) end of life for the random value is reached. End of life will be
>> timestamp + a random period between 30 and 90 days.
>> c) obvious case when Tor is re-installed and old state is lost.
>> We call this function on every HUP and (re)start. We can tune the
>> numbers 8 - 32 and period 30 - 90 days as you like.
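The pick-and-remember logic described above could look roughly like the following Python sketch. This is only an illustration under stated assumptions, not Tor code: the names `FakeClientState` and `refresh_state` are hypothetical, the lifetime is picked in whole days, and how the state is actually written to disk is left out.

```python
import random
from dataclasses import dataclass
from typing import Optional

MIN_FAKE, MAX_FAKE = 8, 32              # tunable range for fake clients
MIN_LIFE_DAYS, MAX_LIFE_DAYS = 30, 90   # tunable end-of-life range
DAY = 86400                             # seconds per day


@dataclass
class FakeClientState:
    fake_clients: int   # the random value that was picked
    real_clients: int   # number of real authorized clients at pick time
    picked_at: float    # timestamp when the value was picked
    expires_at: float   # picked_at + random lifetime (end of life)


def refresh_state(state: Optional[FakeClientState], real_clients: int,
                  now: float, rng=None) -> FakeClientState:
    """Called on every HUP and (re)start; returns the state to use."""
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness
    needs_reset = (
        state is None                            # c) old state was lost
        or state.real_clients != real_clients    # a) torrc client count changed
        or now >= state.expires_at               # b) end of life reached
    )
    if not needs_reset:
        return state
    lifetime = rng.randint(MIN_LIFE_DAYS, MAX_LIFE_DAYS) * DAY
    return FakeClientState(
        fake_clients=rng.randint(MIN_FAKE, MAX_FAKE),
        real_clients=real_clients,
        picked_at=now,
        expires_at=now + lifetime,
    )
```

As long as none of the three reset conditions holds, every new descriptor reuses the same `fake_clients` value, so an observer sees one stable number per 30 to 90 day period instead of a fresh sample per descriptor.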
>> This way there are a lot of buckets and significantly more time needed
>> for an observer to deduce a probable number. It is quite possible one
>> can never deduce a "probable enough" number.
>> We combine this with padding the encrypted portion, if needed, up to
>> the next multiple of 10k bytes.
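A minimal sketch of that padding step, in Python. It assumes "10k" means 10,000 bytes and that the filler is NUL bytes; both are assumptions for the sake of the example, and the function names are hypothetical.

```python
PAD_MULTIPLE = 10_000  # assumption: "10k bytes" read as 10,000 bytes


def padded_length(n: int, multiple: int = PAD_MULTIPLE) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    if n < 0 or multiple < 1:
        raise ValueError("n must be >= 0 and multiple >= 1")
    return -(-n // multiple) * multiple  # ceiling division


def pad_plaintext(data: bytes, multiple: int = PAD_MULTIPLE) -> bytes:
    """Pad `data` with NUL bytes (assumed filler) before encryption."""
    return data + b"\x00" * (padded_length(len(data), multiple) - len(data))


print(padded_length(1))               # 10000
print(padded_length(10_000))          # 10000
print(padded_length(10_001))          # 20000
print(len(pad_plaintext(b"abc")))     # 10000
```

With this in place the descriptor size only reveals which 10k step the encrypted portion falls into, on top of whatever the fake "auth-client" entries already hide.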
>> It's true that it won't help if the hidden service operator changes the
>> number of authorized clients every hour for a long period but in
>> practice this doesn't happen - number of authorized clients changes
>> rarely. And even in this scenario it still makes things a lot more
>> Compared to other parts of prop 224, this is easy to code and should be
>> worth the effort. What do you think?
> If you want to do it this way, with noise and buckets, ask someone who is
> good at differential privacy to do the numbers for you, rather than guessing.
> You'll need to know the level of activity you want to hide.
As I said the numbers can be changed - I was illustrating an example. I
guessed some numbers that seemed reasonable to me so I could give an
example, and also because it's not a critical part. We only try to hide
the number of real authorized clients, or make it as hard as possible
for an observer to deduce a number close to the realistic number of
authorized clients, that's all.
Simply using the numbers that were guessed without deep knowledge in
differential privacy is a lot better than using a global fixed value of
16, but as I said this doesn't need to be a debate because I am not
against the fixed value, only saying it's better to randomize, if the