Re: [tor-dev] Revisiting prop224 client authorization

19 Oct 2016

Hello George,

Inline comments:
...
Hello again,
I read the feedback on the thread and thought some more about this. Here
are some thoughts based on received feedback. A torspec branch coming
soon if people agree with my points below.
I'd also like to introduce a new topic of discussion here:
d) Should we introduce the concept of stealth auth again?
IIUC the current prop224 client auth solutions are not providing all
   the security properties that stealth auth did. Specifically, if Alice
   is an ex-authorized-client of a hidden service and she got revoked,
   she can still fetch the descriptor of a hidden service and hence
   learn the uptime/presence of the HS. IIUC, with stealth auth this was
   not previously possible.
I also share David's feeling here: presence hiding is not so critical
and I am not sure it's worth its engineering and additional code
costs. We can add this feature any time after deployment anyway because
there are many questions and we need some stats and to analyze user
demands in order to take the right decision here. Freezing this specific
feature until further analysis shouldn't be a problem.
...
...
a) I think the most important problem here is that the authorization-key logic
   in the current prop224 is very suboptimal. Specifically, prop224 uses a
   global authorization-key to ensure that descriptors are only read by
   authorized clients. However, since that key is global, if we ever want to
   revoke a single client we need to change the keys for all clients. The
   current rend-spec.txt does not suffer from this issue, hence I adapted the
   current technique to prop224.
Please see my torspec branch `prop224_client_auth` for the proposed changes:
   https://gitweb.torproject.org/user/asn/torspec.git/log/?h=prop224_client_aut...
Some further questions here:
i) Should we fake the client-auth-desc-key blob in case client authorization
      is not enabled? Otherwise, we leak to the HSDir whether client auth is
      enabled. The drawback here is the desc size increase (by about 330 bytes).
Alternatively, we can try to put it in the encrypted part of the
      descriptor. So that we require subcredential knowledge to access the
      encrypted part, and then client_auth_cookie knowledge to get the
      encryption key to decrypt the intro points etc. I feel that this
      double-encryption design might be too annoying to implement, but perhaps
      it's worth it?
Seems like people preferred the double-encryption idea here, so that we
reveal the least amount of information possible in the plaintext part of
the desc.
I think this is a reasonable point since if we put the auth keys in the
plaintext part of the descriptor, and we always pad (or fake clients) up
to N authorized clients, it will be obvious to an HSDir if a hidden
service has more than N authorized clients (since we will need to fake
2*N clients then).
Agreed.
...
---
WRT protocol, I guess the idea here is that if client auth is enabled,
then we add some client authorization fields in the top of the encrypted
section of the descriptor, that can be used to find the client-auth
descriptor encryption key. Then we add another client-auth-encrypted
blob inside the encrypted part, which contains the intro points etc. and
is encrypted using the descriptor encryption key found above.
So the first layer is encrypted using the onion address, and the second
layer is encrypted using the client auth descriptor key. This won't be
too hard to implement, but it's also different from what's currently
coded in #17238.
Do people feel OK with this?
Yes, sounds good.
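
To make the two layers concrete, here is a toy sketch in Python. It is
not the prop224 construction: the cipher is a stand-in XOR keystream
built from SHAKE-256 so the example runs with the standard library only,
and the names `toy_xor`, `seal`, `open_outer`, `open_inner`, `onion_key`
and `desc_key` are hypothetical. How the outer key is derived from the
onion address is left out entirely.

```python
import hashlib
import os

# Placeholder for the client-auth fields at the top of the outer layer.
HEADER = b"desc-auth-type cookie\n"

def toy_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Stand-in stream cipher (assumption): XOR with a SHAKE-256 keystream."""
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))

def seal(onion_key: bytes, desc_key: bytes, intro_points: bytes) -> bytes:
    """Encrypt intro points under desc_key, then wrap everything under onion_key."""
    inner_iv = os.urandom(16)
    inner = inner_iv + toy_xor(desc_key, inner_iv, intro_points)
    outer_iv = os.urandom(16)
    return outer_iv + toy_xor(onion_key, outer_iv, HEADER + inner)

def open_outer(onion_key: bytes, blob: bytes) -> bytes:
    """First layer: anyone who knows the onion address gets this far."""
    iv, body = blob[:16], blob[16:]
    return toy_xor(onion_key, iv, body)

def open_inner(desc_key: bytes, outer_plain: bytes) -> bytes:
    """Second layer: only holders of the client-auth descriptor key succeed."""
    inner = outer_plain[len(HEADER):]
    iv, body = inner[:16], inner[16:]
    return toy_xor(desc_key, iv, body)
```

In this sketch an HSDir that does not know the onion address sees only
one opaque blob, and a party with the onion address but no client-auth
key sees the auth fields but not the intro points.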
...
Also, what should happen if client auth is not used? Should we fall back
to the current descriptor format, or should we fake authorized clients
and add a fake client-auth-encrypted-blob for uniformity? Feedback is
welcome here, and I think the main issue here is engineering time and
reuse of the current code.
---
I say Yes with a capital letter here. We should make as many descriptors
look alike as we can. If client auth is not used, that descriptor
should contain N fake clients (given we choose a reasonable N that will
result in a reasonable average size for descriptors network-wide).
...
Now WRT security, even if we do the double-encryption thing, and we
consider an HSDir adversary that knows the onion address but is not an
authorized client, we still need to add fake clients, otherwise that
adversary will know the exact number of authorized clients. So fake
clients will probably need to be introduced anyhow.
Of course, fake clients will patch this problem in a good way: an
adversary that only knows the onion address but is not an authorized
client will not have an _exact_ number of authorized clients no matter
what. And the additional padding that comes with fake clients is helpful
in making the majority of descriptors alike in terms of size, which is
our best move here the way I see it (solves 2 things).
...
As David pointed out, this all boils down to how much we pad the
encrypted part of the descriptor, otherwise we always leak info. If we
are hoping for a leakless strategy here, we should be generous with our
padding.
Agreed. A leakless solution here is to pad all descriptors to the hard
limit, but this is not worth it. The hard limit is assumed to be reached
only by HSes that use auth and have a huge number of authorized
clients, in which case they might look into scalability solutions, like
running multiple onion hostnames for client groups linked to a single
backend service. Anyway, what I am trying to say here is that I think
these will be very isolated cases, if they occur at all. The rest of the
descriptors will just look alike, with a reasonable N fake clients that
doesn't grow the descriptors to the hard limit for everyone.
...
Let's see how much padding we need:
- Each intro point adds about 1.1k bytes to the descriptor (according to
  David).
- Each block of 16 authorized clients adds about 1k bytes to the
  descriptor (according to the format described below).
- Apart from intro points and authorized clients, the rest of the
  descriptor is not that heavy: less than 1k bytes (right?)
To get an average size here, let's consider a normal descriptor with 5
intro points and 16 authorized clients. With the above values, the
overhead on the encrypted part of the descriptor is about 7k bytes.
To get a maximum size here, let's consider a phat descriptor that
contains 20 intro points and 160 authorized clients. With the above
values, the overhead on the encrypted part of the descriptor will be 32k
bytes.
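
The arithmetic above can be checked with a few lines of Python. This is
only a sketch of the estimate, using the rough per-item sizes quoted in
this thread; the function name `encrypted_overhead` and the constants are
made up for illustration, and the "less than 1k" remainder of the
descriptor is deliberately left out.

```python
INTRO_POINT_BYTES = 1100    # "about 1.1k bytes" per intro point
CLIENT_BLOCK_BYTES = 1000   # "about 1k bytes" per block of 16 clients
CLIENTS_PER_BLOCK = 16

def encrypted_overhead(intro_points: int, clients: int) -> int:
    """Rough size of intro points plus authorized-client blocks, in bytes."""
    blocks = -(-clients // CLIENTS_PER_BLOCK)  # ceiling division
    return intro_points * INTRO_POINT_BYTES + blocks * CLIENT_BLOCK_BYTES

print(encrypted_overhead(5, 16))    # 6500
print(encrypted_overhead(20, 160))  # 32000
```

The first figure comes to 6.5k, which lands at "about 7k" once the rest
of the descriptor is added; the second matches the 32k figure.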
Hence, here are some suggestions (read: magic numbers):
- We always pad the encrypted section of the descriptor to the nearest
  multiple of 10k bytes (read: we pad the plaintext before we encrypt).
This should be enough to obfuscate the number of IPs and authorized
  clients on most hidden services out there.
Sounds good.
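
A minimal sketch of that padding rule, in Python. The zero padding byte
and the name `pad_plaintext` are assumptions for illustration; the
proposal only says that the plaintext is padded up to a multiple of 10k
bytes before encryption.

```python
PAD_MULTIPLE = 10_000  # "nearest multiple of 10k bytes"

def pad_plaintext(plaintext: bytes, multiple: int = PAD_MULTIPLE) -> bytes:
    """Pad up to the next multiple of `multiple` (at least one full block)."""
    blocks = max(1, -(-len(plaintext) // multiple))  # ceiling division
    padded_len = blocks * multiple
    # Padding byte is an assumption; a real format must make it unambiguous.
    return plaintext + b"\x00" * (padded_len - len(plaintext))

print(len(pad_plaintext(b"a" * 6_500)))   # 10000
print(len(pad_plaintext(b"a" * 32_000)))  # 40000
```

With this rule an observer of the ciphertext length learns only which
10k bucket the descriptor falls in.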
...
- If client auth is enabled, we always include a multiple of 16
  authorized clients (and fake the extra if needed) in the encrypted
  portion.
Let's randomize a bit more here in order not to give an attacker that
knows the onion address (and is a revoked client) one fixed number. As a
general assumption, randomization is always better. Let's include a
multiple of a random number between 8 and 32 fake authorized clients and
fake extra if needed in the encrypted portion, *based on the size of the
descriptor with no fake data*. The same random number of fake clients
will apply to hidden services that do not use auth at all.

Here we should ensure fake authorized client slots do not eat the space
for real authorized clients in the descriptor. So the wording should be
different than "we always include a multiple of Y authorized clients" -
if an HS has real authorized clients configured that make the descriptor
size 40k bytes, we should obviously not add any fake clients. Nobody can
tell except the HS how many are fake and how many are real (referring to
attackers that know the onion address here), so what we are doing with
this is ensuring real data takes priority over padding data.

I am sure this is what you meant, just noted that it reads a little
confusing so we should rephrase for torspec.
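
One possible reading of this rule, sketched in Python so it can be
argued about. Everything here is an assumption: the name
`fake_client_count`, expressing the size cap as a client count
(`max_clients`) instead of bytes, and rounding the real client count up
to a multiple of a random block size between 8 and 32.

```python
import random

def fake_client_count(real_clients: int, max_clients: int, rng=None) -> int:
    """How many fake clients to add; real clients always take priority."""
    if real_clients >= max_clients:
        return 0  # no room left: never displace real clients with fakes
    rng = rng or random.SystemRandom()
    block = rng.randint(8, 32)  # random block size, inclusive bounds
    # Round up to a multiple of `block`, with at least one block so that
    # services without client auth also carry fake clients.
    target = max(1, -(-real_clients // block)) * block
    return min(target, max_clients) - real_clients
```

For example, a service with no real clients gets between 8 and 32 fake
ones, while a service that already fills the descriptor gets none.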
...
- We set the maximum allowed size of descriptors on HSDirs to 40k bytes.
  This should be enough to accommodate the fat descriptor described above.
As said, I was quite generous with the max size here, even though I
doubt any actual hidden services will have such enormous descriptors.
I guess allowing those might prove to be a good idea in the future.
I don't think 40k is that much in terms of size, especially when
compared to things like the microdesc-consensus which is like 1.4MB, and
is required for Tor to run.
The main issue with big max sizes here is assholes using our DHT as
cloud storage. I don't think 40k is that bad in this regard, but I'm not
sure how to evaluate this properly.
+1 on the 40k hard limit.
...
...
ii) Should we use the descriptor ASCII format to encode all the
       client-auth-desc-key data? Or is that weird binary format OK?
People said this is a good idea and I agree.
Here is a suggested informal format, that gets placed in the beginning
of the encrypted section of the descriptor:
desc-auth-type <auth-type>
desc-auth-nonce <8-byte-nonce-base64>
auth-client <client-id> <iv> <encrypted-cookie>
and we always include a multiple of 16 clients.
Here is what it would look like in real life:
=======================================================================
desc-auth-type cookie
desc-auth-nonce JMk/8BTbhB4
auth-client dkW2nw OTqqSv29icTL5TSZ5TVQ3A +PIt0D9oWlDfbpGtRxGmeA
auth-client z0/MMQ dw+pwJcLk9LB/FPfxFBL3g rFX9f6WUVZVUPEwFet428Q
auth-client tH/BEQ zFWL1T9H/1fyV6bYW5Ol/Q /hjW1SgF0S3BANJhZZZ/OQ
auth-client 2lxnoQ ggm/IraIMQ+L56V3R0OyHQ gI9Lh5azwxcunYwyFXxJSg
auth-client S88yFw S4072NBKCwbwGep7/bJv+Q j3GdtDLAiZWI2jv0z6wfNw
auth-client T7KbqA zhj5vu+HghqcMBRYpsGE0Q nQQtScbK91xx1G5l5gUWYg
auth-client xPROzQ /OAH9FwXOufKGmFlBkqEJQ sqzeo6n4uMnqyghv3Vj3ZA
auth-client l7lqEQ iZrRNH1Lg636j32tg7XfLQ HXeqg6nViGb7H4T1dYMK9Q
auth-client +9ZUZw FReeAD5/mQD03J+YiffTKw oK1q7l/4JX+P08dLKYOmlw
auth-client 0L9rXg xp9hvTWcWSmLBcyLN96Msg THWHP2nLlHBWWrwECOIg+A
auth-client +kJcyQ nl7dkTOA9r10jk3Bo6I5WQ sGqMNtLMOiLDVDOr9YxJAw
auth-client sa5PQQ oGqjP0Ko72fopFw2aAm2QA f+enrvjiDSXGJ3t77vDfAQ
auth-client m87zTQ Pl5ITgw/6nb5zJPXjl9GPA X0lIhGNjXZqhGf+oHDX/wQ
auth-client t8Ki0g GOPiP3WM+FQlDXLK1vUEOg 8bBZRrlxj6Ca392exkNuog
auth-client 1D9wbQ 0Y5FZJGg30M2WPWu+xahbQ aXwcRLMS5MFAYcBrGEibVA
auth-client UoLbLw jwM4/d5BUfch4FLpGogouQ r9P/aNX3pWseC7tlXx1I5Q
======================================================================
with a total size of 1090 bytes.
I think this looks much nicer than the binary format and easier to parse
with the routerparse API as well.
+1.
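
For anyone who wants to reproduce the size figure, here is a small
Python sketch that emits a section in the informal format above, filled
with random data. The field widths (8-byte nonce, 4-byte client id,
16-byte IV, 16-byte encrypted cookie) are inferred from the base64
lengths in the example, and unpadded base64 is assumed because the
example shows no trailing "="; the function names are hypothetical.

```python
import base64
import os

def b64(data: bytes) -> str:
    """Base64 without trailing '=' padding, as in the example above."""
    return base64.b64encode(data).decode("ascii").rstrip("=")

def fake_auth_section(n_clients: int = 16) -> str:
    """Build a desc-auth section populated entirely with random values."""
    lines = [
        "desc-auth-type cookie",
        "desc-auth-nonce " + b64(os.urandom(8)),
    ]
    for _ in range(n_clients):
        client_id = b64(os.urandom(4))   # 6 base64 characters
        iv = b64(os.urandom(16))         # 22 base64 characters
        cookie = b64(os.urandom(16))     # 22 base64 characters
        lines.append(f"auth-client {client_id} {iv} {cookie}")
    return "\n".join(lines) + "\n"

print(len(fake_auth_section().encode("ascii")))  # 1090
```

Each auth-client line is 65 bytes including the newline, so 16 of them
plus the two header lines give the 1090 bytes quoted above.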
...
...
iii) Should we use a fresh IV for each ENCRYPTED_DESC_COOKIE? rend-spec.txt
        does not do that, and IIUC that's OK because it uses a fresh key for
        every encryption (even though the plaintext and IV are the same).
People said this is a good idea, so in the example above I did it.
My main counter argument is the size increase, but perhaps being stingy
here is stupid.
In any case, the size overhead comes to 23 bytes of base64 for every IV,
so it's not that bad.
I think this is fine.

s7r