[tor-dev] Revisiting prop224 time periods and HS descriptor upload/downloads
desnacked at riseup.net
Mon Apr 4 16:13:39 UTC 2016
During March we discussed the cell formats of prop224:
The prop224 topic for this month has to do with the way descriptors get
uploaded and downloaded, how this is scheduled using time periods and how the
shared randomness subsystem interacts with all that.
Here are some discussion topics. Lots of text on the first two, less text on the rest:
- My main goal was to understand the prop224 sections [TIME-PERIODS] and [TIME-OVERLAP].
Those sections specify a system where hidden services decide in a
probabilistic manner _when_ to publish their descriptor so that not all
hidden services publish their descriptors at the same moment and cause a
thundering herd that stampedes the network.
For this to work, time is split into time periods of k hours each. A few
hours before each time period, there is an overlap period where hidden
services start publishing their _next_ descriptors to HSDirs, so that when
the upcoming time period starts, all the HSDirs have already received the
descriptors and are ready to serve them.
Consider the overlap period at the end of time period #N. During that overlap
period, hidden services publish their descriptors for future time period
#N+1. In this case, hidden services also need to know the shared random
value that will be active during time period #N+1, since it needs to be used
to find the responsible HSDirs. This means that the shared random value for
time period #N+1 needs to be published _before_ the overlap period starts.
This is not the case in the current proposal 224, since time is split into time
periods of 25 hours, which means that each day the start time shifts by one
hour forward. Since the start/end times of the time periods keep on shifting,
there will be cases where the right shared random value will not be
accessible when the overlap period starts.
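
To make the drift concrete, here is a rough sketch of where 25-hour time
periods start relative to the daily SRV schedule. It assumes, purely for
illustration, that time period #0 starts at 00:00; the function name is
made up for this example and is not from the spec.

```python
PERIOD_HOURS = 25  # time period length in the current proposal


def period_start_hour_of_day(period_num, period_hours=PERIOD_HOURS):
    """Hour of day (0-23) at which time period `period_num` starts.

    Assumption for illustration only: period #0 starts at 00:00.
    """
    return (period_num * period_hours) % 24


# With 25-hour periods the start hour creeps forward by one hour per period,
# while the SRV is always published at 00:00.
drifting = [period_start_hour_of_day(n) for n in range(5)]

# With 24-hour periods the start hour never moves.
fixed = [period_start_hour_of_day(n, period_hours=24) for n in range(5)]
```

Since the start hour keeps moving while the SRV publication time stays put,
the gap between "SRV published" and "overlap period starts" is different
every day, which is the root of the problem described above.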
So what to do?
To fix this, I suggest we change the time period length to a day (24 hours).
I also suggest we start time periods every day at 12:00 and end them 24
hours later at the same time, so that they work well with the current shared
randomness schedule (where the new shared random value gets published at
00:00 every day).
[It might actually be wiser to reverse those schedules: create the
SRV at 12:00 and start the time period at 00:00]
In any case, this is what it might look like:
  | 00:00       12:00       00:00       12:00       00:00       12:00 |
  | SRV#1       TP#1        SRV#2       TP#2        SRV#3       TP#3  |
  | $           |-----------$-----======|-----------$-----======|     |
  |                             overlap12               overlap23     |
Legend: [TP#1 = Time Period #1]
[SRV#1 = Shared Random Value #1]
So, this basically gives a gap of 12 hours between the SRV generation and
the start of the next time period. We can then easily fit an overlap period
of 6 hours before the next time period starts. In the above diagram, the
"equal sign" segments are the overlap periods. 'overlap12' is the overlap
period from TP#1 to TP#2.
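
As a sanity check of the schedule above, here is a minimal sketch that
computes when the next time period starts and whether a given moment falls
inside the overlap period. The 12:00 start and the 6-hour overlap come from
the suggestion above; the function names and the choice to treat 06:00
itself as already inside the overlap are assumptions of this example.

```python
from datetime import datetime, timedelta, timezone

TP_START_HOUR = 12  # time periods start at 12:00 UTC (suggested above)
OVERLAP_HOURS = 6   # overlap period length (suggested above)


def next_period_start(now):
    """Return the first 12:00 UTC boundary strictly after `now`."""
    start = now.replace(hour=TP_START_HOUR, minute=0, second=0, microsecond=0)
    if now >= start:
        start += timedelta(days=1)
    return start


def in_overlap(now):
    """True if `now` is within OVERLAP_HOURS of the next time period start."""
    return next_period_start(now) - now <= timedelta(hours=OVERLAP_HOURS)


def at(hour):
    # Helper for the examples: a fixed UTC day at the given hour.
    return datetime(2016, 4, 4, hour, 0, tzinfo=timezone.utc)


examples = {hour: in_overlap(at(hour)) for hour in (3, 9, 14)}
```

With this schedule, 14:00 and 03:00 fall outside the overlap period and
09:00 falls inside it.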
Do you think that's reasonable? And do you see any problems with changing the
time period length from 25 hours to 24 hours?
- So now that we have ironed out the time period stuff slightly, let's discuss
the behavior that hidden services, clients and HSDirs should exhibit.
This email is quite long already so I'm going to go with examples, instead of
formal specification. However, this stuff needs to go formally in the
proposal IMO, so any help in formalizing it would be great.
+ Hidden Service behavior:
Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we
are nowhere close to the overlap period, so the hidden service should just
publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at
that point should be in consensuses as "shared-rand-current-value").
The hidden service might also want to calculate its overlap OFFSET (as
specified in [TIME-OVERLAP]) and schedule a time callback for publishing
its TP#2 descriptors.
Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of
the overlap period again, but this time the hidden service needs to use the
SRV from "shared-rand-previous-value" because the SRV was rotated at midnight.
Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the
overlap period, so the hidden service should calculate its overlap
OFFSET and compare it with the current time.
If it has not passed, then we are in the exact same case as Example 2.
If the overlap OFFSET _has_ passed, then the hidden service needs to act
as Example 2, and _also_ publish its TP#2 descriptors to a second set of
HSDirs using SRV#2.
I think these are all the cases for the hidden service, but I would like to
formalize this in a way that can be written in the spec. Particularly, I'm
not sure how to formalize which SRV to pick at a given time point.
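
One possible way to write the three examples down, as a sketch and not as
spec text: pick the consensus field for the current time period based on
whether the SRV has rotated since the period started, and add a second
upload for the next period once the overlap OFFSET has passed. It assumes
the 12:00 / 00:00 schedule with a 6-hour overlap from above, and it
simplifies the OFFSET to "hours after the start of the overlap period",
which is not how [TIME-OVERLAP] defines it. The function names are made up.

```python
OVERLAP_START_HOUR = 6  # overlap runs 06:00-12:00 under the schedule above
TP_START_HOUR = 12      # time periods start at 12:00


def srv_field_for_current_period(hour_of_day):
    """Consensus field holding the SRV of the *current* time period.

    From 12:00 until midnight the period's SRV is still the current one;
    after the midnight rotation it has moved to the previous-value field.
    """
    if hour_of_day >= TP_START_HOUR:
        return "shared-rand-current-value"
    return "shared-rand-previous-value"


def publish_plan(hour_of_day, overlap_offset_hours):
    """List of (time period, SRV field) uploads the service should do now.

    `overlap_offset_hours` is a simplified stand-in for the service's
    overlap OFFSET (hypothetical representation for this sketch).
    """
    plan = [("current", srv_field_for_current_period(hour_of_day))]
    in_overlap = OVERLAP_START_HOUR <= hour_of_day < TP_START_HOUR
    if in_overlap and hour_of_day >= OVERLAP_START_HOUR + overlap_offset_hours:
        # During the overlap the next period's SRV is the freshly rotated one.
        plan.append(("next", "shared-rand-current-value"))
    return plan


example_1 = publish_plan(14, overlap_offset_hours=2)  # boots at 14:00
example_2 = publish_plan(3, overlap_offset_hours=2)   # boots at 03:00
example_3a = publish_plan(9, overlap_offset_hours=4)  # 09:00, OFFSET not passed
example_3b = publish_plan(9, overlap_offset_hours=2)  # 09:00, OFFSET passed
```

Written this way, the "which SRV" question reduces to one comparison against
the time period start hour, which might be a reasonable starting point for
the spec wording.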
+ Client behavior
My current intuition with regard to client behavior is that they should
always fetch descriptors from the HSDirs of the _current_ time period. They
should not concern themselves with the overlap stuff _at all_. The overlap
system is there so that by the time the new time period starts, all the
HSDirs have received the descriptors and are ready to help the
clients. Clients should never notice the overlap stuff happening.
For this reason I think we can remove this paragraph from the spec:
    When a client is looking for a service, it must calculate its key
    both for the current and for the subsequent period, to decide whether
    the next period's key is valid yet.
What do you think?
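
To illustrate how little the client would need to know under this idea, here
is a small sketch. It assumes 24-hour time periods starting at 12:00 UTC and
numbers them from the Unix epoch with a 12-hour shift; that numbering and the
function names are assumptions of this example, not something the proposal
currently specifies.

```python
DAY_SECONDS = 24 * 60 * 60
TP_START_OFFSET = 12 * 60 * 60  # periods start at 12:00 UTC (suggested above)


def time_period_num(unix_time):
    """Number of the time period containing `unix_time` (hypothetical numbering)."""
    return (unix_time - TP_START_OFFSET) // DAY_SECONDS


def client_fetch_periods(unix_time):
    """Time periods a client derives keys for: only the current one.

    The client never looks at the next period, even during the overlap.
    """
    return [time_period_num(unix_time)]


# 09:00 on the day after period 0 started: inside the overlap period, but the
# client still only cares about period 0.
during_overlap = client_fetch_periods(TP_START_OFFSET + 21 * 60 * 60)
```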
+ HSDir behavior
Currently the spec says the following:
    Hidden service directories should accept descriptors at least [TODO:
    how much?] minutes before they would become valid, and retain them
    for at least [TODO: how much?] minutes after the end of the period.
After discussion with David, we thought of chopping off the first part of
that paragraph and not imposing any such weak restrictions for accepting
descriptors (see #18332).
We still have not decided about the second part of that paragraph, that is
how long descriptors should be retained after the end of the period. We
currently think clock skew is the only thing that can bring clients to the
wrong HSDir after the end of the period. Maybe an hour is OK? David
suggested 12 hours. The current Tor is doing 48 hours... Any ideas?
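
To compare the candidate retention values, here is a tiny sketch of the
retention check an HSDir could do. The function name and the example dates
are made up; the 1, 12 and 48 hour values are the ones mentioned above.

```python
from datetime import datetime, timedelta, timezone


def hsdir_should_retain(now, period_end, retention):
    """True while a descriptor for a finished period should still be kept."""
    return now < period_end + retention


period_end = datetime(2016, 4, 5, 12, 0, tzinfo=timezone.utc)

# A client whose clock is 40 minutes slow asks for the old descriptor at
# 12:40 real time; every candidate retention value still serves it.
skewed_fetch = period_end + timedelta(minutes=40)
served = {
    hours: hsdir_should_retain(skewed_fetch, period_end, timedelta(hours=hours))
    for hours in (1, 12, 48)
}

# Two hours after the period ended, only the longer values still keep it.
late_fetch = period_end + timedelta(hours=2)
kept_late = {
    hours: hsdir_should_retain(late_fetch, period_end, timedelta(hours=hours))
    for hours in (1, 12, 48)
}
```

If clock skew really is the only reason a client shows up late, the choice
comes down to how much skew we want to tolerate.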
And this half-assedly sums up the behaviors of clients/HSes and HSDirs with
regard to descriptor uploads and downloads. What is missing, and do you
agree that parts of this should be in the proposal?
- We should revert the torspec commit: "prop224: avoid replicas with the same blinded key"
It adds a whole lot of complexity to prop224 with no clear security benefit
against realistic adversaries. Furthermore, the time period and descriptor
download/upload logic of Tor gets very complicated with it.
I discussed this with teor and special and found it reasonable.
- The randomized revision-counter logic should also be simplified or even removed:
I haven't looked much into this yet. If someone has thoughts please let me know.
- We should use fresh salt every time we rebuild the descriptor, but not for every replica:
- teor says we should revert the double hashing here, and just use tor's random API: