[tor-bugs] #23387 [Core Tor/Tor]: prop224: HSdir index desynch between client and service

Wed Sep 6 11:22:46 UTC 2017

#23387: prop224: HSdir index desynch between client and service
-----------------------------+------------------------------------
 Reporter:  asn              |          Owner:  (none)
     Type:  defect           |         Status:  needs_review
 Priority:  Very High        |      Milestone:  Tor: 0.3.2.x-final
Component:  Core Tor/Tor     |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  prop224, tor-hs  |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+------------------------------------

Comment (by asn):

 Yo David,

 Seems like after our discussion about comment:4 (c) yesterday, we
 discovered a major reachability issue a few days before the freeze. Let's
 see how we can fix this.

 First of all, we should agree on what the problem is. Let's consider
 `bug23387_asn` from #23387 to be the base here, as I think the valid_after
 fixes are essential. So in `bug23387_asn`, the main unsolved problem is (c)
 from comment:4 in #23387, aka scenario 6 from your `bug23387_032_02^` branch
 (minus top commit). That's the scenario where the HS has a newer consensus
 than the client: the HS just moved to the next TP but the client is still
 stuck on the old one, and the service is not in any sort of overlap mode,
 so it doesn't cover the old TP anymore.

 {{{
   +------------------------------------------------------------------+
   |                                                                  |
   | 00:00      12:00       00:00       12:00       00:00       12:00 |
   | SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
   |                                                                  |
   |  $==========|-----------$===========|-----------$===========|    |
   |                                    ^ ^                           |
   |                                    C S                           |
   +------------------------------------------------------------------+
 }}}
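
 To make the scenario concrete, here is a minimal sketch of how the two
 sides can end up on different TPs purely because of their consensus
 valid_after times. It assumes the default prop224 values (a 1440-minute
 time period that rotates at 12:00 UTC); the constant names, the function
 name and the example timestamps are made up for illustration and are not
 taken from the tor codebase.

```python
from datetime import datetime, timezone

# Assumed defaults from the prop224 spec: one-day time periods that
# rotate at 12:00 UTC. These names are illustrative, not tor identifiers.
TIME_PERIOD_LENGTH_MIN = 24 * 60
ROTATION_OFFSET_MIN = 12 * 60


def time_period_num(valid_after: datetime) -> int:
    """Time period number derived from a consensus valid-after time."""
    minutes_since_epoch = int(valid_after.timestamp()) // 60
    return (minutes_since_epoch - ROTATION_OFFSET_MIN) // TIME_PERIOD_LENGTH_MIN


# Hypothetical consensuses: the client's is one hour older than the service's.
client_valid_after = datetime(2017, 9, 6, 11, 0, tzinfo=timezone.utc)
service_valid_after = datetime(2017, 9, 6, 12, 0, tzinfo=timezone.utc)

client_tp = time_period_num(client_valid_after)
service_tp = time_period_num(service_valid_after)

# The service has already rotated to the next TP, the client has not.
print(client_tp, service_tp, service_tp == client_tp + 1)
```

 The point is only that a one-hour consensus skew around 12:00 UTC is
 enough to put C and S on opposite sides of the TP boundary.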

 Am I right that this is the *only* problem right now? I'm a bit confused
 because I see an XXX in scenario 2 of your `bug23387_032_02^`, but I think
 that should be OK, and the problem was created by the top commit of that
 branch? I also see you citing scenario 2 in your `bug23387_032_03` as the
 problem point.

 If I understand things correctly, I have two suggestions here:

 a) Short-term client-side solution: Design the shitty fallback system we
 discussed yesterday, where if the client fails to fetch the descriptor
 with SRV#1/TP#1, it will do a fallback fetch with SRV#2/TP#2.

 b) Medium-term service-side solution: Implement a ''reverse'' overlap
 period, where the service will keep on publishing the old descriptor even
 though it has exited the overlap period, so that it covers for clients
 with old consensuses (so the old descriptor will still be with TP#1/SRV#1
 in the example above).

 You can think of the normal overlap period as a way to cover for clients
 that have a consensus in the future (after the TP rotates), and the
 reverse overlap period as a way to cover for clients that are in the past
 (before the TP rotates).
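
 As a rough illustration of (a) and (b), here is a toy model of which TPs a
 service has descriptors for and which TPs a client tries. It is a sketch
 under heavy simplification: a single integer stands in for the whole
 SRV/TP pair and the resulting HSDir index, and every function and
 parameter name is hypothetical, not something from the tor codebase.

```python
def service_time_periods(service_tp, in_overlap=False, reverse_overlap=False):
    """Set of time periods the (modelled) service has descriptors for."""
    covered = {service_tp}
    if in_overlap:
        # Normal overlap: also cover clients whose consensus is ahead.
        covered.add(service_tp + 1)
    if reverse_overlap:
        # Suggestion (b): keep the previous descriptor alive for
        # clients whose consensus is behind.
        covered.add(service_tp - 1)
    return covered


def client_fetch(client_tp, published, fallback=False):
    """Return the time period the client finds a descriptor under, or None."""
    candidates = [client_tp]
    if fallback:
        # Suggestion (a): on failure, retry with the next time period.
        candidates.append(client_tp + 1)
    for tp in candidates:
        if tp in published:
            return tp
    return None


# Scenario from the diagram: client still on TP 1, service already on TP 2
# and not in any overlap mode.
unfixed = client_fetch(1, service_time_periods(2))
with_fallback = client_fetch(1, service_time_periods(2), fallback=True)
with_reverse = client_fetch(1, service_time_periods(2, reverse_overlap=True))
print(unfixed, with_fallback, with_reverse)
```

 In this model the unfixed case finds nothing, the fallback finds the
 descriptor under the service's current TP, and the reverse overlap lets
 the client succeed on its first try without any client-side change.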

 I think your `bug23387_032_03` was not implementing the reverse overlap
 period concept, and was just creating the 'next' descriptor earlier in
 time. Or am I wrong?

 We don't need to do everything right now, but we should make sure that any
 future changes won't cause desynch between different versions of
 client/service.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23387#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online