[tor-bugs] #27841 [Core Tor/Tor]: Surprise race: Intro point closes circuit after NACK, at the same time as client tries to extend circuit to new intro point

Wed Aug 28 15:45:20 UTC 2019

#27841: Surprise race: Intro point closes circuit after NACK, at the same time as
client tries to extend circuit to new intro point
-------------------------------------------------+-------------------------
 Reporter:  asn                                  |          Owner:  neel
     Type:  defect                               |         Status:
                                                 |  reopened
 Priority:  Medium                               |      Milestone:  Tor:
                                                 |  0.3.5.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  tor-hs dos 035-backport              |  Actual Points:
  040-backport 041-backport                      |
Parent ID:                                       |         Points:
 Reviewer:  dgoulet                              |        Sponsor:
                                                 |  Sponsor27
-------------------------------------------------+-------------------------
Changes (by dgoulet):

 * status:  closed => reopened
 * keywords:  tor-hs dos => tor-hs dos 035-backport 040-backport
               041-backport
 * resolution:  fixed =>

Comment:

 I would like us to strongly reconsider backporting this down to 035. The
 reason is that it is badly affecting tor clients and thus HS reachability.
 Here is how/why:

 (The following considers that every time the client reaches the intro
 point, it gets NACKed because it has the old descriptor.)

 1. The obvious issue is that a tor client gets its intro circuit destroyed
 while it is trying to re-extend to a new IP. This by itself requires the
 client to do many round trips before noticing, and then to re-open a new
 circuit only to see the same thing again, until all 3 IPs have failed.

 2. This one is a more serious issue.

  During my testing, I've seen a client looping over all IPs trying to
 establish an intro point but instead getting its circuit `TRUNCATED` for
 an internal reason just _after_ sending the `INTRODUCE1` cell. This is not
 seen as an "intro failure" by the client, so the intro point will be
 retried.

  However, how can we get a `TRUNCATED` _before_ the `INTRODUCE_ACK` NACKing
 our request? I've seen this behavior a lot, where a client can make 20-30
 tries before it finally gets a NACK.

  The reason, I believe, is in our cell scheduler: when selecting an active
 channel, we ask the cmux subsystem to give us the first active circuit
 queue, which is done in `circuitmux_get_first_active_circuit()`. But we
 alternate between the `DESTROY` cell queue and the RELAY cell queue.

  When the intro point sends a NACK, it first queues the `INTRODUCE_ACK`
 cell and then, because this bug is still everywhere in the network, it
 queues a `DESTROY` just after. Then our scheduler, at that point in time,
 decides to send the `DESTROY` _before_ the ack, resulting in our client
 receiving a `TRUNCATED` cell, not noticing the NACK, and thus retrying the
 same intro point afterwards.

  If no `DESTROY` cell has been sent on the channel's cmux yet, then the
 `DESTROY` is prioritized over the relay cell. So it is not actually a 1/2
 chance of hitting this; I believe the probability of hitting the issue I
 just described is much higher, especially on smaller relays.
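
 To make the ordering problem above concrete, here is a minimal toy model
 of the alternation described in point 2. It is a sketch in Python, not the
 actual cmux code: the class `ToyCmux`, its methods, and the flag
 `last_cell_was_destroy` are names chosen for illustration, and the initial
 flag value is an assumption taken from the description above (no `DESTROY`
 sent yet means the `DESTROY` queue goes first).

```python
from collections import deque


class ToyCmux:
    """Toy model of one channel: a DESTROY queue plus a relay cell queue."""

    def __init__(self):
        self.destroy_queue = deque()
        self.relay_queue = deque()
        # Assumption: nothing has been sent yet, so the flag starts False
        # and the first DESTROY queued on this channel wins.
        self.last_cell_was_destroy = False

    def queue_relay(self, cell):
        self.relay_queue.append(cell)

    def queue_destroy(self, cell):
        self.destroy_queue.append(cell)

    def pop_next(self):
        """Pick the next cell to flush, alternating between the two queues."""
        destroy_turn = (not self.last_cell_was_destroy) or not self.relay_queue
        if self.destroy_queue and destroy_turn:
            self.last_cell_was_destroy = True
            return self.destroy_queue.popleft()
        if self.relay_queue:
            self.last_cell_was_destroy = False
            return self.relay_queue.popleft()
        return None


def nack_then_destroy(cmux):
    """Queue the ack first and the DESTROY second, then flush everything."""
    cmux.queue_relay("INTRODUCE_ACK(nack)")
    cmux.queue_destroy("DESTROY")
    order = []
    cell = cmux.pop_next()
    while cell is not None:
        order.append(cell)
        cell = cmux.pop_next()
    return order


# Channel that has never sent a DESTROY: the DESTROY jumps ahead of the ack.
fresh_order = nack_then_destroy(ToyCmux())

# Channel whose previous cell was a DESTROY: the ack goes out first.
busy = ToyCmux()
busy.last_cell_was_destroy = True
busy_order = nack_then_destroy(busy)

print(fresh_order)
print(busy_order)
```

 In this model the ack is only flushed first when the previous cell on the
 channel happened to be a `DESTROY`, which fits the point that a relay with
 little `DESTROY` traffic would hit the bad ordering more often than half
 the time. The model does not try to capture what happens to the ack once
 the `DESTROY` has gone out.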

 Considering the above, I'm strongly asking for a backport to 035, 040 and
 041.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/27841#comment:26>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online