[tor-bugs] #24767 [Core Tor/Tor]: All relays are constantly connecting to down relays and failing over and over

Tue Feb 6 11:22:01 UTC 2018

#24767: All relays are constantly connecting to down relays and failing over and
over
-------------------------------------------------+-------------------------
 Reporter:  arma                                 |          Owner:  dgoulet
     Type:  enhancement                          |         Status:
                                                 |  accepted
 Priority:  Very High                            |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  must-fix-before-033-stable, tor-     |  Actual Points:
  relay, tor-dos, performance                    |
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by arma):

 My thoughts for the first design would be:

 * No need for a complex backoff scheme. Just remember the failure for 60
 seconds, and during those 60 seconds, answer any request to extend to
 that relay with an immediate DESTROY cell. Going from n attempts per
 minute down to 1 per minute should be enough to help us survive until
 clients get a consensus update and stop asking us to extend there.

 * We should avoid having this cached-failure mechanism affect client
 behavior. That is, it should cause *other people's* circuits to get
 DESTROY cells, but it shouldn't auto-fail our own client attempts. Maybe
 we should change how the client behaves, but if so, let's do it later
 rather than introduce subtle breakage in something we'll be considering
 for backport.

 * Hm! I was going to say "but rep_hist_note_connect_failed() won't work if
 the relay isn't in our consensus", but actually, it is simply based on the
 intended identity digest of the next destination, so it does look like we
 can reuse the data struct. Except, shouldn't we also be caching the
 IP:port that failed? Otherwise somebody can ask us to extend to a victim's
 digest at an unworkable IP:port, we'll cache "failure!", and then we'll
 refuse all the legitimate extend requests going to the right address for
 the next minute.
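
 The three points above can be sketched as a small C failure cache. Each
 entry is keyed on identity digest *plus* IP:port and is forgotten after 60
 seconds. Lookups never refuse our own client circuits. This is only an
 illustration under stated assumptions: IPv4-only addresses, a small
 fixed-size table, and hypothetical names (`failure_cache_t`,
 `failure_cache_note()`, `should_refuse_extend()`, `ID_DIGEST_LEN`) that
 are not Tor's actual API. It also leaves open whether the real change
 reuses the rep_hist data struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical parameters for this sketch (not Tor's real names). */
#define ID_DIGEST_LEN 20         /* SHA-1 identity digest length in bytes */
#define FAILURE_CACHE_SIZE 64    /* arbitrary fixed capacity */
#define FAILURE_TTL_SECONDS 60   /* how long a failure is remembered */

/* One remembered connection failure. The key is digest AND IPv4:port, so a
 * bogus extend to a victim's digest at a wrong address only poisons that
 * exact (digest, address, port) triple. */
typedef struct {
  uint8_t digest[ID_DIGEST_LEN];
  uint32_t ipv4_addr;
  uint16_t port;
  time_t failed_at;
  bool in_use;
} conn_failure_t;

typedef struct {
  conn_failure_t entries[FAILURE_CACHE_SIZE];
} failure_cache_t;

void failure_cache_init(failure_cache_t *cache) {
  memset(cache, 0, sizeof(*cache));
}

static bool key_matches(const conn_failure_t *e, const uint8_t *digest,
                        uint32_t addr, uint16_t port) {
  return e->in_use && e->ipv4_addr == addr && e->port == port &&
         memcmp(e->digest, digest, ID_DIGEST_LEN) == 0;
}

static bool is_expired(const conn_failure_t *e, time_t now) {
  return now - e->failed_at >= FAILURE_TTL_SECONDS;
}

/* Remember that connecting to (digest, addr, port) failed at 'now'.
 * Reuses a matching entry, else a free or expired slot, else evicts the
 * oldest entry. */
void failure_cache_note(failure_cache_t *cache, const uint8_t *digest,
                        uint32_t addr, uint16_t port, time_t now) {
  assert(cache != NULL && digest != NULL);
  conn_failure_t *match = NULL, *free_slot = NULL, *oldest = NULL;
  for (size_t i = 0; i < FAILURE_CACHE_SIZE; i++) {
    conn_failure_t *e = &cache->entries[i];
    if (key_matches(e, digest, addr, port)) {
      match = e;
      break;
    }
    if (!free_slot && (!e->in_use || is_expired(e, now)))
      free_slot = e;
    if (!oldest || e->failed_at < oldest->failed_at)
      oldest = e;
  }
  conn_failure_t *slot = match ? match : (free_slot ? free_slot : oldest);
  memcpy(slot->digest, digest, ID_DIGEST_LEN);
  slot->ipv4_addr = addr;
  slot->port = port;
  slot->failed_at = now;
  slot->in_use = true;
}

/* Return true if an extend request to (digest, addr, port) should get an
 * immediate DESTROY. Our own client circuits are never refused, so the
 * cache only affects other people's circuits. */
bool should_refuse_extend(const failure_cache_t *cache, const uint8_t *digest,
                          uint32_t addr, uint16_t port,
                          bool is_own_client_circuit, time_t now) {
  assert(cache != NULL && digest != NULL);
  if (is_own_client_circuit)
    return false;
  for (size_t i = 0; i < FAILURE_CACHE_SIZE; i++) {
    const conn_failure_t *e = &cache->entries[i];
    if (key_matches(e, digest, addr, port) && !is_expired(e, now))
      return true;
  }
  return false;
}
```

 A linear scan over a small fixed table keeps the sketch short. A real
 relay would more likely use a hash table keyed on the same triple.
 Because the key includes IP:port, an extend to the victim's digest at its
 correct address still goes through even after someone has poisoned a
 bogus address.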

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24767#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online