[tor-bugs] #24767 [Core Tor/Tor]: All relays are constantly connecting to down relays and failing over and over

Fri Mar 23 13:31:13 UTC 2018

#24767: All relays are constantly connecting to down relays and failing over and
over
-------------------------------------------------+-------------------------
 Reporter:  arma                                 |          Owner:  dgoulet
     Type:  enhancement                          |         Status:
                                                 |  needs_review
 Priority:  Very High                            |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  tor-relay, tor-dos, performance,     |  Actual Points:
  review-group-32, 033-must, review-group-34,    |
  033-triage-20180320, 033-included-20180320     |
Parent ID:                                       |         Points:
 Reviewer:  asn                                  |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by dgoulet):

 >  Is it okay with you if I change the name anyway? The problem is that
 the "since" implies a continuous condition. If I say "I have been unable
 to eat broccoli since 1982", it does not mean only that in 1982 I failed
 to eat broccoli: it also means that at every time since 1982, I have also
 been unable to eat broccoli.

 Renamed `unable_connect_since_ts` in commit `0410a5e49c`.

 >  But the preferred address can change while Tor is running, I think,
 based on configuration options. That would make the value cached in the
 node become incorrect, and that's why I think we should maybe just not
 have this in the node_t at all.

 If a `node_t` preferred address changes, indeed we won't match the
 connection to a `node_t` anymore and thus we will fallback to using the
 cache for unknown nodes. And that object in the cache will be used until
 the `node_t` gets modified by I guess the new descriptor at some point. So
 all in all, we'll still be tracking connection failure correctly.

 The only reason why I'm still a bit hesitating on using the "unknown
 cache" for everything is maybe because of performance. If we would use
 that, at *every* connection, we would need to do a lookup where the hash
 function is a bit heavier with this addr + port + digest triplet than
 simply looking up the node list. Furthermore, every 60 seconds
 (`OR_CONNECT_FAILURE_CLEANUP_INTERVAL`), we cleanup that cache meaning we
 can iterate over thousands of objects in there (potentially more than the
 number of relays). I don't think it would be a significant cost but still
 important to keep in mind. Keeping it tiny can worth it.

 However, just using the cache (and not the node_t), would make things a
 bit more simpler in the code, we would just need to accept the additional
 memory and lookup/iteration price.

 My testing on a real relays shows that the majority of the times, we'll
 get our information out of the `node_t`.

 At this point, I can lean towards both sides, I might just need a second
 opinion?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24767#comment:38>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online