[tor-bugs] #25347 [Core Tor/Tor]: Tor stops building circuits, and doesn't start when it has enough directory information

Tue Mar 27 11:18:37 UTC 2018

#25347: Tor stops building circuits, and doesn't start when it has enough directory
information
-------------------------------------------------+-------------------------
 Reporter:  teor                                 |          Owner:  asn
     Type:  defect                               |         Status:
                                                 |  assigned
 Priority:  Medium                               |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:  Tor:
                                                 |  0.3.0.6
 Severity:  Normal                               |     Resolution:
 Keywords:  031-backport, 032-backport,          |  Actual Points:
  033-must, tor-guard, tor-client, tbb-          |
  usability-website, tbb-needs,                  |
  033-triage-20180320, 033-included-20180320     |
Parent ID:  #21969                               |         Points:  1
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by asn):

 Thanks for the info s7r. I took another look at your logs (after my
 comment:5) and at your latest comment.

 Looking at your logs, it seems like your guard rejected about 230 new
 circuit creations in 15 minutes with the excuse of `RESOURCELIMIT`. And
 your client just kept making more and more circuits to the same guard that
 were getting rejected... I've also noticed this exact same behavior on a
 client of mine recently.

 My theory on why `RESOURCELIMIT` was used by your guard (given that you
 say that the DoS patch was disabled) is that
 `assign_onionskin_to_cpuworker()` failed because `onion_pending_add()`
 failed because `have_room_for_onionskin()` failed. That means that the
 relay was overworked and had way too many cells to process at that time.
 Unfortunately, I can't tell from your logs whether you are sending NTOR
 or TAP cells.
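
 A rough sketch of that failure chain, in C. This is not Tor's code: the
 `sketch_*` names, the queue struct and the fixed `max_pending` capacity
 are made up for illustration, and the real room check is, as far as I can
 tell, more involved than a plain counter. Only the value 5 for
 `RESOURCELIMIT` is taken from the DESTROY reason list in tor-spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* DESTROY reason codes; 5 is RESOURCELIMIT in tor-spec. */
#define SKETCH_REASON_NONE 0
#define SKETCH_REASON_RESOURCELIMIT 5

/* Hypothetical stand-in for the relay's queue of pending onionskins. */
typedef struct {
  size_t n_pending;   /* onionskins waiting for a cpuworker */
  size_t max_pending; /* made-up fixed capacity (assumption) */
} sketch_onion_queue_t;

/* Models the "do we have room?" check: false once the queue is full. */
bool
sketch_have_room(const sketch_onion_queue_t *q)
{
  assert(q != NULL);
  return q->n_pending < q->max_pending;
}

/* Models queueing an onionskin: fails with -1 when there is no room. */
int
sketch_pending_add(sketch_onion_queue_t *q)
{
  if (!sketch_have_room(q))
    return -1;
  q->n_pending++;
  return 0;
}

/* Models handling a new circuit request. Returns the DESTROY reason to
 * send back, or SKETCH_REASON_NONE if the request was accepted. */
int
sketch_handle_create(sketch_onion_queue_t *q, bool cpuworker_idle)
{
  if (cpuworker_idle)
    return SKETCH_REASON_NONE; /* handed straight to a worker */
  if (sketch_pending_add(q) < 0)
    return SKETCH_REASON_RESOURCELIMIT; /* queue full: reject */
  return SKETCH_REASON_NONE; /* queued for later */
}
```

 With a queue of capacity 2 and no idle worker, the first two requests are
 queued and the third is rejected with `RESOURCELIMIT`, which is the
 pattern the client sees when it keeps sending requests to a busy guard.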

 Like you said, I think the most obvious misbehavior here is that you keep
 on hassling your guard even tho it's telling you to relax by sending you
 `RESOURCELIMIT` `DESTROY` cells. Perhaps one approach here would be to
 choose a different guard after a guard has sent us `RESOURCELIMIT` cells,
 in an attempt to unclog the guard and to get better service. '''Let's
 think about this some more:'''

 What's the best behavior here? Should we mark the guard as down after
 receiving a single `RESOURCELIMIT` cell, or should we hassle the guard a
 bit before giving up?
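
 To make the two options concrete, here is a minimal sketch of a
 threshold policy, in C. Everything in it is hypothetical (the `sketch_*`
 names, the structs, and the idea of counting rejections inside a time
 window); it is not how Tor's guard code works today. Setting
 `max_rejections` to 1 gives the "single cell" behavior, a larger value
 gives the "hassle the guard a bit" behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* DESTROY reason code for RESOURCELIMIT, as listed in tor-spec. */
#define SKETCH_REASON_RESOURCELIMIT 5

/* Hypothetical tuning knobs for the countermeasure. */
typedef struct {
  int max_rejections; /* rejections tolerated before giving up */
  time_t window_secs; /* how long a run of rejections stays relevant */
} sketch_policy_t;

/* Hypothetical per-guard bookkeeping. */
typedef struct {
  int n_rejections;    /* rejections seen in the current window */
  time_t window_start; /* time of the first rejection in the window */
  bool marked_down;    /* set once the threshold is reached */
} sketch_guard_state_t;

/* Record a DESTROY received from the guard. Only RESOURCELIMIT counts;
 * a rejection that arrives after the window has expired starts a new
 * window instead of adding to the old count. */
void
sketch_note_destroy(sketch_guard_state_t *g, const sketch_policy_t *p,
                    int reason, time_t now)
{
  assert(g != NULL && p != NULL);
  if (reason != SKETCH_REASON_RESOURCELIMIT)
    return;
  if (g->n_rejections == 0 || now - g->window_start > p->window_secs) {
    g->window_start = now;
    g->n_rejections = 0;
  }
  g->n_rejections++;
  if (g->n_rejections >= p->max_rejections)
    g->marked_down = true;
}
```

 The window is there so that an occasional rejection from a briefly busy
 guard does not add up over hours; whether that is the right tradeoff is
 exactly the open question above.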

 Most importantly, can we make sure that the `DESTROY` cell came from the
 guard and not from some other node in the path? If we can make sure that
 the `DESTROY` cell came from the guard, this seems to me like a pretty safe
 countermeasure since we should trust the guard to tell us whether it's
 overworked or not.

 WRT the timeline here, I think this countermeasure (mark the guard as
 down when it is overworked, to get better service) is a plausible goal
 for 033, but anything more involved will probably need to wait for 034.

 Would appreciate feedback from Nick or Tim here :)

 ----

 I still can't explain why you managed to bootstrap after hacking your
 state file tho. Perhaps a coincidence? Perhaps you were overworking your
 guard and when you stopped, it relaxed? Perhaps the hack worked
 differently than you imagine? Not sure.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25347#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online