[tor-bugs] #25347 [Core Tor/Tor]: Tor keeps on trying the same overloaded guard over and over

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Apr 5 01:39:54 UTC 2018


#25347: Tor keeps on trying the same overloaded guard over and over
-------------------------------------------------+-------------------------
 Reporter:  teor                                 |          Owner:  asn
     Type:  defect                               |         Status:
                                                 |  needs_revision
 Priority:  Medium                               |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:  Tor:
                                                 |  0.3.0.6
 Severity:  Normal                               |     Resolution:
 Keywords:  031-backport, 032-backport,          |  Actual Points:
  033-must, tor-guard, tor-client, tbb-          |
  usability-website, tbb-needs,                  |
  033-triage-20180320, 033-included-20180320     |
Parent ID:  #21969                               |         Points:  1
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by arma):

 More thoughts while pondering this discussion:

 (1) It is really surprising to me that s7r could have been experiencing
 this bug (as currently described) for 7 hours. If it really lasted that
 long, I think it was probably some other bug -- for example, one of the
 bugs where we mark relays down and stop trying them, or something like
 #21969. asn says he only looked at a tiny fraction of the logs from those
 7 hours, so let's be careful about jumping to conclusions about what bug
 (or combination of bugs) he actually experienced.

 (2) If a relay sends you a destroy cell with reason resourcelimit, it
 means that the relay has so many create cells on its queue that it thinks
 it won't get to this one within the next 1.75 seconds (the
 MaxOnionQueueDelay default). So that's some real overload right there --
 especially since, even if you send another one and don't get a destroy
 back, it just means you squeaked into the queue; you still have all those
 other creates ahead of you.
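
 As a rough illustration of that relay-side decision (this is not Tor's
 actual code -- the function and variable names are made up, and the
 hard-coded 1750 ms just stands in for the MaxOnionQueueDelay default),
 the check amounts to something like:

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_ONION_QUEUE_DELAY_MS 1750  /* MaxOnionQueueDelay default */

    /* Rough delay estimate: onionskins already queued, times the average
     * processing time per onionskin. */
    static uint64_t
    estimated_queue_delay_ms(uint64_t n_queued, uint64_t avg_ms_per_onionskin)
    {
      return n_queued * avg_ms_per_onionskin;
    }

    /* If the queue already looks like it will take longer than the limit,
     * the relay refuses the new create cell and answers with a DESTROY
     * whose reason is resourcelimit instead of queueing it. */
    static bool
    should_reject_create(uint64_t n_queued, uint64_t avg_ms_per_onionskin)
    {
      return estimated_queue_delay_ms(n_queued, avg_ms_per_onionskin)
             > MAX_ONION_QUEUE_DELAY_MS;
    }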

 (2b) Do we have any reason to believe that the calculation in
 have_room_for_onionskin() is at all accurate? That is, are we sometimes
 sending this response when there are only 0.25 seconds worth of create
 cells in our queue? Or are we sometimes not sending them even though there
 are 5 seconds of cells queued?
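
 One way to get a hedged answer would be instrumentation along these
 lines: record the delay estimate that went into the overload check when
 an onionskin was queued, and compare it against the wait actually
 observed when the onionskin gets processed. The struct and function
 names below are hypothetical, not anything that exists in Tor today:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical record attached to each queued onionskin. */
    typedef struct queued_onionskin_t {
      uint64_t queued_at_ms;       /* when it went onto the queue */
      uint64_t predicted_wait_ms;  /* what the overload check estimated */
    } queued_onionskin_t;

    static uint64_t
    monotime_ms(void)
    {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (uint64_t)ts.tv_sec * 1000 + (uint64_t)(ts.tv_nsec / 1000000);
    }

    /* Call when the onionskin finally reaches the front of the queue; a
     * consistently large gap between predicted and actual would answer
     * the accuracy question above. */
    static void
    log_queue_delay_accuracy(const queued_onionskin_t *q)
    {
      uint64_t actual_ms = monotime_ms() - q->queued_at_ms;
      printf("onionskin wait: predicted %llu ms, actual %llu ms\n",
             (unsigned long long)q->predicted_wait_ms,
             (unsigned long long)actual_ms);
    }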

 (3) It would be nice to find a way for the dir auths to scale back the
 consensus weights of relays that are overloaded like this. That is, it
 would sure be swell if we could make this something that the dir auths
 solve for all the users, not something that each user has to encounter and
 then adapt to. But while I see why we want that, we should be realistic
 and realize that we won't get it: the dir auths act on a time schedule of
 hours, so they will catch perennially overloaded relays (say, relays that
 genuinely have a wildly wrong weighting or are simply broken), but they
 won't be able to catch transient hotspots (including hotspots induced by
 bad people).

 (4) I think we really need to figure out how often this happens in
 practice. That means scanning relays over time. Now, it happens that
 pastly's sbws might be able to collect this data for us. Also, the
 torperfs and onionperfs of the world could have this data already too, if
 they collect it. Do they? Noticing it in sbws has the slight added
 advantage that if we can figure out how to use it in computing weights,
 it's all ready to be used.
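
 If a scanner like sbws did start collecting this, the per-relay
 bookkeeping could be as simple as the sketch below (the struct and field
 names are invented for illustration): count how many circuits were
 attempted through each relay and how many of those attempts were answered
 with a resourcelimit destroy, and the resulting fraction is the kind of
 signal a weight computation could consume.

    #include <stdint.h>

    /* Hypothetical per-relay tally a scanner could keep across scan
     * rounds. */
    typedef struct relay_overload_stats_t {
      char fingerprint[41];            /* hex relay identity, NUL-terminated */
      uint64_t create_attempts;        /* circuits tried through this relay */
      uint64_t resourcelimit_destroys; /* attempts answered with resourcelimit */
    } relay_overload_stats_t;

    /* Fraction of attempts that hit the overload path; a weighting scheme
     * could scale the relay's measured bandwidth down by something like
     * this. */
    static double
    overload_fraction(const relay_overload_stats_t *s)
    {
      if (s->create_attempts == 0)
        return 0.0;
      return (double)s->resourcelimit_destroys / (double)s->create_attempts;
    }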

 (5) I would be a fan of a feature where we track the destroy-resource-
 limit responses we receive over time, and if there have been (say) 30
 different seconds recently where we got at least one destroy-resource-
 limit, and none of our attempts worked, we call the guard down. We
 shouldn't call it down in response to just one hotspot though (e.g. "I
 sent twenty create cells and I got twenty destroy-resource-limit
 responses"), since those responses are correlated with each other: if you
 got one, it's not surprising that you'll get a second right after. And we
 might want to retry a guard that we mark down this way sooner than the
 30-minute-later default from prop271.
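
 A minimal sketch of that heuristic, with illustrative names and constants
 (none of this is existing Tor code): count the distinct seconds in which
 at least one destroy-resourcelimit arrived, reset on any success, and
 only call the guard down once enough distinct seconds have piled up. A
 real version would presumably also expire old entries so that "recently"
 actually means recently, and would pair this with the shorter retry
 timeout mentioned above.

    #include <stdbool.h>
    #include <time.h>

    #define RESOURCELIMIT_SECONDS_THRESHOLD 30  /* the "(say) 30" above */

    typedef struct guard_overload_state_t {
      time_t last_counted_second;  /* last second we already counted */
      unsigned distinct_seconds;   /* seconds with >=1 destroy-resourcelimit */
    } guard_overload_state_t;

    /* Call on every destroy-resourcelimit received from this guard.
     * Counting distinct seconds rather than individual cells keeps one
     * correlated burst of rejections from being counted twenty times
     * over. */
    static void
    note_resourcelimit(guard_overload_state_t *g, time_t now)
    {
      if (now != g->last_counted_second) {
        g->last_counted_second = now;
        g->distinct_seconds++;
      }
    }

    /* Call on any circuit that succeeds through this guard: a success
     * means the guard is still usable, so start the count over. */
    static void
    note_success(guard_overload_state_t *g)
    {
      g->distinct_seconds = 0;
    }

    static bool
    should_mark_guard_down(const guard_overload_state_t *g)
    {
      return g->distinct_seconds >= RESOURCELIMIT_SECONDS_THRESHOLD;
    }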

 (6) I agree with asn that making it too easy to force a client to rotate
 guards is scary. The pool of 60-or-so guards from prop271 is huge, and the
 only way to use that design securely imo is to make sure that some of
 those 60 guards are very hard to push clients away from.

 (7) I agree with Mike that the confirmation attack ("send a bunch of
 create cells to each guard one at a time and see when your target onion
 service stops responding") is worrisome. But would a bandwidth congestion
 attack work there too? I guess it would be more expensive to pull off with
 the same level of reliability.

 (7b) In a two guard design, I wonder if we should be even more reluctant
 to abandon a guard due to a transient problem like this. After all, if we
 do abandon one, we're increasing our surface area past two. And if we
 don't, in theory we still have one that's working.

 (8) Remember CREATE_FAST? If your guard is otherwise fine but it's too
 busy to process your create cell... and you were about to do something
 foolish to your anonymity like move to another guard or go offline in
 response... hm. :)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25347#comment:34>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

