[tor-bugs] #21394 [Core Tor/Tor]: connection timeouts are affecting Tor Browser usability

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Oct 26 13:27:54 UTC 2017


#21394: connection timeouts are affecting Tor Browser usability
-------------------------------------------------+-------------------------
 Reporter:  arthuredelstein                      |          Owner:  (none)
     Type:  defect                               |         Status:  new
 Priority:  Very High                            |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  tbb-performance, tbb-usability,      |  Actual Points:
  performance, tbb-needs                         |
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------
Changes (by teor):

 * milestone:  Tor: 0.3.2.x-final => Tor: 0.3.3.x-final


Comment:

 Replying to [comment:23 arthuredelstein]:
 > Replying to [comment:22 teor]:
 > > Replying to [comment:20 arthuredelstein]:
 > > > I did some more experiments:
 > > >
 > > > ...
 > > > Indeed I got 9/50 timeouts for the domain with http or https, but no
 timeouts for IPv4 and only a single timeout for IPv6.
 > > >
 > > > Does this ring any bells for Tor core experts? What might be
 happening with DNS here?
 > >
 > > Some exits may be overloading their resolvers. Or our code may be
 buggy. It would be helpful to identify the particular exits that are
 experiencing these timeouts, and work out if they are in the same AS or
 using the same resolvers.
 >
 > Makes sense. If the DNS resolve fails at an exit, does the exit send an
 error message back to the client? Or does it silently fail, meaning the
 client has to wait for the full 10-second timeout?

 It depends on how it fails.
 If the resolve times out at the exit, it also times out at the client.
 If the resolve fails fast, an error cell is sent to the client.
 I don't think we can make this faster.
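
 To illustrate the two cases (this is not Tor's code, which is written in
 C), here is a minimal Python sketch that classifies a lookup as resolved,
 failed fast, or timed out. The helper name `classify_resolve` and the
 injected `resolver` callable are hypothetical; the 10-second default is
 the client timeout mentioned in the question above.

```python
import concurrent.futures
import socket


def classify_resolve(hostname, resolver=socket.gethostbyname, timeout=10.0):
    """Return ("ok", address), ("fast_failure", error) or ("timeout", None).

    Hypothetical helper: `resolver` is any callable that takes a hostname
    and either returns an address or raises OSError.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, hostname)
    try:
        return ("ok", future.result(timeout=timeout))
    except concurrent.futures.TimeoutError:
        # No answer at all: the caller waits for the whole timeout.
        return ("timeout", None)
    except OSError as err:
        # The resolver answered with an error, so the failure can be
        # reported immediately instead of waiting out the timeout.
        return ("fast_failure", str(err))
    finally:
        # Do not block on a lookup that is still hanging.
        pool.shutdown(wait=False)
```

 Only the "fast_failure" case gives the caller something to act on early;
 in the "timeout" case the full wait has already been spent.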

 > > I also wonder if we should ask bandwidth authorities to use DNS
 whenever possible, so they see DNS timeouts, and downgrade exits that have
 them. See #24010.
 >
 > Nice idea. Would it also be feasible to have exits periodically run
 diagnostics to see if their DNS resolution is working properly

 Yes. Exits already check DNS at startup, and turn off exit traffic if it
 fails. I opened #24014 in 0.3.3 to make them check periodically.

 > and if not, report the problem to bandwidth authorities

 There's no way for relays to report anything directly to the bandwidth
 authorities.
 Instead, relays modify their descriptors in response to self-checks.
 In this case, the relay would disable exit traffic until a DNS check
 succeeds, and clients would find out about it when they next download its
 (micro)descriptor after the next consensus.

 > and notify their relay operator?

 Yes, this would be part of #24014: we will log a warning when we disable
 exit traffic.
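
 As a rough model of the behaviour described above (a sketch only, not the
 #24014 implementation), the following Python shows a self-check that
 disables exiting and logs a warning on failure, then re-enables exiting
 once a check succeeds. The class name `ExitDnsMonitor`, the `check_dns`
 callable and the `exiting_enabled` flag are all hypothetical stand-ins; a
 real relay would publish a new descriptor instead of flipping a flag.

```python
import logging

log = logging.getLogger("exit_dns_check")


class ExitDnsMonitor:
    """Toy model of a periodic DNS self-check on an exit relay.

    Hypothetical: `check_dns` is any callable returning True when a test
    lookup succeeds. The `exiting_enabled` flag stands in for the exit
    policy a relay would advertise in its descriptor.
    """

    def __init__(self, check_dns):
        self.check_dns = check_dns
        self.exiting_enabled = True

    def run_check(self):
        """Run one self-check and update the advertised exit state."""
        working = bool(self.check_dns())
        if not working and self.exiting_enabled:
            # Warn the operator only on the transition to "disabled".
            log.warning("DNS self-check failed; disabling exit traffic")
            self.exiting_enabled = False
        elif working and not self.exiting_enabled:
            log.info("DNS self-check succeeded; re-enabling exit traffic")
            self.exiting_enabled = True
        return self.exiting_enabled
```

 A caller would invoke `run_check()` on a timer. The check interval is not
 specified in this ticket, so none is assumed here.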

 > > The only node in a tor path that uses DNS is an exit, so if DNS
 breaks, it causes issues at the exit.
 >
 > That seems sensible. I'm only a little puzzled that it seems more common
 than I would expect that I saw not a single timeout, but a double, triple
 or quadruple timeout (see instances of 2,3,4 in my raw data). Presumably
 it's switching to a new exit node after each individual timeout, so why do
 I frequently see multiple timeouts for a single connection? Maybe it's
 just bad luck, but it made me wonder if I'm seeing something that goes
 wrong for the whole connection attempt and not just individual circuits.

 You could also have a slow guard, or a site that has slow DNS.

 But the most likely explanation is that some exits are massively
 overloaded, and DNS bears the brunt of that overloading.
 We could encourage relay operators to use a local DNS cache, but threads
 on this come up every month or two on tor-relays, so I'm not sure starting
 another would be useful.

 Another task that's in progress is to shift exit bandwidth away from the
 US east coast and Western Europe, because there's an over-allocation in
 that area at the moment. (It is where most bandwidth authorities have
 their HTTPS servers.)

 I would suggest that we find a way of monitoring this, so we can check if
 our fixes make a difference.
 This might be a task for metrics. I'll leave it to you to open a ticket,
 because you know what needs to be done to test for timeouts.
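
 One possible shape for such monitoring, sketched in Python under the
 assumption that each connection attempt is recorded as a (key, timed_out)
 pair. The function name `timeout_rates` and the record format are
 hypothetical; the key could be an exit fingerprint, an AS, or a target
 type such as domain versus IPv4.

```python
from collections import defaultdict


def timeout_rates(observations):
    """Map each group key to (timeouts, attempts, rate).

    `observations` is an iterable of (key, timed_out) pairs, where `key`
    might be an exit fingerprint, an AS number, or a target type.
    """
    counts = defaultdict(lambda: [0, 0])
    for key, timed_out in observations:
        counts[key][1] += 1  # one more attempt for this key
        if timed_out:
            counts[key][0] += 1  # and it ended in a timeout
    return {
        key: (timeouts, attempts, timeouts / attempts)
        for key, (timeouts, attempts) in counts.items()
    }
```

 Grouping by exit or by AS would show whether the timeouts cluster on a
 few relays, and comparing rates before and after a change would show
 whether a fix made a difference.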

 There's nothing in this ticket that core tor can fix in 0.3.2, so I'm
 moving it to 0.3.3.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21394#comment:24>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

