[tor-bugs] #21394 [Core Tor/Tor]: connection timeouts are affecting Tor Browser usability

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Feb 8 05:12:09 UTC 2017


#21394: connection timeouts are affecting Tor Browser usability
--------------------------------------------+---------------------
 Reporter:  arthuredelstein                 |          Owner:
     Type:  defect                          |         Status:  new
 Priority:  Medium                          |      Milestone:
Component:  Core Tor/Tor                    |        Version:
 Severity:  Normal                          |     Resolution:
 Keywords:  tbb-performance, tbb-usability  |  Actual Points:
Parent ID:                                  |         Points:
 Reviewer:                                  |        Sponsor:
--------------------------------------------+---------------------

Comment (by arthuredelstein):

 I had a conversation with arma on IRC and he made many good suggestions on
 how to go about investigating this further (reprinted with permission):

 16:49 < arthuredelstein> In general, do connection timeout errors come
 from the exit node, or from the client?
 16:50 < armadev> it means you sent your begin cell, and then you didn't
 get an end cell or a connected cell after 10 seconds
 16:50 < armadev> it could be that you don't really have a tls connection
 to your guard at all, you just think you do
 16:51 < armadev> it could be that the exit receives the begin cell and
 quietly drops it
 16:51 < armadev> or maybe it gets the begin cell and starts its dns
 resolve and that takes a while
 16:51 < armadev> one way to investigate further might be to see if you
 ever get a connected or end cell if you waited longer
 16:52 < arthuredelstein> Ah, that's a good idea.
 16:54 < arthuredelstein> Do you have an hypothesis why there are so many
 timeouts? Do you think exits are dropping cells?
 16:54 < armadev> i am wondering if it has to do with the ipv6 thing
 16:54 < armadev> we have a bunch of bugs in ipv6 handling
 16:55 < arthuredelstein> that's interesting
 16:56 < arthuredelstein> in other words, handling at the exit?
 16:57 < armadev> yes
 16:57 < armadev> is there some pattern with which exits are on problem
 circuits?
 16:57 < armadev> you have the circuit events i hope so you can do the
 stats?
 16:57 < armadev> it is also possible that some exits, or even really just
 a few but really big ones, and running out of file descriptors or
 something
 16:58 < arthuredelstein> another good idea. I will look into that.
 16:58 < armadev> s/and running/are running/
 16:59 < armadev> people.tp.o has an ipv4 and ipv6 address. can you pick
 something simple and static that's only v4, and is that different?
 17:01 < arthuredelstein> makes sense
 17:02 < arthuredelstein> Something that made me wonder if it's something
 closer to the client or guard is that in my first batch of tests (to
 people.torproject.org) half of the attempted connections were double
 timeouts, meaning two circuits with different exits failed before a
 successful connection was made.
 17:03 < arthuredelstein> it's -> the cause of the timeouts is
 17:08 < armadev> another thing to explore is sending cells end-to-end on
 the circuit that we know should elicit an immediate response
 17:08 < armadev> like a begin to 127.0.0.1
 17:08 < armadev> which should immediately reply with 'end, exitpolicy'
 17:08 < armadev> and bypass any attempts by the exit to do a dns resolve,
 open a socket, make a tcp connection, etc
 17:16 < arthuredelstein> What's easiest way to send a begin cell?
 17:17 < armadev> make a socks request?
 17:17 < armadev> there might be something on the client side that tries to
 block a request to a destination it knows will fail
 17:17 < armadev> and also tor browser does isolation by socks parameters
 so the new socks request will be isolated to a different circuit
 17:18 < armadev> but i bet fixing those will still be more fun than my
 other answer, which is to check out how to call
 connection_ap_handshake_send_begin()
 17:19 < arthuredelstein> Right. I think Tor Browser is blocking
 connections to 127.0.0.1.
 17:19 < armadev> heck, the browser itself might be blocking those too
 17:19 < arthuredelstein> or possibly not making a socks connection
 17:19 < armadev> and the tor client will be blocking them even if the
 browser isn't
 17:19 < armadev> i guess that's yet another experiment:
 17:19 < armadev> do this same experiment with your tor client, no browser
 involved
 17:20 < armadev> and no weird socks isolation
 17:20 < arthuredelstein> Yes.
 17:20 < armadev> and no weird preferipv6 socksport flag
 17:21 < arthuredelstein> aha
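
[Editor's sketch] The "tor client, no browser, no socks isolation" experiment suggested above could be scripted roughly as follows. This is a minimal sketch, not anything from the ticket: it assumes a standalone tor listening on its default SocksPort 127.0.0.1:9050, speaks plain SOCKS5 (RFC 1928) with no username or password, and the names `build_connect_request` and `time_connect` are hypothetical helpers introduced only for illustration.

```python
import socket
import struct
import time

# Reply codes as listed in RFC 1928, section 6.
SOCKS5_REPLIES = {
    0: "succeeded",
    1: "general SOCKS server failure",
    2: "connection not allowed by ruleset",
    3: "network unreachable",
    4: "host unreachable",
    5: "connection refused",
    6: "TTL expired",
    7: "command not supported",
    8: "address type not supported",
}


def build_connect_request(host, port):
    """Build a SOCKS5 CONNECT request that hands the hostname to the proxy."""
    name = host.encode("ascii")
    if not 0 < len(name) < 256:
        raise ValueError("hostname must be 1-255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name),
    # then the length-prefixed name and the port in network byte order.
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)


def time_connect(host, port, proxy=("127.0.0.1", 9050), wait=60.0):
    """Make one CONNECT attempt; return (reply description, elapsed seconds).

    Assumption: the proxy address is a local tor SocksPort. A socket timeout
    is raised if the proxy itself stays silent for `wait` seconds.
    """
    start = time.monotonic()
    with socket.create_connection(proxy, timeout=wait) as sock:
        sock.sendall(b"\x05\x01\x00")  # offer only the "no authentication" method
        if sock.recv(2) != b"\x05\x00":
            raise RuntimeError("proxy did not accept the no-auth method")
        sock.sendall(build_connect_request(host, port))
        reply = sock.recv(10)
    elapsed = time.monotonic() - start
    if len(reply) < 2:
        raise RuntimeError("short or missing SOCKS5 reply")
    code = reply[1]
    return SOCKS5_REPLIES.get(code, "unknown reply code %d" % code), elapsed
```

With a tor client running, `time_connect("people.torproject.org", 80)` would give one data point; repeating it against a v4-only host gives the comparison armadev asks for. The sketch reads each reply with a single `recv`, which is adequate for a local proxy but not robust in general.
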
 17:24 < arthuredelstein> I guess I can also try connecting to port 80 of
 the exit's IP address as an alternative to 127.0.0.1.
 17:25 < armadev> good idea
 17:25 < armadev> (though then you have to guess the exit already)
 17:25 < arthuredelstein> Yeah, I would need to turn off socks isolation.
 17:25 < arthuredelstein> Or maybe do this outside the browser
 17:26 < arthuredelstein> maybe I need to get acquainted with stem so I can
 automate these tests
 17:27 < arthuredelstein> assuming the browser isn't causing the problem
 somehow
 17:29 < armadev> having it automated would be extra cool because then it
 could be done again later without redoing all the work
 17:33 < armadev> let me hunt down a ticket you'll find fun and related
 (though alas not the same)
 17:35 < armadev> #5830
 17:40 < arthuredelstein> And I see you also mention the possibility of
 instrumenting a browser.
 19:49 < armadev> yet another thought: if this happens pretty consistently,
 can you collude with an exit relay to get debug-level logs at the time of
 the failure? to see what it sees and what it doesn't see? safelogging
 might make that harder.
 19:49 < arthuredelstein> yeah, that would be great
 19:50 < armadev> the precursor to that idea is: can you induce this
 behavior in a chutney network?
 19:50 < armadev> i would assume no, because it requires real users, real
 load, real broken exits. but who knows!
 19:50 < armadev> oh, and another: if you're curious if it's your guard, do
 the experiment again with a different guard!
 19:51 < arthuredelstein> yeah, I should definitely do that!
 19:52 < armadev> if your guard is overloaded, you could easily be seeing a
 delay there
 19:52 < armadev> or the intermediate node too, for that matter
 19:52 < arthuredelstein> right
 19:52 < armadev> where you have to wait for somebody's freight train of
 packets to move before you can get your connected cell
 19:54 < armadev> i guess category 1 of problem, you send your begin and it
 vanishes. you'll never get an answer.
 19:55 < armadev> category 2, everything's working, it's just
 slow/congested, and you need more patience than the hard-coded 10s
 timeout.
 19:55 < armadev> cranking up the timeout should help distinguish, for
 starters.
 19:55 < arthuredelstein> yes
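
[Editor's sketch] The category 1 / category 2 distinction above can be written down as a small classifier over measured reply delays. This is a minimal sketch under stated assumptions: the 10-second value comes from the log, while `classify_attempt`, `summarize`, and the idea of recording `None` when no connected or end cell arrives even with a raised timeout are hypothetical and chosen only for illustration.

```python
from collections import Counter

DEFAULT_TIMEOUT = 10.0  # seconds; the hard-coded stream timeout mentioned in the log


def classify_attempt(reply_delay):
    """Classify one stream attempt measured with the timeout cranked up.

    reply_delay: seconds until a connected or end cell arrived, or None if
    no reply arrived before the (raised) timeout expired.
    """
    if reply_delay is None:
        return "category 1"  # the begin vanished; no answer ever came
    if reply_delay < 0:
        raise ValueError("reply_delay must be non-negative")
    if reply_delay <= DEFAULT_TIMEOUT:
        return "ok"  # would have succeeded under the normal timeout
    return "category 2"  # working, but slower than the normal timeout allows


def summarize(reply_delays):
    """Count how many attempts fall into each class."""
    return Counter(classify_attempt(d) for d in reply_delays)
```

For example, `summarize([1.2, 14.0, None])` counts one attempt in each class. A run dominated by "category 2" would point at congestion, while a run dominated by "category 1" would point at dropped begin cells. Note that "category 1" here is only as reliable as the raised timeout is long.
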
 20:03 < arthuredelstein> Are there cases where a properly-behaving exit is
 expected to have category 1 behavior? Or should it always return an error
 message to the client if a tcp connection fails?
 20:10 < armadev> every non-response is a bug
 20:10 < armadev> are there bugs? there used to be! we don't know of any
 now.
 20:11 < armadev> but of course, weird tcp stacks, and firewalls with rules
 that drop packets, can induce long timeouts
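
[Editor's sketch] armadev's earlier question, whether some pattern links particular exits to problem circuits, amounts to a per-exit tally of timeouts. The following is a minimal sketch, assuming the test harness can emit one `(exit_id, timed_out)` pair per attempt; that record format, the `timeout_rates` name, and the `min_attempts` cutoff are hypothetical and not taken from the ticket.

```python
from collections import defaultdict


def timeout_rates(records, min_attempts=5):
    """Rank exits by the fraction of attempts through them that timed out.

    records: iterable of (exit_id, timed_out) pairs, one per attempt.
    Returns a list of (exit_id, timeouts, attempts, rate), highest rate first.
    Exits seen fewer than min_attempts times are omitted as too noisy.
    """
    attempts = defaultdict(int)
    timeouts = defaultdict(int)
    for exit_id, timed_out in records:
        attempts[exit_id] += 1
        if timed_out:
            timeouts[exit_id] += 1
    rates = [
        (exit_id, timeouts[exit_id], n, timeouts[exit_id] / n)
        for exit_id, n in attempts.items()
        if n >= min_attempts
    ]
    rates.sort(key=lambda row: row[3], reverse=True)
    return rates
```

If a handful of exits dominate the top of this list, that would fit the "few but really big ones" hypothesis; a flat distribution would instead suggest a cause closer to the client or guard.
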

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21394#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

