[tor-bugs] #9946 [Tor]: Bootstrapping Tor client hangs up on its chosen guards

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Oct 9 19:30:13 UTC 2013


#9946: Bootstrapping Tor client hangs up on its chosen guards
------------------------+------------------------------------
 Reporter:  arma        |          Owner:
     Type:  defect      |         Status:  new
 Priority:  normal      |      Milestone:  Tor: 0.2.4.x-final
Component:  Tor         |        Version:
 Keywords:  tor-client  |  Actual Points:
Parent ID:              |         Points:
------------------------+------------------------------------
 Here are some log snippets from a freshly bootstrapping Tor client:
 {{{
 Sep 27 03:24:11.369 [info] add_an_entry_guard(): Chose
 $3DD2523F1B241F01D54818F327714CDA7F54238B~wannabe2 at 144.76.110.103 as
 new entry guard.
 Sep 27 03:24:11.369 [info] log_entry_guards(): wannabe2
 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up never-contacted)
 Sep 27 03:24:11.371 [info] add_an_entry_guard(): Chose
 $821FAFA24651419FAE76A5C80BA79EA7E0743C4D=servbr3 at 212.83.158.171 as new
 entry guard.
 Sep 27 03:24:11.371 [info] log_entry_guards(): wannabe2
 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up never-contacted),servbr3
 [821FAFA24651419FAE76A5C80BA79EA7E0743C4D] (up never-contacted)
 Sep 27 03:24:11.373 [info] add_an_entry_guard(): Chose
 $CFB4CB595C2504444E8F8F76CAC42F0C97200734=SECxFreeBSD64 at 195.169.125.226
 as new entry guard.
 Sep 27 03:24:11.373 [info] log_entry_guards(): wannabe2
 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up never-contacted),servbr3
 [821FAFA24651419FAE76A5C80BA79EA7E0743C4D] (up never-
 contacted),SECxFreeBSD64 [CFB4CB595C2504444E8F8F76CAC42F0C97200734] (up
 never-contacted)
 }}}

 So now I have my three guards, and they're all considered up (the
 consensus said so) and never-contacted (that's how they start out).
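
 For reference, the per-guard state those log lines describe is roughly the
 following (a simplified sketch, not the real entry_guard_t; only the
 made_contact flag is confirmed by the code quoted further down, the rest is
 assumed from the log wording):
 {{{
 /* Simplified sketch of the per-guard state involved (not the real
  * entry_guard_t; only made_contact is confirmed by the code quoted
  * below, the rest is assumed from the log wording). */
 #include <time.h>

 typedef struct {
   char nickname[20];           /* e.g. "wannabe2" */
   char identity[20];           /* 20-byte identity digest */
   unsigned made_contact : 1;   /* have we ever successfully reached it? */
   time_t unreachable_since;    /* 0, or when connecting started failing */
 } guard_sketch_t;

 /* "(up never-contacted)" in the log corresponds to
  * unreachable_since == 0 && made_contact == 0. */
 }}}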

 {{{
 Sep 27 03:24:11.362 [info] launch_descriptor_downloads(): Launching 45
 requests for 4084 microdescs, 92 at a time
 }}}
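
 (Where the 45 comes from: 4084 missing microdescs in batches of 92. A
 throwaway sketch of the arithmetic, not Tor source:)
 {{{
 /* Throwaway sketch (not Tor source): the request count is a ceiling
  * division of missing microdescs by the per-request batch size. */
 #include <stdio.h>

 int main(void)
 {
   int missing = 4084, per_request = 92;
   int requests = (missing + per_request - 1) / per_request;
   printf("%d requests\n", requests);   /* prints "45 requests" */
   return 0;
 }
 }}}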

 Then choose_random_entry_impl() runs this code:
 {{{
       if (!entry->made_contact) {
         /* Always start with the first not-yet-contacted entry
          * guard. Otherwise we might add several new ones, pick
          * the second new one, and now we've expanded our entry
          * guard list without needing to. */
         goto choose_and_finish;
       }
 }}}

 So all 45 of those microdescriptor requests decide to ask wannabe2, and the
 client queues its 45 directory attempts.
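
 In other words, something like this is happening (a hand-wavy sketch with
 made-up helper names, not the real call chain):
 {{{
 /* Hand-wavy sketch (helper names made up, not the real call chain):
  * each of the 45 fetches independently asks for an entry guard, and
  * the never-contacted short-circuit above hands back the same first
  * guard every time, so everything piles onto wannabe2. */
 for (int i = 0; i < 45; i++) {
   const entry_guard_t *guard = first_never_contacted_guard(guard_list);
   queue_one_hop_microdesc_fetch(guard);   /* guard == wannabe2 every time */
 }
 }}}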

 So it connects to wannabe2:

 {{{
 Sep 27 03:24:11.373 [debug] directory_initiate_command_rend(): Initiating
 microdescriptor fetch
 Sep 27 03:24:11.373 [info] connection_ap_make_link(): Making internal
 direct tunnel to 144.76.110.103:443 ...
 [...]
 Sep 27 03:24:11.373 [debug] circuit_handle_first_hop(): Looking for
 firsthop '144.76.110.103:443'
 Sep 27 03:24:11.373 [info] circuit_handle_first_hop(): Next router is
 $3DD2523F1B241F01D54818F327714CDA7F54238B~3DD2523F1B241F01D54 at
 144.76.110.103: Not connected. Connecting.
 [...]
 Sep 27 03:24:11.373 [debug] connection_connect(): Connecting to
 "144.76.110.103":443.
 Sep 27 03:24:11.373 [debug] connection_connect(): Connection to
 "144.76.110.103":443 in progress (sock 13).
 Sep 27 03:24:11.373 [debug] connection_add_impl(): new conn type OR,
 socket 13, address 144.76.110.103, n_conns 7.
 [...]
 Sep 27 03:24:11.373 [debug] circuit_handle_first_hop(): connecting in
 progress (or finished). Good.
 [...]
 Sep 27 03:24:11.484 [debug] connection_or_finished_connecting(): OR
 connect() to router at 144.76.110.103:443 finished.
 Sep 27 03:24:11.485 [debug] connection_tls_start_handshake(): starting TLS
 handshake on fd 13
 [...]
 Sep 27 03:24:11.937 [info] channel_tls_process_versions_cell(): Negotiated
 version 4 with 144.76.110.103:443; Waiting for CERTS cell
 Sep 27 03:24:11.937 [debug] connection_or_process_cells_from_inbuf(): 13:
 starting, inbuf_datalen 1473 (0 pending in tls object).
 Sep 27 03:24:11.938 [info] channel_tls_process_certs_cell(): Got some good
 certificates from 144.76.110.103:443: Authenticated it.
 [...]
 Sep 27 03:24:11.938 [debug] circuit_n_chan_done(): chan to
 NULL/144.76.110.103:443, status=1
 Sep 27 03:24:11.938 [debug] circuit_n_chan_done(): Found circ, sending
 create cell.
 [...]
 Sep 27 03:24:11.938 [info] circuit_send_next_onion_skin(): First hop:
 finished sending CREATE_FAST cell to
 '$3DD2523F1B241F01D54818F327714CDA7F54238B~wannabe2 at 144.76.110.103'
 Sep 27 03:24:11.938 [info] channel_tls_process_netinfo_cell(): Got good
 NETINFO cell from 144.76.110.103:443; OR connection is now open, using
 protocol version 4. Its ID digest is
 3DD2523F1B241F01D54818F327714CDA7F54238B. Our address is apparently
 76.99.61.6.
 }}}

 But wait, what's that? We send our CREATE_FAST cells before we've received
 the NETINFO cell? I'm not sure whether that's a bug.

 In any case, time passes, and we give up on all those circuits:
 {{{
 Sep 27 03:25:12.150 [notice] No circuits are opened. Relaxed timeout for
 circuit 3 (a General-purpose client 1-hop circuit in state doing
 handshakes with channel state open) to 60000ms. However, it appears the
 circuit has timed out anyway. 0 guards are live.
 Sep 27 03:25:12.150 [info] circuit_expire_building(): Abandoning circ 3
 144.76.110.103:443:-2147451209 (state 0,0:doing handshakes, purpose 5, len
 1)
 Sep 27 03:25:12.150 [info] internal circ (length 1, last hop
 3DD2523F1B241F01D54818F327714CDA7F54238B):
 $3DD2523F1B241F01D54818F327714CDA7F54238B(waiting for keys)
 [...]
 Sep 27 03:25:12.150 [info] circuit_build_failed(): Our circuit failed to
 get a response from the first hop (144.76.110.103:443). I'm going to try
 to rotate to a better connection.
 Sep 27 03:25:12.150 [info] entry_guard_register_connect_status(): Unable
 to connect to entry guard 'wannabe2'
 (3DD2523F1B241F01D54818F327714CDA7F54238B). Marking as unreachable.
 }}}
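
 (A rough paraphrase of what "Marking as unreachable" presumably amounts to
 for the guard's state; the field name is an assumption, not quoted source:)
 {{{
 /* Rough paraphrase (field name assumed, not quoted source): a failed
  * attempt leaves the guard in the list but timestamps when it started
  * failing, so later requests move on to the next guard. */
 if (!succeeded && !guard->unreachable_since)
   guard->unreachable_since = time(NULL);
 }}}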

 And then we give up on the 45 directory requests:
 {{{
 Sep 27 03:25:12.150 [info] connection_ap_fail_onehop(): Closing one-hop
 stream to '$3DD2523F1B241F01D54818F327714CDA7F54238B/144.76.110.103'
 because the OR conn just failed.
 Sep 27 03:25:12.150 [info] connection_ap_fail_onehop(): Closing one-hop
 stream to '$3DD2523F1B241F01D54818F327714CDA7F54238B/144.76.110.103'
 because the OR conn just failed.
 [and 43 more]
 }}}

 So let's try again! We marked our first guard down, so now we'll ask our
 second guard:

 {{{
 Sep 27 03:25:14.151 [debug] launch_descriptor_downloads(): There are
 enough downloadable microdescs to launch requests.
 Sep 27 03:25:14.151 [info] launch_descriptor_downloads(): Launching 45
 requests for 4084 microdescs, 92 at a time
 }}}

 {{{
 Sep 27 03:25:14.159 [info] onion_pick_cpath_exit(): Using requested exit
 node '$821FAFA24651419FAE76A5C80BA79EA7E0743C4D~821FAFA24651419FAE7 at
 212.83.158.171'
 [...]
 Sep 27 03:25:14.160 [info] circuit_handle_first_hop(): Next router is
 $821FAFA24651419FAE76A5C80BA79EA7E0743C4D~821FAFA24651419FAE7 at
 212.83.158.171: Not connected. Connecting.
 [...]
 Sep 27 03:25:14.160 [debug] connection_connect(): Connecting to
 "212.83.158.171":443.
 Sep 27 03:25:14.160 [debug] connection_connect(): Connection to
 "212.83.158.171":443 in progress (sock 13).
 Sep 27 03:25:14.160 [debug] connection_add_impl(): new conn type OR,
 socket 13, address 212.83.158.171, n_conns 6.
 [...]
 Sep 27 03:25:14.261 [debug] connection_or_finished_connecting(): OR
 connect() to router at 212.83.158.171:443 finished.
 [...]
 Sep 27 03:25:14.683 [info] channel_tls_process_versions_cell(): Negotiated
 version 4 with 212.83.158.171:443; Waiting for CERTS cell
 [...]
 Sep 27 03:25:14.683 [info] channel_tls_process_certs_cell(): Got some good
 certificates from 212.83.158.171:443: Authenticated it.
 }}}

 And here's where it gets good:

 {{{
 Sep 27 03:25:14.683 [info] entry_guard_register_connect_status():
 Connected to new entry guard 'servbr3'
 (821FAFA24651419FAE76A5C80BA79EA7E0743C4D). Marking earlier entry guards
 up. 0/3 entry guards usable/new.
 Sep 27 03:25:14.683 [info] log_entry_guards(): wannabe2
 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up made-contact),servbr3
 [821FAFA24651419FAE76A5C80BA79EA7E0743C4D] (up made-contact),SECxFreeBSD64
 [CFB4CB595C2504444E8F8F76CAC42F0C97200734] (up never-contacted)
 Sep 27 03:25:14.683 [debug] channel_do_open_actions(): New entry guard was
 reachable, but closing this connection so we can retry the earlier entry
 guards.
 Sep 27 03:25:14.683 [debug] circuit_n_chan_done(): chan to
 NULL/212.83.158.171:443, status=0
 Sep 27 03:25:14.683 [info] circuit_n_chan_done(): Channel failed; closing
 circ.
 Sep 27 03:25:14.683 [info] circuit_build_failed(): Our circuit died before
 the first hop with no connection
 Sep 27 03:25:14.683 [info] entry_guard_register_connect_status(): Unable
 to connect to entry guard 'servbr3'
 (821FAFA24651419FAE76A5C80BA79EA7E0743C4D). Marking as unreachable.
 }}}
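
 Spelled out, that snippet amounts to roughly the following (a pseudo-C
 paraphrase of what the log shows channel_do_open_actions() doing, not the
 actual source):
 {{{
 /* Pseudo-C paraphrase of the log above, not the actual source. */
 if (connected_to_new_guard_ok) {
   /* We'd never reached this guard before, so mark the earlier
    * (currently-unreachable) guards up again so they can be retried... */
   mark_earlier_guards_up();
   /* ...then close this perfectly good connection so those earlier
    * guards get tried first. */
   close_this_or_connection();
   /* Side effect: circuit_n_chan_done() sees the channel die (status=0),
    * fails the pending circuit, and the failure path marks servbr3 --
    * the guard we just reached -- as unreachable too. */
 }
 }}}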

 And then we close all the directory requests that we'd queued for it
 because we closed the connection:

 {{{
 Sep 27 03:25:14.683 [info] connection_ap_fail_onehop(): Closing one-hop
 stream to '$821FAFA24651419FAE76A5C80BA79EA7E0743C4D/212.83.158.171'
 because the OR conn just failed.
 [and 44 more]
 }}}

 And then my client launches 45 new directory requests, which spread over
 the three available guards, and things bootstrap ok.

 It's the "New entry guard was reachable, but closing this connection so we
 can retry the earlier entry guards" behavior that is the bug here. I think
 the fix might be to mark newly selected guards as already-contacted when
 you first pick them at the beginning of time.
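
 Concretely, something along these lines when a brand-new guard is created
 (an untested sketch of that suggestion, not a patch):
 {{{
 /* Untested sketch of the suggestion: when a guard is first added in
  * add_an_entry_guard() during initial bootstrapping, pretend we've
  * already contacted it, so choose_random_entry_impl()'s "first
  * not-yet-contacted guard" short-circuit never funnels every request
  * onto one guard and the close-and-retry dance above never starts. */
 entry->made_contact = 1;
 }}}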

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9946>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

