[tor-bugs] #18816 [Core Tor/Tor]: We still wait 120 seconds for cert fetches from missing dir mirrors

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed May 4 05:45:52 UTC 2016


#18816: We still wait 120 seconds for cert fetches from missing dir mirrors
-------------------------------------------------+-------------------------------------
 Reporter:  arma                                 |          Owner:
     Type:  defect                               |         Status:  needs_review
 Priority:  Medium                               |      Milestone:  Tor: 0.2.8.x-final
Component:  Core Tor/Tor                         |        Version:  Tor: 0.2.8.1-alpha
 Severity:  Normal                               |     Resolution:
 Keywords:  029-proposed, 029-nickm-unsure,      |  Actual Points:
  must-fix-before-028-rc                         |         Points:  small
Parent ID:                                       |        Sponsor:
 Reviewer:                                       |
-------------------------------------------------+-------------------------------------
Changes (by teor):

 * status:  new => needs_review
 * points:  medium => small


Comment:

 Replying to [ticket:18816 arma]:
 > In #4483 and prop210 we set up an elaborate download schedule for
 > consistently reaching fallbackdirs when fetching the consensus, so we
 > don't end up just sitting there for 120 seconds while a tcp connection
 > waits (and eventually the SocksTimeout parameter is reached and we move
 > on).
 >
 > But we didn't do any similar thing with fetching the key certs. I just
 > had my bootstrap go smoothly through the #4483 features (with the fixes
 > from #18809) and then it stalled for 2 minutes trying to fetch the certs
 > from a fallbackdir that's offline.
 >
 > Sure enough, in authority_certs_fetch_missing() I see
 > {{{
 >       /* XXX - do we want certs from authorities or mirrors? - teor */
 >       directory_get_from_dirserver(DIR_PURPOSE_FETCH_CERTIFICATE, 0,
 >                                    resource, PDS_RETRY_IF_NO_SERVERS,
 >                                    DL_WANT_ANY_DIRSERVER);
 > }}}
 >
 > So teor noticed this one too.
 >
 > I think in 0.2.8, if we leave the fallbackdir stuff in (meaning we merge
 > #18809 or equivalent into 0.2.8), we could bandage this one by changing
 > DL_WANT_ANY_DIRSERVER to DL_WANT_AUTHORITY, and then it wouldn't be much
 > worse than it is now (in terms of performance -- we would indeed lose the
 > ability to bootstrap from scratch when the authorities are unavailable).

 This could be caused by an existing bug in
 download_status_reset_by_sk_in_cl, download_status_is_ready_by_sk_in_cl,
 and get_cert_list, where we don't initialise the download schedule
 consistently: some certificate fetches get the generic schedule and others
 get the consensus schedule, and some fetches initialise the schedule while
 others never do.

 See commit f2e9af1 ("Use the consensus download schedule for authority
 certificates") in my branch bug18816 on https://github.com/teor2345/tor.git

 With this fix, tor will use
 ClientBootstrapConsensusFallbackDownloadSchedule for certificate fetches,
 which starts 0, 1, 4, 11, yielding 4 tries for each certificate in the
 first 16 seconds. The limit TestingCertMaxDownloadTries is 8, which we'll
 never reach in any reasonable time.
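
 To make the arithmetic explicit, here is a minimal sketch. It assumes each
 schedule entry is the delay in seconds before the next attempt, which is
 the reading under which "4 tries in the first 16 seconds" works out. The
 names attempts_within and cert_schedule_prefix are hypothetical and are
 not taken from tor.

```c
#include <assert.h>
#include <stddef.h>

/* First four entries of the schedule quoted above. Later entries are
 * omitted because the comment does not list them. */
const int cert_schedule_prefix[] = { 0, 1, 4, 11 };

/* Hypothetical helper, not part of tor: count how many attempts start
 * within `window` seconds, ASSUMING each entry in `delays` is the wait
 * (in seconds) before the next attempt. */
int
attempts_within(const int *delays, size_t n_delays, int window)
{
  int elapsed = 0;
  int attempts = 0;

  assert(window >= 0);
  for (size_t i = 0; i < n_delays; ++i) {
    elapsed += delays[i];   /* attempt i starts at the running total */
    if (elapsed > window)
      break;
    ++attempts;
  }
  return attempts;
}
```

 Under that assumption the attempts start at 0, 1, 5 and 16 seconds, so
 attempts_within(cert_schedule_prefix, 4, 16) gives 4.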

 This is still not great if a significant number of fallbacks are blocked
 or blackholed.

 So let's try an authority to start with, then try a fallback if the
 authority fails. (This is better than 0.2.7, but still has issues with
 blackholed authorities and fallbacks.)

 87fdbb6 Switch between fallback and authority when auth cert fetch fails
 bea0819 fixup! Switch between fallback and authority when auth cert fetch
 fails
 (Try an authority first, because they're more likely to work first time
 than a fallback.)
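
 The idea of switching sources can be sketched as below. This is not the
 code in those commits: the function cert_source_for_attempt, the type
 cert_source_t, and the choice to alternate on the parity of the failure
 count are all assumptions made for illustration. The two enumerators are
 local stand-ins named after the constants in the snippet quoted above,
 with DL_WANT_ANY_DIRSERVER standing in for "a fallback or mirror".

```c
#include <assert.h>

/* Local stand-ins named after the constants in the quoted snippet; the
 * real definitions live in tor's own headers. */
typedef enum {
  DL_WANT_ANY_DIRSERVER,
  DL_WANT_AUTHORITY
} cert_source_t;              /* hypothetical type name */

/* Hypothetical policy: with no failures yet, ask an authority; after each
 * failed fetch, switch to the other kind of source. */
cert_source_t
cert_source_for_attempt(int n_failures)
{
  assert(n_failures >= 0);
  return (n_failures % 2 == 0) ? DL_WANT_AUTHORITY : DL_WANT_ANY_DIRSERVER;
}
```

 With this policy the first attempt goes to an authority, the retry after
 one failure goes to any directory server, and so on.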

 There are a few options to deal with blackholed fallbacks and authorities:
 * do what we do with the consensus, and try multiple, simultaneous
 connections to both authorities and fallback directories, use the first
 one that succeeds, and close the rest,
 * if the connection to a fallback fails, try an authority (this still
 doesn't help with blackholed fallbacks),
 * or any of the other options arma mentions:

 > Longer term (0.2.9 and later), I think we should explore a) having
 > directory_get_from_dirserver() notice that there are tls conns established
 > to dir mirrors that we just recently used (and prefer them), or b) trying
 > to explicitly remember the dir mirror that gave us the consensus and
 > re-use it, and/or c) designing a piggy-back mechanism so we can ask for
 > "the certs that go with this consensus" when we're fetching a consensus
 > and we know we will want the certs for it too (thus saving a round-trip).

 I've split these off into #18963 so we can deal with them, maybe in 0.2.9.
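
 To illustrate why the first option in the list above (simultaneous
 connections, use the first that succeeds) copes with blackholed servers,
 here is a toy model. It is only a sketch: every name in it is
 hypothetical, it launches all connections at the same instant rather than
 staggering them on a schedule, and the 120 second timeout is the figure
 from the ticket title.

```c
#include <stddef.h>

#define NEVER_CONNECTS (-1)   /* hypothetical marker: server is blackholed */

/* Hypothetical sample: two blackholed servers, then one that connects
 * after 2 seconds. */
const int example_servers[] = { NEVER_CONNECTS, NEVER_CONNECTS, 2 };

/* Try servers one at a time, giving up on each after `timeout` seconds.
 * Returns seconds until the first success, or -1 if none succeeds. */
int
sequential_wait(const int *connect_secs, size_t n, int timeout)
{
  int waited = 0;
  for (size_t i = 0; i < n; ++i) {
    if (connect_secs[i] != NEVER_CONNECTS && connect_secs[i] <= timeout)
      return waited + connect_secs[i];
    waited += timeout;        /* a full timeout is spent on this server */
  }
  return -1;
}

/* Start every connection at once and keep the first that succeeds.
 * Returns seconds until that success, or -1 if none succeeds. */
int
parallel_wait(const int *connect_secs, size_t n, int timeout)
{
  int best = -1;
  for (size_t i = 0; i < n; ++i) {
    int t = connect_secs[i];
    if (t == NEVER_CONNECTS || t > timeout)
      continue;               /* this connection never completes in time */
    if (best == -1 || t < best)
      best = t;
  }
  return best;
}
```

 In this model, trying example_servers one after another with a 120 second
 timeout waits 242 seconds, while trying them in parallel waits 2 seconds.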

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18816#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

