commit 13c75f888cf9ac401eab674a7b4652bab3d21c5d
Author: teor (Tim Wilson-Brown) teor2345@gmail.com
Date: Sat Oct 17 16:33:02 2015 +1100

    prop 210 further clarifications
---
 .../210-faster-headless-consensus-bootstrap.txt | 77 ++++++++++----------
 1 file changed, 38 insertions(+), 39 deletions(-)
diff --git a/proposals/210-faster-headless-consensus-bootstrap.txt b/proposals/210-faster-headless-consensus-bootstrap.txt
index 380e267..d3c56ff 100644
--- a/proposals/210-faster-headless-consensus-bootstrap.txt
+++ b/proposals/210-faster-headless-consensus-bootstrap.txt
@@ -27,16 +27,16 @@ Design: Bootstrap Process Changes
 and authority connections are tried. Mirror connections are tried at a faster rate than authority connections.
- Client Schedules: - Clients represent the majority of the load on the network. They can use directory mirrors to download their documents, as the mirrors download their documents from the authorities early in the consensus validity period.
We specify that client mirror connections retry after one second, and - then double the retry time with every connection: + then double the retry time with every connection attempt: 0, 1, 2, 4, 8, 16, 32, ... + (The timers currently implemented in Tor increment with every + connection failure.)
We specify that client directory authority connections retry after 10 seconds, and then double the retry time with every connection: @@ -46,22 +46,14 @@ Design: Bootstrap Process Changes IPv6 mirrors and authorities on the following schedule: IPv4, IPv6, IPv4, IPv6, ...
- Relay Schedules: + [ TODO: should we add random noise to these scheduled times? - teor + Tor doesn’t add random noise to the current failure-based + timers, but as failures are a network event, they are + somewhat random/arbitrary already. These attempt-based timers + will go off every few seconds, exactly on the second. ]
- Relays represent a small load on the network, but place a proportionally - greater load on the authorities [citation needed]. They can’t use - directory mirrors to download their documents, as they themselves are - the mirrors. - - We specify that relay directory authority connections retry after - 5 seconds, and then double the retry time with every connection: - 0, 5, 10, ... - - If a relay has both an IPv4 and IPv6 address, it will try IPv4 and - IPv6 mirrors and authorities on the following schedule: - IPv4, IPv4, IPv6, IPv4, IPv6, ... - - [ XXX: should we add random noise to these scheduled times? - teor ] + (Relays can’t use directory mirrors to download their documents, + as they *are* the directory mirrors.)
The maximum retry time for all these timers is 3 days + 1 hour. This places a small load on the mirrors and authorities, while allowing a @@ -70,8 +62,8 @@ Design: Bootstrap Process Changes
We try IPv4 first to avoid overloading IPv6-enabled authorities and mirrors. Each timing schedule uses a separate IPv4/IPv6 schedule. - This ensures that clients and relays try an IPv6 authority within - the first 10 seconds. This helps implement #8374 and related tickets. + This ensures that clients try an IPv6 authority within the first + 10 seconds. This helps implement #8374 and related tickets.
We don't want to keep on trying an IP version that always fails. Therefore, once sufficient IPv4 and IPv6 connections have been @@ -85,7 +77,9 @@ Design: Bootstrap Process Changes The retry timers and IP version schedules must reset on HUP and any network reachability events, so that clients that have unreliable networks can recover from network failures. - [ TODO: do we have network reachability events? ] + [ TODO: Do we do this for any other timers? + I think this needs another proposal, it’s out of scope here. + - teor ]
The first connection to complete will be used to download the consensus document and the others will be closed, after which bootstrapping will @@ -128,6 +122,8 @@ Design: Fallback Dir Mirror Selection should be set at 20% of the current Guard nodes (approximately 200 as of October 2015), rather than fixed at 100.
+ [TODO: change the script to dynamically calculate an upper limit.] + Performance: Additional Load with Current Parameter Choices
This design and the connection count parameters were chosen such that @@ -135,8 +131,7 @@ Performance: Additional Load with Current Parameter Choices authorities. In fact, the directory authorities should experience less load, because they will not need to serve the entire consensus document for a connection in the event that one of the directory mirrors complete - their connection before the directory authority does. (However, they - may need to serve the consensus document HEAD for clock checks.) + their connection before the directory authority does.
However, the scheme does place additional TLS connection load on the fallback dir mirrors. Because bootstrapping is rare, and all but one of @@ -179,31 +174,35 @@ Implementation Notes: Code Modifications update_consensus_networkstatus_downloads(), as the sockets are non-blocking. (This socket appears to be non-blocking on Unixes (SOCK_NONBLOCK & O_NONBLOCK) and Windows (FIONBIO).) As long as we can tolerate a timer resolution of - ~1 second (due to the use of time_t), this requires no additional timers or - callbacks. We can make 1 connection for each schedule per second, for a total - of 1-2 per second. + ~1 second (due to the use of second_elapsed_callback and time_t), this + requires no additional timers or callbacks. We can make 1 connection for each + schedule per second, for a maximum of 2 per second. + + The schedules can be specified in: + TestingClientBootstrapConsensusAuthorityDownloadSchedule + TestingClientBootstrapConsensusFallbackDownloadSchedule + (Similar to the existing TestingClientConsensusDownloadSchedule.)
- Since a tor relay fetches its directory material directly from the - authorities, it uses a different schedule. Multiple connections are - only useful to a relay if multiple authorities are down, so we retry - less often than for clients (clients use multiple fallbacks). - update_consensus_networkstatus_downloads() should check - directory_fetches_from_authorities() to choose the appropriate schedule. + TestingServerIPVersionPreferenceSchedule + (Consisting of a CSV like “4,6,4,6”, or perhaps “0,1,0,1”.)
- update_consensus_networkstatus_downloads() would also check the list of - pending connections and, if it is 10 or greater, skip the connection - attempt, and leave the retry time constant. + update_consensus_networkstatus_downloads() checks the list of pending + connections and, if it is 10 or greater, skip the connection attempt, + and leave the retry time constant.
- The code in directory_send_command() would need to be altered to check that we - are not already downloading the consensus. If we’re not, then download the - consensus on this connection, and close any other pending consensus dircons. + The code in directory_send_command() and connection_finished_connecting() + would need to be altered to check that we are not already downloading the + consensus. If we’re not, then download the consensus on this connection, and + close any other pending consensus dircons.
We might also need to make similar changes in authority_certs_fetch_missing(), as we can’t use a consensus until we have enough authority certificates. However, Tor already makes multiple requests (one per certificate), and only needs a majority of certificates to validate a consensus. Therefore, we will only need to modify authority_certs_fetch_missing() if clients download a - consensus, then end up getting stuck downloading certificates. + consensus, then end up getting stuck downloading certificates. (Current tests + show bootstrapping working well without any changes to authority certificate + fetches.)
Reliability Analysis
tor-commits@lists.torproject.org
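
Illustrative note (not part of the commit): the attempt-based client schedules described in the patch can be sketched as follows. This is a minimal sketch under stated assumptions, not Tor's implementation. The names retry_schedule, dual_stack_attempts and MAX_RETRY_SECONDS are hypothetical, the authority sequence 0, 10, 20, 40 is inferred from "retry after 10 seconds, and then double" rather than quoted from the proposal, and the sketch only reproduces the listed values without deciding whether each entry is a delay or an offset.

```python
from itertools import cycle, islice

# 3 days + 1 hour: the maximum retry time stated in the proposal text.
MAX_RETRY_SECONDS = 3 * 24 * 60 * 60 + 60 * 60


def retry_schedule(first_retry, max_retry=MAX_RETRY_SECONDS):
    """Yield 0, then first_retry, doubling on every attempt, capped at max_retry.

    Hypothetical helper: the doubling happens per connection attempt,
    not per connection failure.
    """
    yield 0
    retry = min(first_retry, max_retry)
    while True:
        yield retry
        retry = min(retry * 2, max_retry)


def dual_stack_attempts(first_retry):
    """Pair each retry time with an IP version: IPv4 first, then alternating.

    Each call builds its own IPv4/IPv6 cycle, mirroring the statement that
    each timing schedule uses a separate IPv4/IPv6 schedule.
    """
    return zip(retry_schedule(first_retry), cycle((4, 6)))


if __name__ == "__main__":
    # Mirror schedule from the proposal: 0, 1, 2, 4, 8, 16, 32, ...
    print(list(islice(retry_schedule(1), 7)))
    # Authority schedule (assumed values) paired with IP versions.
    print(list(islice(dual_stack_attempts(10), 4)))
```

Under these assumptions the second authority attempt is the first IPv6 one and carries the retry time 10, which matches the patch's statement that clients try an IPv6 authority within the first 10 seconds.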