Hello,
proposal 259 has evolved plenty since the last time it was posted on [tor-dev]. All the technical discussion can be found in various threads, and here is a thread about the proposal itself as it is now.
The current proposal is pretty much done, and it's currently being implemented for testing. We imagine that during testing we might have to slightly alter the algorithm and its parameters to improve performance and security.
Here are some behaviors that are still unspecified in the current proposal based on our discussion yesterday:
- What exactly should be done if a circuit has restricted requirements? (e.g. it needs a Fast/Stable guard, but our top guard is not Fast/Stable)
- How exactly should directory guards work? Should the guard lists be initialized only with directory guards? Or should we just initialize our guard lists from the set of all guards, and just skip non-V2Dir guards whenever we need to make a directory circuit? FWIW, there are currently 2076 guards, out of which 1659 support V2Dir (i.e. they are directory guards).
- How should the algorithm work wrt ReachableAddresses? I wonder how the current algorithm works wrt ReachableAddresses and whether its behavior is good.
---
Filename: 259-guard-selection.txt
Title: New Guard Selection Behaviour
Author: Isis Lovecruft, George Kadianakis, Ola Bini
Created: 2015-10-28
Status: Draft
§1. Overview
Tor uses entry guards to prevent an attacker who controls some fraction of the network from observing a fraction of every user's traffic. If users chose their entries and exits uniformly at random from the list of servers every time they build a circuit, then an adversary who had (k/N) of the network would deanonymize F=(k/N)^2 of all circuits... and after a given user had built C circuits, the attacker would see them at least once with probability 1-(1-F)^C. With large C, the attacker would get a sample of every user's traffic with probability 1.
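For concreteness, this non-normative Python sketch evaluates the expressions above; the values of N, k, and C are illustrative assumptions, not figures from this proposal:

    N = 2000.0   # total relays (illustrative)
    k = 100.0    # attacker-controlled relays (illustrative)
    C = 1000     # circuits built by the user (illustrative)

    F = (k / N) ** 2             # fraction of circuits with hostile entry and exit
    p_seen = 1 - (1 - F) ** C    # observed at least once after C circuits
    print(F, p_seen)             # 0.0025, ~0.918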
To prevent this from happening, Tor clients choose a small number of guard nodes (currently 3). These guard nodes are the only nodes that the client will connect to directly. If they are not compromised, the user's paths are not compromised.
But attacks remain. Consider an attacker who can run a firewall between a target user and the Tor network, and make many of the guards they don't control appear to be unreachable. Or consider an attacker who can identify a user's guards, and mount denial-of-service attacks on them until the user picks a guard that the attacker controls.
In the presence of these attacks, we can't continue to connect to the Tor network unconditionally. Doing so would eventually result in the user choosing a hostile node as their guard, and losing anonymity.
This proposal outlines a new entry guard selection algorithm, which addresses the following concerns:
- Heuristics and algorithms for determining how and which guard(s) is(/are) chosen should be kept as simple and easy to understand as possible.
- Clients in censored regions or who are behind a fascist firewall who connect to the Tor network should not experience any significant disadvantage in terms of reachability or usability.
- Tor should make a best attempt at discovering the most appropriate behaviour, with as little user input and configuration as possible.
§2. Design
Alice, an OP attempting to connect to the Tor network, should undertake the following steps to determine information about the local network and to select (some) appropriate entry guards. In the following scenario, it is assumed that Alice has already obtained a recent, valid, and verifiable consensus document.
The algorithm is divided into four components such that the full algorithm is implemented by first invoking START, then repeatedly calling NEXT while SHOULD_CONTINUE advises continuing, and finally calling END. For an example usage see §A. Appendix.
Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE lets the algorithm tell the caller whether we consider the work done or not - this can be used, for example, to retry primary guards when we are finally able to connect to a guard after a long network outage.
This algorithm keeps track of the unreachability status for guards in state global to the system, so that repeated runs will not have to rediscover unreachability over and over again. However, this state does not need to be persisted permanently - it is purely an optimization.
The algorithm expects several arguments to guide its behavior. These will be defined in §2.1.
The goal of this algorithm is to strongly prefer connecting to the same guards we have connected to before, while also trying to detect conditions such as a network outage or a network environment that blocks most ports. It does this by keeping track of how many guards we have exposed ourselves to; if we have connected to too many, we fall back to retrying only the ones we have already tried. The algorithm also decides on a sample set that should be persisted, in order to minimize the risk of an attacker forcing enumeration of the whole network by triggering rebuilding of circuits.
§2.1. The START algorithm
In order to start choosing an entry guard, use the START algorithm. This takes six arguments that can be used to fine-tune its workings:
USED_GUARDS This is a list that contains all the guards that have been used before by this client. We will prioritize using guards from this list in order to minimize our exposure. The list is expected to be sorted based on priority, where the first entry will have the highest priority.
SAMPLED_UTOPIC_GUARDS This is a set that contains all guards that should be considered for connection under utopic conditions. This set should be persisted between runs. It will be filled in by the algorithm if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD guards after winnowing out older guards. It should be filled by using NEXT_BY_BANDWIDTH with UTOPIC_GUARDS as an argument.
SAMPLED_DYSTOPIC_GUARDS This is a set that contains all guards that should be considered for connection under dystopic conditions. This set should be persisted between runs. It will be filled in by the algorithm if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD guards after winnowing out older guards. It should be filled by using NEXT_BY_BANDWIDTH with DYSTOPIC_GUARDS as an argument.
EXCLUDE_NODES A set of nodes that we should not consider using as a guard.
N_PRIMARY_GUARDS The number of guards we should consider our primary guards. These guards will be retried more frequently and will take precedence in most situations. By default the primary guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
DIR If this argument is set, we should only consider guards that can be directory guards. If not set, we will consider all guards.
The primary work of START is to initialize the state machine depicted in §2.2. The initial state of the machine is defined by:
GUARDS This is a set of all guards from the consensus, without EXCLUDE_NODES and potentially filtered if DIR is set.
UTOPIC_GUARDS This is a set of all guards to use under utopic conditions. It will primarily be used to fill in SAMPLED_UTOPIC_GUARDS. This set will be initialized to be the same as GUARDS.
DYSTOPIC_GUARDS This is a set of all guards to use under dystopic conditions (usually when we are subject to a firewall that restricts the ports we can connect to). It will primarily be used to fill in SAMPLED_DYSTOPIC_GUARDS. This set will be initialized to be the subset of GUARDS that listen to ports that are allowed by dystopic conditions.
REMAINING_UTOPIC_GUARDS This is a running set of the utopic guards we have not yet tried to connect to. It should be initialized to be SAMPLED_UTOPIC_GUARDS without USED_GUARDS.
REMAINING_DYSTOPIC_GUARDS This is a running set of the dystopic guards we have not yet tried to connect to. It should be initialized to be SAMPLED_DYSTOPIC_GUARDS without USED_GUARDS.
STATE A variable that keeps track of which state in the state machine we are currently in. It should be initialized to STATE_PRIMARY_GUARDS.
PRIMARY_GUARDS This list keeps track of our primary guards. These are guards that we will prioritize when trying to connect, and will also retry more often in case of failure with other guards. It should be initialized by calling algorithm NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains N_PRIMARY_GUARDS elements.
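As a rough, non-normative illustration of the initialization above, here is a Python sketch of START. The guard attributes (is_dir_cache, orport), the dystopic port test, and the next_primary_guard helper (see §2.5) are assumptions made for the sketch:

    DYSTOPIC_PORTS = {80, 443}   # assumption: "dystopic" = ORPort on 80/443

    def start(used_guards, sampled_utopic, sampled_dystopic,
              exclude_nodes, n_primary_guards, dir_only, consensus_guards):
        # GUARDS: the consensus guards minus EXCLUDE_NODES, optionally
        # restricted to directory guards.
        guards = [g for g in consensus_guards
                  if g not in exclude_nodes
                  and (not dir_only or g.is_dir_cache)]
        state = {
            'UTOPIC_GUARDS': set(guards),
            'DYSTOPIC_GUARDS': {g for g in guards
                                if g.orport in DYSTOPIC_PORTS},
            'REMAINING_UTOPIC_GUARDS': set(sampled_utopic) - set(used_guards),
            'REMAINING_DYSTOPIC_GUARDS': set(sampled_dystopic) - set(used_guards),
            'STATE': 'STATE_PRIMARY_GUARDS',
            'PRIMARY_GUARDS': [],
            'USED_GUARDS': list(used_guards),
        }
        # Fill PRIMARY_GUARDS via NEXT_PRIMARY_GUARD (see section 2.5).
        while len(state['PRIMARY_GUARDS']) < n_primary_guards:
            g = next_primary_guard(state)
            if g is None:
                break
            state['PRIMARY_GUARDS'].append(g)
        return state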
§2.2. The NEXT algorithm
The NEXT algorithm is composed of several different possible flows. The first one is a simple state machine that can transfer between four different states. Every time NEXT is invoked, it will resume at the state where it left off previously. In the course of selecting an entry guard, a new consensus can arrive. When that happens we need to update the data structures used, but nothing else should change.
Before jumping into the state machine, we should first check whether at least PRIMARY_GUARDS_RETRY_INTERVAL minutes have passed since we last tried any of the PRIMARY_GUARDS. If this is the case, and we are not in STATE_PRIMARY_GUARDS, we should save the previous state and set the state to STATE_PRIMARY_GUARDS.
§2.2.1. The STATE_PRIMARY_GUARDS state
Return each entry in PRIMARY_GUARDS in turn. For each entry, if it was not possible to connect to it, mark the entry as unreachable. [XXX defining "was not possible to connect" as "entry is not live", according to the current definition of "live entry guard" in the tor source code, seems to improve the success rate in the flaky network scenario. See: https://github.com/twstrike/tor_guardsim/issues/1#issuecomment-187374942]
If all entries have been tried, restore the previous state and go there. If there is no previous state, transition to STATE_TRY_UTOPIC.
§2.2.2. The STATE_TRY_UTOPIC state
Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in turn. For each entry, if it was not possible to connect to it, mark the entry as unreachable.
Return each entry from REMAINING_UTOPIC_GUARDS using NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect to it, remove the entry from REMAINING_UTOPIC_GUARDS and mark it as unreachable.
If no entries remain in REMAINING_UTOPIC_GUARDS, transition to STATE_TRY_DYSTOPIC.
§2.2.3. The STATE_TRY_DYSTOPIC state
Return each entry from REMAINING_DYSTOPIC_GUARDS using NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect to it, remove the entry from REMAINING_DYSTOPIC_GUARDS and mark it as unreachable.
If no entries remain in REMAINING_DYSTOPIC_GUARDS, transition to STATE_PRIMARY_GUARDS.
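Putting the pieces of §2.2 together, a condensed, non-normative Python sketch of the NEXT dispatch; the timestamp bookkeeping and the try_* handlers (which implement the three states above) are assumptions of the sketch:

    import time

    PRIMARY_GUARDS_RETRY_INTERVAL = 3   # minutes, see section 3

    def next_guard(state):
        # Retry primaries first if they have not been tried recently.
        overdue = (time.time() - state['last_primary_retry']
                   >= PRIMARY_GUARDS_RETRY_INTERVAL * 60)
        if overdue and state['STATE'] != 'STATE_PRIMARY_GUARDS':
            state['PREVIOUS_STATE'] = state['STATE']
            state['STATE'] = 'STATE_PRIMARY_GUARDS'
            state['last_primary_retry'] = time.time()

        # Dispatch to the state handlers of sections 2.2.1 - 2.2.3; each
        # returns the next guard to try and performs the transitions.
        handlers = {
            'STATE_PRIMARY_GUARDS': try_primary_guards,
            'STATE_TRY_UTOPIC': try_utopic_guards,
            'STATE_TRY_DYSTOPIC': try_dystopic_guards,
        }
        return handlers[state['STATE']](state)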
§2.2.5. ON_NEW_CONSENSUS
First, ensure that all guard profiles are updated with information about whether they were in the newest consensus or not. If a guard is not included in the newest consensus, the guard is considered bad.
If any of the PRIMARY_GUARDS have become bad, remove them from PRIMARY_GUARDS. Then ensure that PRIMARY_GUARDS contains N_PRIMARY_GUARDS entries by repeatedly calling NEXT_PRIMARY_GUARD.
If any guards in USED_GUARDS have switched from being bad to being non-bad, add them back in the place they would have had in PRIMARY_GUARDS if they had been non-bad when PRIMARY_GUARDS was populated. If this results in PRIMARY_GUARDS being larger than N_PRIMARY_GUARDS, truncate the list to be N_PRIMARY_GUARDS entries long.
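One way to read the re-insertion rule: since PRIMARY_GUARDS defaults to the first N_PRIMARY_GUARDS usable entries of USED_GUARDS, dropping bad guards and re-inserting recovered ones "in the right place" amounts to rebuilding the list from USED_GUARDS priority order. A non-normative Python sketch under that interpretation:

    N_PRIMARY_GUARDS = 3   # illustrative default

    def on_new_consensus(state, consensus_fingerprints):
        # Mark guards that dropped out of the newest consensus as bad.
        for g in state['USED_GUARDS']:
            g.bad = g.fingerprint not in consensus_fingerprints

        # Rebuild PRIMARY_GUARDS from USED_GUARDS priority order, then
        # top up with NEXT_PRIMARY_GUARD and truncate to the right size.
        usable = [g for g in state['USED_GUARDS'] if not g.bad]
        state['PRIMARY_GUARDS'] = usable[:N_PRIMARY_GUARDS]
        while len(state['PRIMARY_GUARDS']) < N_PRIMARY_GUARDS:
            g = next_primary_guard(state)
            if g is None:
                break
            state['PRIMARY_GUARDS'].append(g)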
§2.3. The SHOULD_CONTINUE algorithm
This algorithm takes as an argument a boolean indicating whether the circuit was successfully built or not.
After the caller has tried to build a circuit with a returned guard, they should invoke SHOULD_CONTINUE to understand whether the algorithm is finished or not. SHOULD_CONTINUE will always return true if the circuit failed. If the circuit succeeded, SHOULD_CONTINUE will always return false, unless the guard that succeeded was the first guard to succeed after INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the state to STATE_PRIMARY_GUARDS and return true.
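A minimal, non-normative Python sketch of SHOULD_CONTINUE; the 'started_at' timestamp (when we began looking for a guard) and the one-shot flag are bookkeeping assumed by the sketch:

    import time

    INTERNET_LIKELY_DOWN_INTERVAL = 5   # minutes, see section 3

    def should_continue(state, circuit_succeeded):
        if not circuit_succeeded:
            return True   # keep trying other guards
        # First success after a likely outage: retry primaries instead of
        # settling for whichever guard happened to work.
        looked_down = (time.time() - state['started_at']
                       >= INTERNET_LIKELY_DOWN_INTERVAL * 60)
        if looked_down and not state.get('retried_after_outage'):
            state['retried_after_outage'] = True   # only do this once
            state['STATE'] = 'STATE_PRIMARY_GUARDS'
            return True
        return False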
§2.4. The END algorithm
The goal of this algorithm is simply to make sure that we keep track of successful connections made. This algorithm should be invoked with the guard that was used to correctly set up a circuit.
Once invoked, this algorithm will mark the guard as used and make sure it is in USED_GUARDS, adding it at the end if it was not already there.
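The corresponding bookkeeping is tiny; a sketch (the 'used' attribute is assumed):

    def end(state, guard):
        # Record the success; USED_GUARDS is priority-ordered, so a newly
        # used guard is appended at the end (lowest priority).
        guard.used = True
        if guard not in state['USED_GUARDS']:
            state['USED_GUARDS'].append(guard)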
§2.5. Helper algorithms
These algorithms are used in the above algorithms, but have been separated out here in order to make the flow clearer.
NEXT_PRIMARY_GUARD
- Return the first entry from USED_GUARDS that is not in PRIMARY_GUARDS and that is in the most recent consensus.
- If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with REMAINING_UTOPIC_GUARDS as the argument.
NEXT_BY_BANDWIDTH
- Takes G as an argument, which should be a set of guards to choose from.
- Returns a randomly selected element from G, weighted by bandwidth.
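NEXT_BY_BANDWIDTH is an ordinary bandwidth-weighted random choice. A self-contained, non-normative Python sketch (the 'bandwidth' attribute is an assumed field on the guard object):

    import random

    def next_by_bandwidth(guards):
        # Pick one element of `guards`, weighted by its bandwidth.
        guards = list(guards)
        if not guards:
            return None
        total = sum(g.bandwidth for g in guards)
        point = random.uniform(0, total)
        running = 0.0
        for g in guards:
            running += g.bandwidth
            if point <= running:
                return g
        return guards[-1]   # floating-point safety net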
§3. Consensus Parameters & Configurable Variables
This proposal introduces several new parameters that ideally should be set in the consensus but that should also be possible to set or override in the client configuration file. Some of these have proposed values, but for others more simulation and trials need to happen.
PRIMARY_GUARDS_RETRY_INTERVAL In order to make it more likely we connect to a primary guard, we would like to retry the primary guards more often than other types of guards. This parameter controls how many minutes should pass before we consider retrying primary guards again. The proposed value is 3.
SAMPLE_SET_THRESHOLD In order to allow us to recognize dystopic situations or a completely unreachable network, we would like to avoid connecting to too many guards before switching modes. We also want to avoid exposing ourselves to too many nodes in a potentially hostile situation. This parameter, expressed as a fraction, determines the size of the sampled set - the only guards we will consider connecting to. It is used as a fraction for both the utopic and the dystopic sampled set. If we assume there are 1900 utopic guards, 500 of which are dystopic, a setting of 0.02 means we will have a sample set of 38 utopic guards and 10 dystopic guards. This limits our total exposure. The proposed value is 0.02.
INTERNET_LIKELY_DOWN_INTERVAL The number of minutes since we started trying to find an entry guard after which we should consider the network down and retry primary guards before using a functioning guard we have found. The proposed value is 5.
§4. Security properties and behavior under various conditions
Under normal conditions, this algorithm will allow us to quickly connect and use guards we have used before with high likelihood of working. Assuming the first primary guard is reachable and in the consensus, this algorithm will deterministically always return that guard.
Under dystopic conditions (when a firewall is in place that blocks all ports except for potentially port 80 and 443), this algorithm will try to connect to 2% of all guards before switching modes to try dystopic guards. Currently, that means trying to connect to circa 40 guards before getting a successful connection. If we assume a connection try will take maximum 10 seconds, that means it will take up to 6 minutes to get a working connection.
When the network is completely down, we will try to connect to 2% of all guards plus 2% of all dystopic guards before realizing we are down. This means circa 50 guards tried assuming there are 1900 guards in the network.
In terms of exposure, we will connect to a maximum of 2% of all guards plus 2% of all dystopic guards, or 3% of all guards, whichever is lower. If N is the number of guards, and k is the number of guards an attacker controls, that means an attacker would have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their guards selected before we fall back. In real terms, this means an attacker would need to control over 10% of all guards in order to have a larger than 50% chance of controlling a guard for any given client.
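For the skeptical reader, the expression above is easy to evaluate; a short Python sketch using the ~1900-guard figure from this section (note the 50% crossover sits slightly above the 10% mark):

    # P(attacker guard in our exposure set) = 1 - (1 - (k/N)^2)^(N * 0.03)
    N = 1900.0
    for frac in (0.05, 0.10, 0.11, 0.15):
        k = frac * N
        p = 1 - (1 - (k / N) ** 2) ** (N * 0.03)
        print("k/N = %.0f%% -> p = %.2f" % (frac * 100, p))
    # prints roughly 0.13, 0.44, 0.50, 0.73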
In addition, since the sampled set changes slowly (the suggestion here is that guards in it expire every month), it is not possible for an attacker to force a connection to an entry guard that isn't already in the user's sampled set.
§A. Appendix: An example usage
In order to clarify how this algorithm is supposed to be used, this pseudocode illustrates the building of a circuit:
OPEN_CIRCUIT:
    context = ALGO_CHOOSE_ENTRY_GUARD_START(
        used_guards,
        sampled_utopic_guards=[],
        sampled_dystopic_guards=[],
        exclude_nodes=[],
        n_primary_guards=3,
        dir=false,
        guards_in_consensus)

    while True:
        entryGuard = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
        circuit = composeCircuitAndConnect(entryGuard)

        if not SHOULD_CONTINUE(isSuccessful(circuit)):
            ALGO_CHOOSE_ENTRY_GUARD_END(context, entryGuard)
            return circuit
On 24 Mar 2016, at 22:55, George Kadianakis desnacked@riseup.net wrote:
- How exactly should directory guards work? Should the guard lists be
initialized only with directory guards? Or should we just initialize our guard lists from the set of all guards, and just skip non-V2Dir guards whenever we need to make a directory circuit? FWIW, there are currently 2076 guards, out of which 1659 support V2Dir (i.e. they are directory guards).
The number of directory guards will increase when 0.2.8-stable is released and relays and clients upgrade. In 0.2.8, relays accept tunnelled directory connections even if they do not have an open DirPort.
- How should the algorithm work wrt ReachableAddresses? I wonder how the
current algorithm works wrt ReachableAddresses and whether its behavior is good.
In 0.2.7 and earlier, tor checks directory guards against ReachableAddresses and removes those that aren't reachable, but it doesn't check non-directory guards in choose_good_entry_server.
In 0.2.8 this issue is fixed, and tor checks both directory and non-directory guards against ReachableAddresses. For both types of guards, tor also checks whether the guard's address is preferred by ClientPreferIPv6{OR,Dir}Port, and chooses guards with preferred IP address versions before choosing guards without preferred address versions. Since all relays have IPv4 addresses, when IPv6 addresses are preferred, tor will choose guards with IPv6 OR and Dir addresses.
I am worried that the UTOPIC / DYSTOPIC distinction is too simplistic to capture the existing guard selection behaviour in 0.2.7 with ReachableAddresses. This becomes even more complex once IPv6 preferences are taken into account during guard selection in 0.2.8.
General proposal feedback:
The proposal would be much clearer if DYSTOPIC_GUARDS was defined precisely. Are they guards that have a DirPort 80 and ORPort 443? Or can these ports be swapped? (DirPort 443 and ORPort 80?)
What about ReachableAddresses? What about IPv6 ORPorts and DirPorts?
Automatically determining if a client has IPv6 connectivity may be beyond the scope of this proposal, and might require a separate proposal. But we should be careful not to bake in a design that we'll want to change in a release or two when we auto-configure IPv6.
Also, whatever we code will need to preserve the existing behaviour of the options ReachableAddresses, ClientUseIPv4, ClientUseIPv6, ClientPreferIPv6ORPort, and ClientPreferIPv6DirPort.
Feedback on specific sections:
Under dystopic conditions (when a firewall is in place that blocks all ports except for potentially port 80 and 443), this algorithm will try to connect to 2% of all guards before switching modes to try dystopic guards. Currently, that means trying to connect to circa 40 guards before getting a successful connection. If we assume a connection try will take maximum 10 seconds, that means it will take up to 6 minutes to get a working connection.
This seems far too long for most users. Usability studies have demonstrated that users give up after approximately 30 seconds.
Can we design an algorithm that will automatically choose a dystopic guard and bootstrap within 30 seconds? What are the security tradeoffs if we do?
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
Tim Wilson-Brown - teor teor2345@gmail.com writes:
On 24 Mar 2016, at 22:55, George Kadianakis desnacked@riseup.net wrote:
- How exactly should directory guards work? Should the guard lists be
initialized only with directory guards? Or should we just initialize our guard lists from the set of all guards, and just skip non-V2Dir guards whenever we need to make a directory circuit? FWIW, there are currently 2076 guards, out of which 1659 support V2Dir (i.e. they are directory guards).
The number of directory guards will increase when 0.2.8-stable is released and relays and clients upgrade. In 0.2.8, relays accept tunnelled directory connections even if they do not have an open DirPort.
Indeed, soon enough all guards will be directory guards.
- How should the algorithm work wrt ReachableAddresses? I wonder how the
current algorithm works wrt ReachableAddresses and whether its behavior is good.
In 0.2.7 and earlier, tor checks directory guards against ReachableAddresses and removes those that aren't reachable, but it doesn't check non-directory guards in choose_good_entry_server.
In 0.2.8 this issue is fixed, and tor checks both directory and non-directory guards against ReachableAddresses. For both types of guards, tor also checks whether the guard's address is preferred by ClientPreferIPv6{OR,Dir}Port, and chooses guards with preferred IP address versions before choosing guards without preferred address versions. Since all relays have IPv4 addresses, when IPv6 addresses are preferred, tor will choose guards with IPv6 OR and Dir addresses.
I am worried that the UTOPIC / DYSTOPIC distinction is too simplistic to capture the existing guard selection behaviour in 0.2.7 with ReachableAddresses. This becomes even more complex once IPv6 preferences are taken into account during guard selection in 0.2.8.
Hm. I'm not sure how to do this properly right now. I feel that the current Tor behavior is very very complex, and it's hard to incorporate all that logic in a proposal that tries to be standalone and easy to analyze.
We need to document the current behavior of guards wrt ReachableAddresses and IPv6, and see which parts we want to keep as is and which ones we want to change.
General proposal feedback:
The proposal would be much clearer if DYSTOPIC_GUARDS was defined precisely. Are they guards that have a DirPort 80 and ORPort 443? Or can these ports be swapped? (DirPort 443 and ORPort 80?)
Indeed, pinning down the concept of DYSTOPIC_GUARDS in the proposal seems like a good idea.
I think the current idea is that they are guards that have their ORPort on ports 80 or 443. We don't care about the DirPort I think, since directory requests happen over the ORPort with BEGIN_DIR. Is that right?
I think Reinaldo et al. were also thinking of incorporating the ReachableAddresses logic in there, so that DYSTOPIC_GUARDS changes based on the reachability settings of the client. I'm not sure exactly how that would work, especially when the user can change ReachableAddresses at any moment. I think we should go for the simplest thing possible here, and improve our heuristics in the future based on testing.
What about ReachableAddresses? What about IPv6 ORPorts and DirPorts?
Automatically determining if a client has IPv6 connectivity may be beyond the scope of this proposal, and might require a separate proposal. But we should be careful not to bake in a design that we'll want to change in a release or two when we auto-configure IPv6.
Also, whatever we code will need to preserve the existing behaviour of the options ReachableAddresses, ClientUseIPv4, ClientUseIPv6, ClientPreferIPv6ORPort, and ClientPreferIPv6DirPort.
Indeed. We need to document how these settings interact with guard selection currently, and see what needs to be done.
Feedback on specific sections:
Under dystopic conditions (when a firewall is in place that blocks all ports except for potentially port 80 and 443), this algorithm will try to connect to 2% of all guards before switching modes to try dystopic guards. Currently, that means trying to connect to circa 40 guards before getting a successful connection. If we assume a connection try will take maximum 10 seconds, that means it will take up to 6 minutes to get a working connection.
This seems far too long for most users. Usability studies have demonstrated that users give up after approximately 30 seconds.
Can we design an algorithm that will automatically choose a dystopic guard and bootstrap within 30 seconds? What are the security tradeoffs if we do?
OK, let's assume that a connection-failure timeout might take up to 10 seconds.
If Alice is behind a FascistFirewall and we want her to bootstrap within 30 seconds, this means that she always needs to have an 80/443 guard in her top three choices. This means that we would heavily prioritize 80/443 guards over the rest, and an adversary who sets up 80/443 guards will attract more clients.
I think the current proposal tries to balance this, by enabling this heuristic only after Alice exhausts her utopic guardlist. Also, keep in mind that the utopic guardlist might contain 80/443 guards as well. So if Alice is lucky, she got an 80/443 guard in her utopic guard list, and she will still bootstrap before the dystopic heuristic triggers.
There are various ways to make this heuristic more "intelligent", but I would like to maintain simplicity in our design (both simple to understand and to implement). For example, we could imagine that we always put some 80/443 guards as our primary guards, or in the utopic guardlist. Or, that we reduce the 2% requirement so that we trigger the dystopic heuristic faster.
Currently, I'm hoping that we will understand the value of this heuristic better when we implement it, and test it on real networks...
Any suggestions?
On 25 Mar 2016, at 00:31, George Kadianakis desnacked@riseup.net wrote:
Tim Wilson-Brown - teor <teor2345@gmail.com mailto:teor2345@gmail.com> writes:
On 24 Mar 2016, at 22:55, George Kadianakis <desnacked@riseup.net mailto:desnacked@riseup.net> wrote:
- How exactly should directory guards work? Should the guard lists be
initialized only with directory guards? Or should we just initialize our guard lists from the set of all guards, and just skip non-V2Dir guards whenever we need to make a directory circuit? FWIW, there are currently 2076 guards, out of which 1659 support V2Dir (i.e. they are directory guards).
The number of directory guards will increase when 0.2.8-stable is released and relays and clients upgrade. In 0.2.8, relays accept tunnelled directory connections even if they do not have an open DirPort.
Indeed, soon enough all guards will be directory guards.
Almost all guards will be directory guards. AccountingMax can disable tunnelled directory fetches, as can DirCache 0.
General proposal feedback:
The proposal would be much clearer if DYSTOPIC_GUARDS was defined precisely. Are they guards that have a DirPort 80 and ORPort 443? Or can these ports be swapped? (DirPort 443 and ORPort 80?)
Indeed, pinning down the concept of DYSTOPIC_GUARDS in the proposal seems like a good idea.
I think the current idea is that they are guards that have their ORPort on ports 80 or 443. We don't care about the DirPort I think, since directory requests happen over the ORPort with BEGIN_DIR. Is that right?
Almost all clients tunnel connections over the ORPort; some obscure configs use the DirPort. (We want to fix this in #18483.) Relays fetch directory documents over HTTP.
I think Reinaldo et al. were also thinking of incorporating the ReachableAddresses logic in there, so that DYSTOPIC_GUARDS changes based on the reachability settings of the client. I'm not sure exactly how that would work, especially when the user can change ReachableAddresses at any moment. I think we should go for the simplest thing possible here, and improve our heuristics in the future based on testing.
I suggest that we compose the set of UTOPIC guards based on addresses that are reachable and preferred (or, if there are no guards with preferred addresses, those guards that are reachable). I suggest that we use the same mechanism with DYSTOPIC guards, but add a port restriction to 80 & 443 to all the other restrictions. (This may result in the empty set.)
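A sketch of this composition rule in Python; the predicates stand in for tor's ReachableAddresses and ClientPreferIPv6{OR,Dir}Port logic, and they and the guard attributes are assumptions made for illustration:

    DYSTOPIC_PORTS = {80, 443}

    def utopic_set(guards, is_reachable, is_preferred):
        # Reachable guards with preferred addresses; if none have a
        # preferred address, fall back to all reachable guards.
        reachable = [g for g in guards if is_reachable(g)]
        preferred = [g for g in reachable if is_preferred(g)]
        return preferred if preferred else reachable

    def dystopic_set(guards, is_reachable, is_preferred):
        # Same rule plus the port restriction; this may be empty.
        return [g for g in utopic_set(guards, is_reachable, is_preferred)
                if g.orport in DYSTOPIC_PORTS]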
We already accept that that set of guards can change when the consensus changes (they can disappear, or lose the guard flag). Therefore, it seems trivial to also allow guard set changes when ReachableAddresses or similar options change.
Feedback on specific sections:
Under dystopic conditions (when a firewall is in place that blocks all ports except for potentially port 80 and 443), this algorithm will try to connect to 2% of all guards before switching modes to try dystopic guards. Currently, that means trying to connect to circa 40 guards before getting a successful connection. If we assume a connection try will take maximum 10 seconds, that means it will take up to 6 minutes to get a working connection.
This seems far too long for most users. Usability studies have demonstrated that users give up after approximately 30 seconds.
Can we design an algorithm that will automatically choose a dystopic guard and bootstrap within 30 seconds? What are the security tradeoffs if we do?
OK, let's assume that a connection-failure timeout might take up to 10 seconds.
If Alice is behind a FascistFirewall and we want her to bootstrap within 30 seconds, this means that she always needs to have an 80/443 guard in her top three choices. This means that we would heavily prioritize 80/443 guards over the rest, and an adversary who sets up 80/443 guards will attract more clients.
This isn't how Tor works - it tries multiple guards simultaneously. (See below for details.) Can we rework this calculation to take that into account?
I think the current proposal tries to balance this, by enabling this heuristic only after Alice exhausts her utopic guardlist. Also, keep in mind that the utopic guardlist might contain 80/443 guards as well. So if Alice is lucky, she got an 80/443 guard in her utopic guard list, and she will still bootstrap before the dystopic heuristic triggers.
There are various ways to make this heuristic more "intelligent", but I would like to maintain simplicity in our design (both simple to understand and to implement). For example, we could imagine that we always put some 80/443 guards as our primary guards, or in the utopic guardlist. Or, that we reduce the 2% requirement so that we trigger the dystopic heuristic faster.
Or that tor can get a hint about which ports it can access based on which ports it used to bootstrap. (See below for details.)
Currently, I'm hoping that we will understand the value of this heuristic better when we implement it, and test it on real networks...
Any suggestions?
There's a whole lot of my thoughts below.
What are we protecting against?
I'm not convinced that "failing to connect to a guard" is much of an anonymity issue for clients. It does provide a unique fingerprint to the censor, and the more guards we try from a pre-selected list, the more unique that fingerprint is. But if client packets never get to the guard, then the client can't even be identified by the guard.
Why such a large list of guards?
Apart from the fingerprinting issue (which I think gets worse with a larger list, at least if it's tried in order), I wonder why we bother trying such a large UTOPIC guardlist. Surely after you've tried 10 guards, the chances that the 11th is going to connect are vanishingly small. (Unless it's on a different port or netblock, I guess.) And if our packets are reaching the guard, and being dropped on the way back, we have to consider the load this places on the network.
Client Bootstrap
The proposal ignores client bootstrap.
There are a limited number of hard-coded authorities and fallback directories available during client bootstrap. The client doesn't select guards until it has bootstrapped from one of the 9 authorities or 20-200 fallback directories.
Bootstrap / Launch Time
The proposal calculates bootstrap and launch time incorrectly.
The proposal assumes that Tor attempts to connect to each guard and waits for failure before trying another. But this isn't how Tor actually works - it sometimes tries multiple connections simultaneously. So summing the times for individual connection attempts to each guard doesn't provide an accurate picture of the actual connection time.
When bootstrapping in 0.2.7 and earlier, tor will try an authority, wait up to 10 seconds for it to fail, then try another. Then there's a 60 second wait before the third authority, but at that point the user has likely lost interest.
In 0.2.8, tor connects to authorities and fallbacks concurrently. It will try 3 fallbacks and 1 authority in the first 10 seconds, and download from whichever one connects first. So 0.2.8 is far more likely to connect within a few seconds.
In all current versions, tor then downloads the consensus (~1.5MB, could take 10 seconds or more), and chooses directory guards. Then it simultaneously connects to 3 directory guards to download certificates and descriptors. The time it takes tor to work out if a connection to a directory guard has succeeded happens simultaneously with other directory guard timeouts.
So under this proposal, it would really take tor:
- 10 seconds for initial bootstrap
- 20 seconds (or more) to download the consensus
- 600 seconds / 3 directory guards = 200 seconds to exhaust its UTOPIC guardlist
(tor skips the first two phases if it has a live consensus)
Can we revise the proposal to take this into account?
Other Considerations
We're considering increasing the 10 second stream attach timeout to support users on slow and unreliable network connections (#16844). We should think about the impact of that on this proposal - I'd hate to double the time it takes tor to exhaust its UTOPIC guardlist.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
Tim Wilson-Brown - teor teor2345@gmail.com writes:
On 25 Mar 2016, at 00:31, George Kadianakis desnacked@riseup.net wrote:
Tim Wilson-Brown - teor <teor2345@gmail.com mailto:teor2345@gmail.com> writes:
On 24 Mar 2016, at 22:55, George Kadianakis <desnacked@riseup.net mailto:desnacked@riseup.net> wrote:
<snip>
I think Reinaldo et al. were also thinking of incorporating the ReachableAddresses logic in there, so that DYSTOPIC_GUARDS changes based on the reachability settings of the client. I'm not sure exactly how that would work, especially when the user can change ReachableAddresses at any moment. I think we should go for the simplest thing possible here, and improve our heuristics in the future based on testing.
I suggest that we compose the set of UTOPIC guards based on addresses that are reachable and preferred (or, if there are no guards with preferred addresses, those guards that are reachable). I suggest that we use the same mechanism with DYSTOPIC guards, but add a port restriction to 80 & 443 to all the other restrictions. (This may result in the empty set.)
Alright, this seems like a good process here. We should do it like that.
What happens if a utopic guard suddenly is not included in the ReachableAddresses anymore? Maybe we mark it as 'bad' (the same way we mark relays that leave the consensus).
<snip>
I think the current proposal tries to balance this, by enabling this heuristic only after Alice exhausts her utopic guardlist. Also, keep in mind that the utopic guardlist might contain 80/443 guards as well. So if Alice is lucky, she got an 80/443 guard in her utopic guard list, and she will still bootstrap before the dystopic heuristic triggers.
There are various ways to make this heuristic more "intelligent", but I would like to maintain simplicity in our design (both simple to understand and to implement). For example, we could imagine that we always put some 80/443 guards as our primary guards, or in the utopic guardlist. Or, that we reduce the 2% requirement so that we trigger the dystopic heuristic faster.
Or that tor can get a hint about which ports it can access based on which ports it used to bootstrap. (See below for details.)
Yes, could be.
How would that work, though? And what happens if the network changes? How does the hint work then?
Currently, I'm hoping that we will understand the value of this heuristic better when we implement it, and test it on real networks...
Any suggestions?
There's a whole lot of my thoughts below.
Why such a large list of guards?
Apart from the fingerprinting issue (which I think gets worse with a larger list, at least if it's tried in order), I wonder why we bother trying such a large UTOPIC guardlist. Surely after you've tried 10 guards, the chances that the 11th is going to connect are vanishingly small. (Unless it's on a different port or netblock, I guess.) And if our packets are reaching the guard, and being dropped on the way back, we have to consider the load this places on the network.
Indeed, I also feel that 80 guards is a lot of guards to try before switching to dystopic mode.
I would be up for reducing it. I wonder what's the right number here.
My fear with having a small number of sampled guards in a guardlist is that if all of them go down at the same time, then that guardlist is useless.
Also, this reminds me that the proposal does not precisely specify what happens when guards in SAMPLED_UTOPIC_GUARDS become bad (they drop out of the consensus). Do we keep them on the list but marked as bad? What happens if lots of them become bad? When do we add new guards? Currently the proposal only says:
It will be filled in by the algorithm if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD guards after winnowing out older guards. It should be filled by using NEXT_BY_BANDWIDTH with UTOPIC_GUARDS as an argument.
I think we should be more specific here.
Client Bootstrap
The proposal ignores client bootstrap.
There are a limited number of hard-coded authorities and fallback directories available during client bootstrap. The client doesn't select guards until it has bootstrapped from one of the 9 authorities or 20-200 fallback directories.
What do you think should be mentioned here?
Bootstrap / Launch Time
The proposal calculates bootstrap and launch time incorrectly.
The proposal assumes that Tor attempts to connect to each guard and waits for failure before trying another. But this isn't how Tor actually works - it sometimes tries multiple connections simultaneously. So summing the times for individual connection attempts to each guard doesn't provide an accurate picture of the actual connection time.
When bootstrapping in 0.2.7 and earlier, tor will try an authority, wait up to 10 seconds for it to fail, then try another. Then there's a 60 second wait before the third authority, but at that point the user has likely lost interest.
In 0.2.8, tor connects to authorities and fallbacks concurrently. It will try 3 fallbacks and 1 authority in the first 10 seconds, and download from whichever one connects first. So 0.2.8 is far more likely to connect within a few seconds.
In all current versions, tor then downloads the consensus (~1.5MB, could take 10 seconds or more), and chooses directory guards. Then it simultaneously connects to 3 directory guards to download certificates and descriptors. The time it takes tor to work out if a connection to a directory guard has succeeded happens simultaneously with other directory guard timeouts.
So under this proposal, it would really take tor:
- 10 seconds for initial bootstrap
- 20 seconds (or more) to download the consensus
- 600 seconds / 3 directory guards = 200 seconds to exhaust its UTOPIC guardlist
Where does the "600 seconds" figure come from here?
(tor skips the first two phases if it has a live consensus)
Can we revise the proposal to take this into account?
Are you talking about section 4? Yes, that could be rewritten a bit.
However, I think that section does not specifically talk about bootstrap as you seem to be doing.
So, if you have Tor running and you move your laptop to a network with FascistFirewall, you will not be bootstrapping again with 3 directory guards. Instead, you are going to be walking over the guard list with a single guard. So in that case section 4 will be more accurate.
Or am I wrong?
On 25 Mar 2016, at 22:26, George Kadianakis desnacked@riseup.net wrote:
Tim Wilson-Brown - teor teor2345@gmail.com writes:
On 25 Mar 2016, at 00:31, George Kadianakis desnacked@riseup.net wrote:
Tim Wilson-Brown - teor <teor2345@gmail.com mailto:teor2345@gmail.com> writes:
On 24 Mar 2016, at 22:55, George Kadianakis <desnacked@riseup.net mailto:desnacked@riseup.net> wrote:
<snip>
I think Reinaldo et al. were also thinking of incorporating the ReachableAddresses logic in there, so that DYSTOPIC_GUARDS changes based on the reachability settings of the client. I'm not sure exactly how that would work, especially when the user can change ReachableAddresses at any moment. I think we should go for the simplest thing possible here, and improve our heuristics in the future based on testing.
I suggest that we compose the set of UTOPIC guards based on addresses that are reachable and preferred (or, if there are no guards with preferred addresses, those guards that are reachable). I suggest that we use the same mechanism with DYSTOPIC guards, but add a port restriction to 80 & 443 to all the other restrictions. (This may result in the empty set.)
Alright, this seems like a good process here. We should do it like that.
What happens if a utopic guard suddenly is not included in the ReachableAddresses anymore? Maybe we mark it as 'bad' (the same way we mark relays that leave the consensus).
Yes, if it's not available, it doesn't really matter why. (Having different behaviours for different reasons would complicate guard selection.)
<snip>
I think the current proposal tries to balance this, by enabling this heuristic only after Alice exhausts her utopic guardlist. Also, keep in mind that the utopic guardlist might contain 80/443 guards as well. So if Alice is lucky, she got an 80/443 guard in her utopic guard list, and she will still bootstrap before the dystopic heuristic triggers.
There are various ways to make this heuristic more "intelligent", but I would like to maintain simplicity in our design (both simple to understand and to implement). For example, we could imagine that we always put some 80/443 guards as our primary guards, or in the utopic guardlist. Or, that we reduce the 2% requirement so that we trigger the dystopic heuristic faster.
Or that tor can get a hint about which ports it can access based on which ports it used to bootstrap. (See below for details.)
Yes, could be.
How would that work, though?
We pass the port(s) that we've successfully bootstrapped on to the guard selection algorithm as an initial hint. The algorithm ensures that X% of the relays it selects are on those port(s).
The problem with this approach is that it biases guard selection towards the DirPorts that authorities and fallback directories are on. So X% must be high enough to ensure we can continue to load descriptors if all other ports are blocked, but low enough not to overload guards on those ports.
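To make the X% idea concrete, a Python sketch of hint-aware sampling; every name here is hypothetical - nothing like this exists in tor - and the bandwidth-weighted sampler is an illustrative helper:

    import random

    def weighted_sample(guards, k):
        # Bandwidth-weighted sampling without replacement (illustrative).
        guards, picked = list(guards), []
        while guards and len(picked) < k:
            total = sum(g.bandwidth for g in guards)
            point, running = random.uniform(0, total), 0.0
            for g in guards:
                running += g.bandwidth
                if point <= running:
                    picked.append(g)
                    guards.remove(g)
                    break
        return picked

    def sample_with_port_hint(candidates, hint_ports, size, x_fraction):
        # Reserve about x_fraction of the sample for guards on the ports
        # we bootstrapped over; fill the rest from everything else.
        on_hint = [g for g in candidates if g.orport in hint_ports]
        others = [g for g in candidates if g.orport not in hint_ports]
        n_hint = min(len(on_hint), int(size * x_fraction))
        return (weighted_sample(on_hint, n_hint)
                + weighted_sample(others, size - n_hint))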
And what happens if the network changes? How does the hint work then?
There are two scenarios: If the network changes after a short period of downtime (<24 hours), the consensus will still be current, and we won't bootstrap again. Some of our guards will fail, and we will choose other guards from the original list.
If the network changes after a long period of downtime (>=24 hours), the consensus will expire, and we will bootstrap again, and get a new hint. We will check if the original list contains Y% guards on these new ports (where Y <= X). If it doesn't, we can augment the list with new guards on those ports, or create an entirely new list.
There is a risk here that the list grows without bound if Y% is high, and we regularly switch between N sites that each allow a small number of different ports.
Currently, I'm hoping that we will understand the value of this heuristic better when we implement it, and test it on real networks...
Any suggestions?
There's a whole lot of my thoughts below.
Why such a large list of guards?
Apart from the fingerprinting issue (which I think gets worse with a larger list, at least if it's tried in order), I wonder why we bother trying such a large UTOPIC guardlist. Surely after you've tried 10 guards, the chances that the 11th is going to connect are vanishingly small. (Unless it's on a different port or netblock, I guess.) And if our packets are reaching the guard, and being dropped on the way back, we have to consider the load this places on the network.
Indeed, I also feel that 80 guards is a lot of guards to try before switching to dystopic mode.
I would be up for reducing it. I wonder what's the right number here.
My fear with having a small number of sampled guards in a guardlist is that if all of them go down at the same time, then that guardlist is useless.
I would imagine that the probability of 10+ guards going down at the same time is minuscule, unless the network has major issues, or a port is blocked on the client side.
Also, this reminds me that the proposal does not precisely specify what happens when guards in SAMPLED_UTOPIC_GUARDS become bad (they drop out of the consensus). Do we keep them on the list but marked as bad? What happens if lots of them become bad? When do we add new guards? Currently the proposal only says:
It will be filled in by the algorithm if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD guards after winnowing out older guards. It should be filled by using NEXT_BY_BANDWIDTH with UTOPIC_GUARDS as an argument.
I think we should be more specific here.
Yes, we need to behave sensibly if lots of guards become bad. It's most likely that we're effectively blocked from Tor. Or that we can only use a small number of ports to get out. In this case, it would help to add guards with known good ports (from a recent bootstrap hint).
Client Bootstrap
The proposal ignores client bootstrap.
There are a limited number of hard-coded authorities and fallback directories available during client bootstrap. The client doesn't select guards until it has bootstrapped from one of the 9 authorities or 20-200 fallback directories.
What do you think should be mentioned here?
That clients bootstrap before selecting guards. That loading a consensus takes additional time during initial bootstrap (5-30s?) that's not counted in these calculations.
Bootstrap / Launch Time
The proposal calculates bootstrap and launch time incorrectly.
The proposal assumes that Tor attempts to connect to each guard and waits for failure before trying another. But this isn't how Tor actually works - it sometimes tries multiple connections simultaneously. So summing the times for individual connection attempts to each guard doesn't provide an accurate picture of the actual connection time.
When bootstrapping in 0.2.7 and earlier, tor will try an authority, wait up to 10 seconds for it to fail, then try another. Then there's a 60 second wait before the third authority, but at that point the user has likely lost interest.
In 0.2.8, tor connects to authorities and fallbacks concurrently. It will try 3 fallbacks and 1 authority in the first 10 seconds, and download from whichever one connects first. So 0.2.8 is far more likely to connect within a few seconds.
In all current versions, tor then downloads the consensus (~1.5MB, could take 10 seconds or more), and chooses directory guards. Then it simultaneously connects to 3 directory guards to download certificates and descriptors. The time it takes tor to work out if a connection to a directory guard has succeeded happens simultaneously with other directory guard timeouts.
So under this proposal, it would really take tor:
- 10 seconds for initial bootstrap
- 20 seconds (or more) to download the consensus
- 600 seconds / 3 directory guards = 200 seconds to exhaust its UTOPIC guardlist
Where does the "600 seconds" figure come from here?
It's the existing figure in the proposal. 10 seconds x 60 guards in the UTOPIC list. Although if tor is building preemptive paths at the same time, the calculation could well be:
600 seconds / (3 directory guards + 1 OR guard) = 150 seconds to exhaust its UTOPIC guardlist
(tor skips the first two phases if it has a live consensus)
Can we revise the proposal to take this into account?
Are you talking about section 4? Yes, that could be rewritten a bit.
However, I think that section does not specifically talk about bootstrap as you seem to be doing.
So, if you have Tor running and you move your laptop to a network with FascistFirewall, you will not be bootstrapping again with 3 directory guards. Instead, you are going to be walking over the guard list with a single guard. So in that case section 4 will be more accurate.
Or am I wrong?
I think it depends.
Tor will use its directory guards to download new descriptors on a regular basis. Tor will use its OR guard to connect to the Tor network and build preemptive paths on a regular basis. This all happens at the same time.
So I think it depends how many updated descriptors there are in each consensus. (And it depends exactly when you download a new consensus.)
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
Hello,
teor, asn, see comments inline.
On 3/24/2016 5:00 PM, Tim Wilson-Brown - teor wrote: [snip]
The number of directory guards will increase when 0.2.8-stable is released and relays and clients upgrade. In 0.2.8, relays accept tunnelled directory connections even if they do not have an open DirPort.
Indeed, soon enough all guards will be directory guards.
Almost all guards will be directory guards. AccountingMax can disable tunnelled directory fetches, as can DirCache 0.
I guess the guards that won't be accepting tunneled BEGIN_DIR connections because of AccountingMax or DirCache 0 will also advertise this in their descriptors, so these relays will not get a `V2Dir` flag. Can you confirm whether this is actually true? I assume the code has to do this, otherwise how can a client know whether it can initiate a tunneled BEGIN_DIR connection with a relay or not.
Simplest thing is to make the guard also be the directory guard. As Roger suggested in the "Notes from the prop259 proposal reading group" thread, we might make the authorities assign the 'Guard' flag only to relays that also have the 'V2Dir' flag, among the other existing requirements. Until then, we require a guard to have a DirPort set, but this will add extra complexity (what if the ORPort is on 443 and the DirPort on 9030 - is this guard utopic or dystopic?).
[snip] Feedback on specific sections:
Under dystopic conditions (when a firewall is in place that blocks all ports except for potentially port 80 and 443), this algorithm will try to connect to 2% of all guards before switching modes to try dystopic guards. Currently, that means trying to connect to circa 40 guards before getting a successful connection. If we assume a connection try will take maximum 10 seconds, that means it will take up to 6 minutes to get a working connection.
This seems far too long for most users. Usability studies have demonstrated that users give up after approximately 30 seconds.
Can we design an algorithm that will automatically choose a dystopic guard and bootstrap within 30 seconds? What are the security tradeoffs if we do?
OK, let's assume that a connection-failure timeout might take up to 10 seconds.
If Alice is behind a FascistFirewall and we want her to bootstrap within 30 seconds, this means that she always needs to have an 80/443 guard in her top three choices. This means that we would heavily prioritize 80/443 guards over the rest, and an adversary who sets up 80/443 guards will attract more clients.
This isn't how Tor works - it tries multiple guards simultaneously. (See below for details.) Can we rework this calculation to take that into account?
I think the current proposal tries to balance this, by enabling this heuristic only after Alice exhausts her utopic guardlist. Also, keep in mind that the utopic guardlist might contain 80/443 guards as well. So if Alice is lucky, she got an 80/443 guard in her utopic guard list, and she will still bootstrap before the dystopic heuristic triggers.
There are various ways to make this heuristic more "intelligent", but I would like to maintain simplicity in our design (both simple to understand and to implement). For example, we could imagine that we always put some 80/443 guards as our primary guards, or in the utopic guardlist. Or, that we reduce the 2% requirement so that we trigger the dystopic heuristic faster.
I agree that the maximum total time at the client side to get a working connection is probably too much. However, I think asn's arguments against _ensuring_ we keep at least n dystopic guards in our PRIMARY_GUARDS list - a) overloading 80/443 (dystopic) guards; b) creating incentives for attackers to run 80/443 (dystopic) guards, which would give them unfair probabilities of being picked by clients -
are very important, and could make this tradeoff worth the increase in the maximum possible time to get a working connection at the client side.
As I understand it, the utopic guard list _can_ also contain dystopic guards, so a client behind a FascistFirewall might be lucky and not have to wait until the utopic guard list is exhausted entirely. This is better, but I still think it would be simpler if, instead of 2 guard lists:
- SAMPLED_UTOPIC_GUARDS
- SAMPLED_DYSTOPIC_GUARDS
we create a single SAMPLED_GUARDS list, but make the selection by taking into account the ratio of utopic and dystopic guards based on their weights from the last consensus. I have suggested a simple example for this a few months ago in this post:
https://lists.torproject.org/pipermail/tor-dev/2015-November/009871.html
If we compute the guard list like this, load balancing shouldn't be affected in any way (we use the weights to build the list, not the number of relays). I saw the algorithm has been improved so much and covers so many aspects we didn't consider initially, but I still don't understand why we need two separate lists of utopic guards and dystopic guards when we can create a single list.
This will also allow us to safely decrease a little bit the total number of guards we are willing to try, being sure that clients behind FascistFirewalls get a chance while also taking into account teor's concern not to make this list too big.
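A Python sketch of this single-list construction; the weight split is one possible reading of "ratio ... based on their weights", and weighted_sample is the illustrative bandwidth-weighted sampler sketched earlier in the thread:

    def build_sampled_guards(utopic_guards, dystopic_guards, sample_size):
        # Split the sample by consensus weight: dystopic guards get a
        # share proportional to their total bandwidth, not their count.
        # (utopic_guards is assumed to include the dystopic ones.)
        total_bw = sum(g.bandwidth for g in utopic_guards)
        dystopic_bw = sum(g.bandwidth for g in dystopic_guards)
        n_dystopic = (int(round(sample_size * dystopic_bw / total_bw))
                      if total_bw else 0)

        sample = weighted_sample(dystopic_guards, n_dystopic)
        rest = [g for g in utopic_guards if g not in sample]
        return sample + weighted_sample(rest, sample_size - n_dystopic)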
[snip] Client Bootstrap
The proposal ignores client bootstrap.
There are a limited number of hard-coded authorities and fallback directories available during client bootstrap. The client doesn't select guards until it has bootstrapped from one of the 9 authorities or 20-200 fallback directories.
I think this step happens before prop#259 does its magic, since prop#259 first needs a consensus before it can work. Let's call this initial (genesis) bootstrap Step 0 - only after a client has bootstrapped (either from an authority or from a fallback directory) will it initiate prop#259 to pick a guard.
Bootstrap / Launch Time
The proposal calculates bootstrap and launch time incorrectly.
The proposal assumes that Tor attempts to connect to each guard and waits for failure before trying another. But this isn't how Tor actually works
- it sometimes tries multiple connections simultaneously. So summing the
times for individual connection attempts to each guard doesn't provide an accurate picture of the actual connection time.
When bootstrapping in 0.2.7 and earlier, tor will try an authority, wait up to 10 seconds for it to fail, then try another. Then there's a 60 second wait before the third authority, but at that point the user has likely lost interest.
In 0.2.8, tor connects to authorities and fallbacks concurrently. It will try 3 fallbacks and 1 authority in the first 10 seconds, and download from whichever one connects first. So 0.2.8 is far more likely to connect within a few seconds.
In all current versions, tor then downloads the consensus (~1.5MB, could take 10 seconds or more), and chooses directory guards. Then it simultaneously connects to 3 directory guards to download certificates and descriptors. The time it takes tor to work out if a connection to a directory guard has succeeded happens simultaneously with other directory guard timeouts.
Hmm. This requires some thinking. So Tor connects to the directory guards immediately after it gets a consensus, to get the certificates and descriptors. Plausible. I assume it does this via HTTP fetch on the DirPort, since it has _no_ certificates and descriptors for routers. Doesn't Tor need these certificates and descriptors to initiate tunneled BEGIN_DIR requests with certain relays?
How will this work once DirPort is deprecated entirely? Or is removing DirPort from relays not part of the plan?
Under prop#259 we could, for example, initiate 3 simultaneous tunneled BEGIN_DIR connections to the top 3 guards in PRIMARY_GUARDS to fetch the certificates and descriptors. Until all relays have updated to recent enough Tor versions, we would instead initiate 3 HTTP GET requests to the DirPorts of the top 3 guards in PRIMARY_GUARDS. This shouldn't affect the client's anonymity or expose them too much: on one side it's just for the certificates and descriptors, and on the other side the client is exposed to the guards in the PRIMARY_GUARDS list all the time anyway, for usability, performance, avoiding overload of some guards, etc.
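Roughly the concurrency pattern I have in mind (fetch_begindir() and fetch_dirport() are placeholders for the real directory-fetch machinery, not existing tor functions):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def fetch_begindir(guard):
        # Placeholder: tunneled BEGIN_DIR fetch of certs/descriptors.
        raise NotImplementedError

    def fetch_dirport(guard):
        # Placeholder: plain HTTP GET against the guard's DirPort.
        raise NotImplementedError

    def fetch_dir_info(primary_guards, use_begindir=True):
        # Race the top 3 PRIMARY_GUARDS; use whichever answers first.
        fetch = fetch_begindir if use_begindir else fetch_dirport
        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = {pool.submit(fetch, g): g for g in primary_guards[:3]}
            for done in as_completed(futures):
                try:
                    return done.result()  # first successful response wins
                except Exception:
                    continue  # that guard failed; wait for the others
        raise RuntimeError("all top-3 primary guards failed")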
Other Considerations
We're considering increasing the 10 second stream attach timeout to support users on slow and unreliable network connections (#16844). We should think about the impact of that on this proposal - I'd hate to double the time it takes tor to exhaust its UTOPIC guardlist.
This is correct.
Also, the FascistFirewall torrc option: prop#259 sounds like it will take care of users behind FascistFirewalls by default. Should we eliminate the option entirely for simplicity? Or should we make FascistFirewall 1 tell prop#259 to populate the SAMPLED_GUARDS list only with dystopic guards, or to use only a SAMPLED_DYSTOPIC_GUARDS list if we choose to keep the two lists disjoint?
s7r s7r@sky-ip.org writes:
Hello,
teor, asn, see comments inline.
On 3/24/2016 5:00 PM, Tim Wilson-Brown - teor wrote: [snip]
This isn't how Tor works - it tries multiple guards simultaneously. (See below for details.) Can we rework this calculation to take that into account?
I think the current proposal tries to balance this by enabling this heuristic only after Alice exhausts her utopic guardlist. Also, keep in mind that the utopic guardlist might contain 80/443 guards as well. So if Alice is lucky, she will have an 80/443 guard in her utopic guard list, and she will still bootstrap before the dystopic heuristic triggers.
There are various ways to make this heuristic more "intelligent", but I would like to maintain simplicity in our design (both simple to understand and to implement). For example, we could always put some 80/443 guards among our primary guards, or in the utopic guardlist. Or we could reduce the 2% requirement so that we trigger the dystopic heuristic faster.
I agree that the maximum total time on the client side to get a working connection is probably too much. However, I think asn's arguments about _ensuring_ we keep at least n dystopic guards in our PRIMARY_GUARDS list, namely: a) overloading 80/443 (dystopic) guards; b) creating incentives for attackers to run 80/443 (dystopic) guards, which would give them unfair probabilities of being picked by clients;
are very important and could be worth the effort to make this tradeoff and increase the maximum possible time to get a working connection on the client side.
As I understand it, the utopic guard list _can_ also contain dystopic guards, so a client behind a FascistFirewall might be lucky and not have to wait until the utopic guard list is exhausted entirely. This is better, but I still think it would be simpler if, instead of 2 guard lists:
- SAMPLED_UTOPIC_GUARDS
- SAMPLED_DYSTOPIC_GUARDS
we create a single SAMPLED_GUARDS list, but make the selection by taking into account the ratio of utopic and dystopic guards, based on their weights from the last consensus. I suggested a simple example of this a few months ago in this post:
https://lists.torproject.org/pipermail/tor-dev/2015-November/009871.html
If we compute the guard list like this, load balancing shouldn't be affected in any way (we use the weights to build the list, not the number of relays). I see the algorithm has been improved a lot and covers many aspects we didn't consider initially, but I still don't understand why we need two separate lists of utopic and dystopic guards when we can create a single list.
This will also allow us to safely decrease the total number of guards we are willing to try a little, making sure that clients behind FascistFirewalls get a chance, while also taking into account teor's concern not to make this list too big.
Hmm, this seems like something worth considering.
IIUC you are basically suggesting using a single guardlist instead of having two, and then also making sure that a representative share of dystopic bridges is present in that guardlist. So if the set of guards in the latest consensus is 20% dystopic bridges (in bandwidth), we should choose 20% of our guardlist to be dystopic guards.
I can see how this simplifies the guard selection logic, with the cost of slightly complicating the guard sampling logic. It seems like a good trade in this case.
However, this whole suggestion assumes that a good percentage of the total guard bandwidth is actually dystopic (your mail assumes 28%, which is a pretty good number). What's the actual bandwidth % of dystopic bridges in the current network? If it's 20% it's a good number for our purposes, but if it's more like 1% then that's bad, since we would only end up adding 1 or 2 dystopic bridges, making our heuristic much more flaky. Also, in the end the quality of our heuristic will depend on this %, which is suboptimal.
FWIW in my definition of dystopic bridge above I've been assuming it's bridges with ORPort on 80 or 443.
We should look more into this. Reinaldo, Ola, any comments? Is this simulatable?
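If someone wants to measure this, something like the following stem sketch should give a rough answer (it treats ORPort 80/443 as the whole definition of dystopic, so treat the number as approximate):

    # Estimate what fraction of guard bandwidth is "dystopic"
    # (ORPort on 80 or 443), using the current consensus.
    import stem
    import stem.descriptor.remote

    guard_bw = 0
    dystopic_bw = 0
    for router in stem.descriptor.remote.get_consensus():
        if stem.Flag.GUARD not in router.flags:
            continue
        guard_bw += router.bandwidth
        if router.or_port in (80, 443):
            dystopic_bw += router.bandwidth

    print("dystopic share: %.1f%%" % (100.0 * dystopic_bw / guard_bw))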
Hi,
On 3/27/2016 1:00 PM, George Kadianakis wrote:
[snip]
https://lists.torproject.org/pipermail/tor-dev/2015-November/009871.html
Hmm, this seems like something worth considering.
IIUC you are basically suggesting using a single guardlist instead of having two, and then also making sure that a representative share of dystopic bridges is present in that guardlist. So if the set of guards in the latest consensus is 20% dystopic bridges (in bandwidth), we should choose 20% of our guardlist to be dystopic guards.
I can see how this simplifies the guard selection logic, with the cost of slightly complicating the guard sampling logic. It seems like a good trade in this case.
However, this whole suggestion assumes that a good percentage of the total guard bandwidth is actually dystopic (your mail assumes 28%, which is a pretty good number). What's the actual bandwidth % of dystopic bridges in the current network? If it's 20% it's a good number for our purposes, but if it's more like 1% then that's bad, since we would only end up adding 1 or 2 dystopic bridges, making our heuristic much more flaky. Also, in the end the quality of our heuristic will depend on this %, which is suboptimal.
FWIW in my definition of dystopic bridge above I've been assuming it's bridges with ORPort on 80 or 443.
We should look more into this. Reinaldo, Ola, any comments? Is this simulatable?
Ok, sounds fair. But if the total percentage of dystopic guards (ORPort on 80 or 443) is 1% of the total Guard bandwidth, it's true that a general combined list will maybe have only 1 or 2 dystopic guards, making it kind of suboptimal.
But I think it's equally suboptimal if we have 2 separate lists (utopic and dystopic) and the dystopic guard list contains only a few dystopic guards, or the same dystopic guards for all clients (if there's only 1% of dystopic guard bandwidth). I guess this is a 'tor-environment' problem outside the scope of prop#259: if we only have 1% of dystopic guard bandwidth, users behind FascistFirewalls will experience overload and usability problems regardless of whether we use a combined list or two separate lists. On the other hand, if we use one combined list we could simplify the guard selection logic and maybe win some seconds for the first successful connection.
On 27 Mar 2016, at 05:42, s7r s7r@sky-ip.org wrote:
Hello,
teor, asn, see comments inline.
On 3/24/2016 5:00 PM, Tim Wilson-Brown - teor wrote: [snip]
The number of directory guards will increase when 0.2.8-stable is released and relays and clients upgrade. In 0.2.8, relays accept tunnelled directory connections even if they do not have an open DirPort.
Indeed, soon enough all guards will be directory guards.
Almost all guards will be directory guards. AccountingMax can disable tunnelled directory fetches, as can DirCache 0.
I guess the guards that won't accept tunneled BEGIN_DIR connections because of AccountingMax or DirCache 0 will also advertise this in their descriptors, so these relays will not get a `V2Dir` flag. Can you confirm whether this is actually true? I assume the code has to do this, otherwise how can a client know whether it can initiate a tunneled BEGIN_DIR connection with a relay or not?
Yes, it adds a line to its descriptor saying it supports tunnelled connections.
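For anyone who wants to check a descriptor by hand, a quick sketch (I believe the keyword is "tunnelled-dir-server" from prop#237, but verify against dir-spec.txt before relying on it):

    # Check a single server descriptor for tunnelled directory support.
    # Assumes the descriptor keyword is "tunnelled-dir-server".

    def supports_tunnelled_dir(descriptor_text):
        return any(line.strip() == "tunnelled-dir-server"
                   for line in descriptor_text.splitlines())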
[snip] Client Bootstrap
The proposal ignores client bootstrap.
There are a limited number of hard-coded authorities and fallback directories available during client bootstrap. The client doesn't select guards until it has bootstrapped from one of the 9 authorities or 20-200 fallback directories.
I think this step is before prop#259 does its magic, since prop#259 first needs a consensus before it can work. Let's call this initial (genesis) bootstrap Step 0 - only after a client has bootstrapped (either from an authority or from a fallback directory) will it initiate prop#259 to pick a guard.
So do we throw away the information about reachable ports we gained during bootstrap? It's simpler, but slower. Perhaps too slow for a good user experience.
Bootstrap / Launch Time
The proposal calculates bootstrap and launch time incorrectly.
The proposal assumes that Tor attempts to connect to each guard and waits for failure before trying another. But this isn't how Tor actually works - it sometimes tries multiple connections simultaneously. So summing the times for individual connection attempts to each guard doesn't provide an accurate picture of the actual connection time.
When bootstrapping in 0.2.7 and earlier, tor will try an authority, wait up to 10 seconds for it to fail, then try another. Then there's a 60 second wait before the third authority, but at that point the user has likely lost interest.
In 0.2.8, tor connects to authorities and fallbacks concurrently. It will try 3 fallbacks and 1 authority in the first 10 seconds, and download from whichever one connects first. So 0.2.8 is far more likely to connect within a few seconds.
In all current versions, tor then downloads the consensus (~1.5MB, could take 10 seconds or more), and chooses directory guards. Then it simultaneously connects to 3 directory guards to download certificates and descriptors. The time it takes tor to work out if a connection to a directory guard has succeeded happens simultaneously with other directory guard timeouts.
Hmm. This requires some thinking. So Tor connects to the directory guards immediately after it gets a consensus, to get the certificates and descriptors. Plausible. I assume it does this via HTTP fetch on the DirPort, since it has _no_ certificates and descriptors for routers. Doesn't Tor need these certificates and descriptors to initiate tunneled BEGIN_DIR requests with certain relays?
It has identity key fingerprints hard-coded. I'd have to look into the tor code to see if client bootstrap connections are HTTP or HTTPS - or run a tcpdump session.
How will this work once DirPort is deprecated entirely? Or is removing DirPort from relays not part of the plan?
Deprecating the DirPort is not part of any plan I'm aware of. Relays still use DirPorts, as do some obscure client configurations.
Other Considerations
We're considering increasing the 10 second stream attach timeout to support users on slow and unreliable network connections (#16844). We should think about the impact of that on this proposal - I'd hate to double the time it takes tor to exhaust its UTOPIC guardlist.
This is correct.
Also, the FascistFirewall torrc option: prop#259 sounds like it will take care of users behind FascistFirewalls by default. Should we eliminate the option entirely for simplicity? Or should we make FascistFirewall 1 tell prop#259 to populate the SAMPLED_GUARDS list only with dystopic guards, or to use only a SAMPLED_DYSTOPIC_GUARDS list if we choose to keep the two lists disjoint?
It depends how quickly we can auto-discover whether we're firewalled or not. If it's going to take more than 30 seconds (typical user attention span), I'd use FascistFirewall as a hint to populate the UTOPIC lists. This will happen automatically if we only populate the UTOPIC list with reachable addresses. (Tor handles FascistFirewall, ReachableAddresses, and ClientUseIPv4/6 using the same set of functions.)
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
Tim Wilson-Brown - teor teor2345@gmail.com writes:
On 27 Mar 2016, at 05:42, s7r s7r@sky-ip.org wrote:
Hello,
teor, asn, see comments inline.
On 3/24/2016 5:00 PM, Tim Wilson-Brown - teor wrote: [snip] The proposal ignores client bootstrap.
There are a limited number of hard-coded authorities and fallback directories available during client bootstrap. The client doesn't select guards until it has bootstrapped from one of the 9 authorities or 20-200 fallback directories.
I think this step is before prop#259 does its magic, since prop#259 first needs a consensus before it can work. Let's call this initial (genesis) bootstrap Step 0 - only after a client has bootstrapped (either from an authority or from a fallback directory) will it initiate prop#259 to pick a guard.
So do we throw away the information about reachable ports we gained during bootstrap? It's simpler, but slower. Perhaps too slow for a good user experience.
Hmm, seems like there are bits of reachability information all around the bootstrapping process :) I wonder how we could actually use them...
Here are two brainstormy/sucky ideas for heuristics:
a) If during bootstrapping we only connected to 80/443 bridges, then enable the dystopic mode (e.g. only use 80/443 bridges or sth).
A more conservative approach here is to add points to the heuristic for each 80/443 bridge we connect to; after enough points have accumulated, we assume that the network is dystopic and enable dystopic mode.
b) When choosing guards we strictly prefer ports we managed to connect to during bootstrapping.
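A rough sketch of the points variant of (a); the threshold and reset behaviour are invented for illustration and would need tuning or simulation:

    DYSTOPIC_THRESHOLD = 3  # invented: points before we flip modes

    class FirewallGuess:
        def __init__(self):
            self.points = 0
            self.dystopic_mode = False

        def note_connection(self, or_port, success):
            if success and or_port in (80, 443):
                self.points += 1
            elif success:
                # A working connection to a non-80/443 port is strong
                # evidence the network is not dystopic; reset the guess.
                self.points = 0
                self.dystopic_mode = False
            if self.points >= DYSTOPIC_THRESHOLD:
                self.dystopic_mode = True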
Both of those approaches are not bulletproof and can lead to false positives (i.e. we go into dystopic mode even when the network is fine), which kind of sucks because of security concerns (increased chances of using a dystopic guard) and load balancing concerns.
They can also make obviously wrong decisions when a person moves around with their laptop a lot: heuristic decisions from one network will carry over to the next network if Tor has not restarted.
I think more research is needed to find out how to use this information well.
Other Considerations
We're considering increasing the 10 second stream attach timeout to support users on slow and unreliable network connections (#16844). We should think about the impact of that on this proposal - I'd hate to double the time it takes tor to exhaust its UTOPIC guardlist.
This is correct.
Also, the FascistFirewall torrc option: prop#259 sounds like it will take care of users behind FascistFirewalls by default. Should we eliminate the option entirely for simplicity? Or should we make FascistFirewall 1 tell prop#259 to populate the SAMPLED_GUARDS list only with dystopic guards, or to use only a SAMPLED_DYSTOPIC_GUARDS list if we choose to keep the two lists disjoint?
It depends how quickly we can auto-discover whether we're firewalled or not. If it's going to take more than 30 seconds (typical user attention span), I'd use FascistFirewall as a hint to populate the UTOPIC lists. This will happen automatically if we only populate the UTOPIC list with reachable addresses. (Tor handles FascistFirewall, ReachableAddresses, and ClientUseIPv4/6 using the same set of functions.)
I wonder what would happen there if FascistFirewall gets toggled on and off.
If our guardlist was sampled when FascistFirewall was on, shouldn't we sample from the beginning if FascistFirewall goes off? That's terrible though since we lose all that guard state...
More thinking is required here as well.
On 04/04/16 11:47, George Kadianakis wrote:
I wonder what would happen there if FascistFirewall gets toggled on and off.
If our guardlist was sampled when FascistFirewall was on, shouldn't we sample from the beginning if FascistFirewall goes off? That's terrible though since we lose all that guard state...
Throwing this out there as food for brainstorming rather than a fully formed idea: what would happen if we sampled from a single list of all guards, then filtered the sampled list according to current conditions?
Filtering conditions would include:
* Does the guard have the required flags in the latest consensus?
* Does it match the ReachableAddresses setting, if any?
* Does it match the Use/PreferIPv6 settings, if any?
* Does it match the FascistFirewall setting, if any?
* Does it match our current firewall guesswork?
* Anything else that makes a guard a priori unsuitable
Apply all these filters to the sampled list to get a list of candidates. If the conditions change, update the filters without modifying the underlying list. If the filtered list is too short, sample more guards into the underlying list.
If I understand right, this is how the "good/bad" flag for membership in the latest consensus already works - the idea is just to use the same method for all the combined conditions.
There wouldn't be separate lists of utopic and dystopic guards - rather the list of all guards would be filtered down to dystopic guards whenever settings and/or current guesswork indicated it was appropriate.
Presumably the guesswork should be reset if there's a clue that the network has changed, such as a change in the local IP address. So, going back to the scenario you mentioned above, a less restrictive set of filters would be applied to the underlying list, resulting in more candidates without repeating any sampling.
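To make the shape concrete, a minimal sketch (the predicates are placeholders for the real option checks, and MIN_FILTERED is an invented bound):

    # Sample once from all guards, then filter by current conditions.
    # Predicates are placeholders for the real checks against torrc
    # options and the current firewall guesswork.

    def has_required_flags(guard, consensus):  # placeholder
        return True

    def matches_reachable_addresses(guard, options):  # placeholder
        return True

    def matches_firewall_guess(guard, guess):  # placeholder
        return True

    MIN_FILTERED = 20  # invented lower bound on usable candidates

    def usable_guards(sampled, consensus, options, guess, sample_more):
        # The underlying sampled list is never modified by condition
        # changes; we just re-run the filters. sample_more(sampled)
        # should append fresh guards to the sample and return False
        # once the consensus is exhausted.
        while True:
            filtered = [g for g in sampled
                        if has_required_flags(g, consensus)
                        and matches_reachable_addresses(g, options)
                        and matches_firewall_guess(g, guess)]
            if len(filtered) >= MIN_FILTERED or not sample_more(sampled):
                return filtered

If the filters later become less restrictive, re-running them over the same underlying list yields more candidates without repeating any sampling, which is the property described above.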
Cheers, Michael
Hi,
Funny, this is exactly the direction my thinking has been going. There are too many different variations on restrictions - it might make sense to just have a large enough sampled set, and then filter it with the further restrictions. If nothing is found, we can fail closed at least.
Cheers