Hello,
teor, asn, see comments inline.
On 3/24/2016 5:00 PM, Tim Wilson-Brown - teor wrote: [snip]
The number of directory guards will increase when 0.2.8-stable is released and relays and clients upgrade. In 0.2.8, relays accept tunnelled directory connections even if they do not have an open DirPort.
Indeed, soon enough all guards will be directory guards.
Almost all guards will be directory guards. AccountingMax can disable tunnelled directory fetches, as can DirCache 0.
I guess the guards that won't be accepting tunneled BEGIN_DIR connections because of AccountingMax or DirCache 0 will also advertise this in their descriptors, so these relays will not get a `V2Dir` flag. Can you confirm whether this is actually true? I assume the code has to do this, otherwise how can a client know whether he can initiate a tunneled BEGIN_DIR connection with a relay or not?
The simplest thing is to make the guard also be the directory guard. As Roger suggested in the "Notes from the prop259 proposal reading group" thread, we might make the authorities assign the 'Guard' flag only to relays that also have the 'V2Dir' flag, among the other existing requirements. Until then, we require a DirPort to be set for a relay to be a Guard, but this will add extra complexity (what if the ORPort is on 443 and the DirPort on 9030 - is this guard utopic or dystopic?).
[snip] Feedback on specific sections:
Under dystopic conditions (when a firewall is in place that blocks all ports except for potentially port 80 and 443), this algorithm will try to connect to 2% of all guards before switching modes to try dystopic guards. Currently, that means trying to connect to circa 40 guards before getting a successful connection. If we assume a connection try will take maximum 10 seconds, that means it will take up to 6 minutes to get a working connection.
This seems far too long for most users. Usability studies have demonstrated that users give up after approximately 30 seconds.
Can we design an algorithm that will automatically choose a dystopic guard and bootstrap within 30 seconds? What are the security tradeoffs if we do?
OK, let's assume that a connection failed timeout might take up to 10 seconds.
If Alice is behind a FascistFirewall and we want her to bootstrap within 30 seconds, this means that she always needs to have an 80/443 guard in her top three choices. This means that we would heavily prioritize 80/443 guards over the rest, and an adversary who sets up 80/443 guards will attract more clients.
This isn't how Tor works - it tries multiple guards simultaneously. (See below for details.) Can we rework this calculation to take that into account?
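To make the arithmetic concrete, here is a rough sketch of the worst-case bound under both assumptions. The figures of 40 guards and a 10 second timeout come from the proposal text quoted above; the parallelism of 3 is borrowed from the number of simultaneous directory guard connections mentioned later in this thread, and whether guard selection would use the same number is purely an assumption. The function name is made up for illustration.

```python
import math

def worst_case_seconds(guards_to_try, timeout_s, parallel=1):
    """Upper bound on the time spent before every guard in the batch has failed.

    Assumes each attempt takes the full timeout and that attempts are made
    in rounds of `parallel` simultaneous connections (a simplification).
    """
    rounds = math.ceil(guards_to_try / parallel)
    return rounds * timeout_s

# One guard at a time, as the proposal's estimate assumes.
sequential = worst_case_seconds(40, 10)

# Three guards at a time (hypothetical parallelism, see the note above).
three_at_once = worst_case_seconds(40, 10, parallel=3)

print(sequential, three_at_once)  # 400 140
```

So the sequential bound is 400 seconds (a bit over the 6 minutes quoted), and with three attempts in flight it drops to 140 seconds, which is still well above the 30 second target.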
I think the current proposal tries to balance this, by enabling this heuristic only after Alice exhausts her utopic guardlist. Also, keep in mind that the utopic guardlist might contain 80/443 guards as well. So if Alice is lucky, she got an 80/443 guard in her utopic guard list, and she will still bootstrap before the dystopic heuristic triggers.
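How lucky Alice needs to be can be estimated with a small sketch. It assumes each sampled guard is an 80/443 guard independently with probability equal to the share of guard weight on those ports; that share is not given anywhere in this thread, so the 0.2 below is a placeholder and not a measured value.

```python
def p_at_least_one(p_dystopic, picks):
    """Probability that at least one of `picks` sampled guards is on 80/443.

    Treats picks as independent draws, which slightly misstates sampling
    without replacement but is close when the guard pool is large.
    """
    return 1.0 - (1.0 - p_dystopic) ** picks

# HYPOTHETICAL share of guard weight on ports 80/443.
share = 0.2

top_three = p_at_least_one(share, 3)     # the "top three choices" case
whole_list = p_at_least_one(share, 40)   # the roughly 40 guards tried first
```

With a placeholder share like this, the top three choices contain an 80/443 guard only about half of the time, while a list of around 40 almost certainly contains one.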
There are various ways to make this heuristic more "intelligent", but I would like to maintain simplicity in our design (both simple to understand and to implement). For example, we could imagine that we always put some 80/443 guards as our primary guards, or in the utopic guardlist. Or, that we reduce the 2% requirement so that we trigger the dystopic heuristic faster.
I agree that the maximum total time on the client side to get a working connection is probably too long. However, I think asn's arguments about what happens if we _ensure_ that we keep at least n dystopic guards in our PRIMARY_GUARDS list: a) overloading 80/443 (dystopic) guards; b) creating incentives for attackers to run 80/443 (dystopic) guards that will give them an unfair probability of being picked by clients;
are very important, and it could be worth making this tradeoff and accepting a longer maximum time to get a working connection on the client side.
As I understand it, the utopic guard list _can_ also contain dystopic guards, so a client behind a FascistFirewall might be lucky and not have to wait until the utopic guard list is exhausted entirely. This is better, but I still think it would be simpler if, instead of 2 guard lists:
- SAMPLED_UTOPIC_GUARDS
- SAMPLED_DYSTOPIC_GUARDS

we create a single SAMPLED_GUARDS list, but make the selection by taking into account the ratio of utopic and dystopic guards based on their weights from the last consensus. I suggested a simple example for this a few months ago in this post:
https://lists.torproject.org/pipermail/tor-dev/2015-November/009871.html
If we compute the guard list like this, load balancing shouldn't be affected in any way (we use the weights to build the list, not the number of relays). I saw the algorithm has been improved so much and covers so many aspects we didn't consider initially, but I still don't understand why we need two separate lists of utopic guards and dystopic guards when we can create a single list.
This will also allow us to safely reduce, by a small amount, the total number of guards we are willing to try, while being sure that clients behind FascistFirewalls get a chance and also taking into account teor's concern not to make this list too big.
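A minimal sketch of what a single list could look like follows. This is one possible reading of the idea, not necessarily what the linked post specifies: the guard dictionaries, the function names, classifying a guard by ORPort alone, and reserving at least one dystopic slot are all assumptions made for illustration.

```python
import random

DYSTOPIC_PORTS = {80, 443}  # assumption: classify by ORPort only

def is_dystopic(guard):
    return guard["orport"] in DYSTOPIC_PORTS

def weighted_sample(guards, k, rng):
    """Pick up to k distinct guards, each draw proportional to weight."""
    pool = list(guards)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [g["weight"] for g in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(index))
    return chosen

def build_sampled_guards(guards, size, rng=None):
    """Build one SAMPLED_GUARDS list whose dystopic share follows the weights."""
    if not guards or size <= 0:
        return []
    rng = rng or random.Random()
    dystopic = [g for g in guards if is_dystopic(g)]
    utopic = [g for g in guards if not is_dystopic(g)]
    total = sum(g["weight"] for g in guards)
    share = sum(g["weight"] for g in dystopic) / total
    # Assumption: reserve at least one dystopic slot whenever one exists.
    n_dys = min(len(dystopic), max(1, round(size * share))) if dystopic else 0
    n_uto = min(len(utopic), max(0, size - n_dys))
    sampled = weighted_sample(dystopic, n_dys, rng) + weighted_sample(utopic, n_uto, rng)
    rng.shuffle(sampled)  # do not let the guard type decide list order
    return sampled

# Toy consensus: 2 of 10 equally weighted guards listen on 443.
toy = [{"name": f"g{i}", "orport": 443 if i < 2 else 9001, "weight": 100}
       for i in range(10)]
sampled = build_sampled_guards(toy, 5, random.Random(1))
```

Because the slots are split by weight share and each slot is filled by a weighted draw, the dystopic guards as a group should not receive more load than their consensus weight suggests, apart from the rounding and the reserved slot.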
[snip] Client Bootstrap
The proposal ignores client bootstrap.
There are a limited number of hard-coded authorities and fallback directories available during client bootstrap. The client doesn't select guards until it has bootstrapped from one of the 9 authorities or 20-200 fallback directories.
I think this step comes before prop#259 does its magic, since prop#259 first needs a consensus before it can work. Let's call this initial (genesis) bootstrap Step 0 - only after a client has bootstrapped (either from an authority or from a fallback directory) will he initiate prop#259 to pick a guard.
Bootstrap / Launch Time
The proposal calculates bootstrap and launch time incorrectly.
The proposal assumes that Tor attempts to connect to each guard and waits for failure before trying another. But this isn't how Tor actually works - it sometimes tries multiple connections simultaneously. So summing the times for individual connection attempts to each guard doesn't provide an accurate picture of the actual connection time.
When bootstrapping in 0.2.7 and earlier, tor will try an authority, wait up to 10 seconds for it to fail, then try another. Then there's a 60 second wait before the third authority, but at that point the user has likely lost interest.
In 0.2.8, tor connects to authorities and fallbacks concurrently. It will try 3 fallbacks and 1 authority in the first 10 seconds, and download from whichever one connects first. So 0.2.8 is far more likely to connect within a few seconds.
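The "whichever one connects first" behaviour can be illustrated with a toy race. This is not tor's code, only a sketch of the pattern: the connection attempts are simulated with sleeps, the candidate names and delays are invented, and the delays are scaled down so the example finishes quickly.

```python
import asyncio

async def try_connect(name, delay_s, ok):
    """Stand-in for a real connection attempt: wait, then succeed or fail."""
    await asyncio.sleep(delay_s)
    if not ok:
        raise ConnectionError(name)
    return name

async def first_to_connect(candidates, timeout_s):
    """Start every attempt at once; return the first success, or None."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    pending = {asyncio.ensure_future(try_connect(*c)) for c in candidates}
    winner = None
    try:
        while pending and winner is None:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining,
                return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # A failed attempt is ignored; the others keep running.
                if task.exception() is None and winner is None:
                    winner = task.result()
    finally:
        # Abandon the attempts that are still in flight.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
    return winner

# Hypothetical candidates: (name, simulated delay in seconds, succeeds?)
candidates = [
    ("fallback-a", 0.05, False),
    ("fallback-b", 0.02, True),
    ("fallback-c", 0.20, True),
    ("authority", 0.10, True),
]
result = asyncio.run(first_to_connect(candidates, timeout_s=1.0))
print(result)  # fallback-b
```

The point is that the total wait is governed by the fastest reachable candidate, not by the sum of the individual timeouts.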
In all current versions, tor then downloads the consensus (~1.5MB, could take 10 seconds or more), and chooses directory guards. Then it simultaneously connects to 3 directory guards to download certificates and descriptors. The time it takes tor to work out if a connection to a directory guard has succeeded happens simultaneously with other directory guard timeouts.
Hmm. This requires some thinking. So Tor connects to the directory guards immediately after it gets a consensus, to get the certificates and descriptors. Plausible. I assume it does this via HTTP fetch on the DirPort, since it has _no_ certificates and descriptors for routers. Doesn't Tor need these certificates and descriptors to initiate tunneled BEGIN_DIR requests with certain relays?
How will this work once DirPort is deprecated entirely? Or is removing DirPort from relays not part of the plan?
Under prop#259 we can, for example, initiate 3 simultaneous tunneled BEGIN_DIR connections with the top 3 guards in PRIMARY_GUARDS to fetch the certificates and descriptors. Until all relays update to recent enough Tor versions, we initiate 3 HTTP GET connections to the DirPorts of the top 3 guards in PRIMARY_GUARDS. This shouldn't affect the client's anonymity or expose him too much: on one hand it's just for the certificates and descriptors, and on the other hand a client is exposed to the guards in its PRIMARY_GUARDS list all the time anyway, for reasons of usability, performance, avoiding overloading some guards, etc.
Other Considerations
We're considering increasing the 10 second stream attach timeout to support users on slow and unreliable network connections (#16844). We should think about the impact of that on this proposal - I'd hate to double the time it takes tor to exhaust its UTOPIC guardlist.
This is correct.
Also, about the FascistFirewall torrc option: prop#259 sounds like it will take care of users behind FascistFirewalls by default, so should we eliminate the option entirely for simplicity? Or should we make it so that FascistFirewall 1 tells prop#259 to populate the SAMPLED_GUARDS list only with dystopic guards, OR to use only a SAMPLED_DYSTOPIC_GUARDS list if we choose to keep the two lists disjoint?