Mike Perry mikeperry@torproject.org writes:
George Kadianakis:
Hello there,
recently we've been busy specifying various important improvements to entry guard security. For instance see proposals 250, 241 and ticket #16861.
Unfortunately, the current guard codebase is dusty and full of problems (see #12466, #12450). We believe that refactoring and cleaning up the entry guard code is essential before we proceed to more advanced security improvements.
We've been working on new algorithms and data structures for guard nodes as part of ticket #12595.
In this mail I include some pseudocode for this new algorithm with the hope that it will act as a draft for implementing these changes. You can find the pseucode here:
https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
Have you considered how these data structures might change for Prop 247?
I like Prop 247, and think it is a significant step in the right direction for the HS-specific problem (though I'd like to think a bit more about how it would compare to a more general virtual circuit mechanism that could also protect clients even for instances of application-specific Guard discovery).
More generally, I'm worried in that this is yet another case where upstream deliverables that were written over a year ago are going to cause us to arrive at a sub-optimal solution that we're forced to deploy only to rip out later. Hurray deliverables!
In terms of specific suggestions: I find that code conceptually makes more sense when it at least pretends to be object oriented. You might want to consider a single guardset object that holds all of the Guard configuration state, including the main guard list, vanguards, path bias info, etc. This object would then be passed in as the first argument to all guard-related functions, especially if we're talking about overhauling it anyway.
I agree about looking at this from an OOP prespective.
I think the main class that is introduced here is the guardset (or guardlist). The second class is the guard itself, which is basically a node_t in the code.
A guardlist contains many guards, and knows how to to manipulate them correctly.
The way I was thinking that this would work along with prop247, is that each layer of guards will be a separate guardset. So when you pick your first-layer guard, you pick it from the first-layer guardset and when you pick your second-layer guards, you pick them from the second-layer guardset.
Now, if we want to do guard buckets for the third layer in a way that each second-layer guard has a dedicated guardset of third-layer guards, we will need to split the third-layer guards to N guardsets, and assign each of these guardset to one of the N second-layer guards.
This way, when you make a circuit through second-layer guard k, you will only use the third-layer guardset that corresponds to k.
There are probably more subtetlies to this design, but this is how I imagine these two proposals to interface in basic terms.
All guard-related functions would then be prefixed with guardset_ as well, to make it clear that you could find them in guardset.c. (Also note I deliberately used guardset_ instead of guardlist_ due to Prop 247).
A short description of the algorithm is included on top, and then various methods and functions are prototyped underneath to make the logic more concrete.
Apart from the comments and XXXs on the code, here are some more thoughts on this work:
This new design focuses on protecting against path bias attacks, by slightly damaging our reachability.
Specifically, the old design is better at recovering in filtered networks, because it will keep on adding new nodes till one succeeds. In this new design, we will not try more than 80 relays per time. So if none of them passes the filtered network, bad luck no Tor.
While this failure mode should not happen much, it's bad news for users behind FascistFirewalls which are actually quite frequent. A quick fix here would be to always add an 80/443 guard on our list, however as it stands only 30% of the guards are 80/443 guards, so this has bad anonymity consequences.
I had mixed feelings about not trying to handle this better in the past, but given that such users can use the default Tor Browser bridges (and likely effectively have to, since bootstrapping takes tens of minutes anyway in this case) it may not be worth micro-optimizing for?
As always it's a tradeoff between security (against path bias attacks) and connectivity (or how quickly if ever you manage to connect to Tor).
At this point, it might make sense to default to security, but make it easy for filtered people to get connectivity. So, like, if we fail to bootstrap for a while, maybe it makes sense for tor-launcher to present a "connect me anyway!" option to the user (which will use bridges or turn on FascistFirewall).
- To improve our algorithm and make it more robust we need to understand further what kind of path bias attacks are relevant here. The adversary here is a network adversary (like a gateway) that can block our connections to certain guards. What nasty attacks can this adversary do?
I see this attack vector as adding two capabilities:
- The adversary can induce you to connect to only its chosen guards to
find you later on other networks through fingerprinting (as others have mentioned).
- The adversary gets to perform attacks that would otherwise be
impossible without the Guard node's identity key (which is required in order to unwrap TLS). There are lots of these attacks: XOR-based tagging of the cipherstream to force you to use specific exits, other instances of per-circuit failure, HS circuit fingerprinting, circuit-level traffic analysis, etc.
- In general, I tried to keep the number of heuristics and kludges to the minimum to keep the logic simple. Unfortunately, it seems that without a "network down" indicator (#16120) there is no way to avoid edge cases and false positives here.
If you can't find a more reliable OS-specific indicator, you might look at circuit_build_times_network_is_live() and where it is called. I think the channel code also sets indicators in similar places.