On Sat, Mar 31, 2018 at 06:52:51AM +0000, Mike Perry wrote:
The main argument for switching to two guards is that because of Tor's path restrictions, we're already using two guards, but we're using them in a suboptimal and potentially dangerous way.
Tor's path restrictions enforce the condition that the same node cannot appear twice in the same circuit, nor can nodes from the same /16 subnet or node family be used in the same circuit.
Tor's paths are also built such that the exit node is chosen first and held fixed during guard node choice, as are the IP, HSDIR, and RPs for onion services. This means that whenever one of these nodes happens to be the guard[4], or be in the same /16 or node family as the guard, Tor will build that circuit using a second "primary" guard, as per proposal 271[7].
Worse still, the choice of RP, IP, and exit can all be controlled by an adversary (to varying degrees), enabling them to force the use of a second guard at will.
I agree with you that we should do something about this bug, where Tor clients will switch to a rarely used guard in some situations. Our fix from ticket #14917 was not a good fix. More on that below in Section 3.1.
Not surprisingly, the two-guard adversary gets to compromise clients roughly twice as quickly, but the timescales are still rather large even for the 10% adversary: they only have a 50% chance of success after 4 rotations, which will take about 14 months with Tor's 3.5 month guard rotation.
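The arithmetic behind those numbers can be checked with a simple independence model. This is a sketch under the assumption that every guard pick is an independent draw that lands on the adversary with probability equal to its bandwidth fraction; it may differ from the model used in the original analysis, and the function name is hypothetical.

```python
def compromise_probability(adversary_fraction: float,
                           guards_per_rotation: int,
                           rotations: int) -> float:
    """Chance that at least one chosen guard is adversarial.

    Simplified model: each guard pick is independent and hits the
    adversary with probability `adversary_fraction`. Churn, path
    restrictions, and sampling without replacement are all ignored.
    """
    picks = guards_per_rotation * rotations
    return 1.0 - (1.0 - adversary_fraction) ** picks


ROTATION_MONTHS = 3.5  # guard rotation period quoted in the text

for guards in (1, 2):
    p = compromise_probability(0.10, guards, rotations=4)
    print(f"{guards} guard(s), 4 rotations "
          f"({4 * ROTATION_MONTHS:.0f} months): {p:.1%}")
```

This gives about 34% for one guard and about 57% for two guards after 14 months, consistent with the rough 50% figure above.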
Three thoughts here:
(A) You're right, 14 months doesn't sound bad here.
(B) This calculation was ignoring churn, right? That is, guards going away before you wanted to rotate from them. So another way to phrase that would be "once eight of your guards have gone away, you're in bad shape"? Looking at it that way, it seems like two guards is more than twice as scary as one, since *either* of them going away moves you one step closer on the path. Not the end of the world, but worth noticing. And maybe partially solvable by your "when one of your two goes away, stick to the remaining one" design; more on that below.
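To make the churn point concrete, here is a toy Monte Carlo sketch that counts how quickly you accumulate eight guard picks when guards can disappear early. Everything here is an assumption for illustration: the weekly churn probability, approximating the 3.5-month rotation as 15 weeks, and the function names.

```python
import random


def weeks_until_n_guards(num_guards, threshold, weekly_churn,
                         rotation_weeks, rng):
    """Weeks until `threshold` guard picks have been made in total.

    Toy model (hypothetical): each guard slot gets a fresh guard when its
    current guard reaches `rotation_weeks` of age, or when it goes away,
    which happens independently with probability `weekly_churn` per slot
    per week.
    """
    picked = num_guards            # the initial guards count as picks
    ages = [0] * num_guards
    week = 0
    while picked < threshold:
        week += 1
        for slot in range(num_guards):
            ages[slot] += 1
            if ages[slot] >= rotation_weeks or rng.random() < weekly_churn:
                ages[slot] = 0     # rotated out or churned away: pick anew
                picked += 1
    return week


def average_weeks(num_guards, weekly_churn, threshold=8,
                  rotation_weeks=15, trials=2000, seed=1):
    """Mean of weeks_until_n_guards over many seeded trials."""
    rng = random.Random(seed)
    total = sum(weeks_until_n_guards(num_guards, threshold, weekly_churn,
                                     rotation_weeks, rng)
                for _ in range(trials))
    return total / trials


for churn in (0.0, 0.02):
    for guards in (1, 2):
        print(f"churn={churn:.2f}/week, {guards} guard(s): "
              f"{average_weeks(guards, churn):.1f} weeks to 8 picks")
```

With churn set to zero this gives 105 weeks for one guard versus 45 weeks for two; any nonzero churn shortens both, and with two guards every slot is another chance for churn to move you one step closer.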
(C) Similarly, we should be sure to remember the network adversary here too. I don't know a simple way to reason about it well. Using more guards over time could be *less* than twice as scary, because sometimes the network paths overlap so you don't expose as much new surface area as you might have. And using more guards over time could be *more* than twice as scary, if the question is whether your traffic ever goes over that one bad place, since you have an exponentially low chance to *never* pick a guard where your traffic to/from that guard travels over the bad place. It really depends on your location, the guard locations, the Internet topology, and a bunch of other confusing factors.
Furthermore, our use of separate directory guards (and three of them) means that we're not really changing the situation much with the addition of another regular guard. Right now, directory guard use alone is enough to track all Tor users across the entire world.
Shit, you're right. The guard set fingerprint issue remains right now, because we never solved the directory guard side of it. :(
While the directory guard problem could be fixed[12] (and should be fixed), it is still the case that another mechanism should be used for the general problem of guard-vs-location management[9].
The part that freaks me out about all the designs I've seen here is the attack where the local adversary advertises a series of local wireless addresses, first to make you keep generating new guard contexts (similar to forcing quick guard rotation), or second to guess-and-check whether you've already got a guard context for some wireless address in the next city over. Maybe it can be solved by proper UI ("we'll just delegate the decision to the user"), but hoo boy. But that's a separate proposal fortunately. :)
3.1. Eliminate path restrictions entirely
I'm increasingly a fan of this option, the more I read these threads.
Let's examine the two attacker assumptions behind two of the attacks we're worried about.
Attack one: the client's local ISP collects coarse netflow logs. These logs aren't detailed enough to allow a traffic volume detection attack on an existing long-lived TLS flow, so the connection to that first guard is safe. But a connection to that second guard will be unusual, not multiplexed with other traffic, and timed exactly with the adversary-controlled circuit that triggered it; so that second guard, because it is used so rarely, is dangerous to use.
Attack two: if the client uses its guard as the first hop of its circuit and also the adversary-requested fourth hop, then the guard can do pairwise traffic correlation attacks on all of its circuits and realize that these two circuits it has are really two pieces of the same circuit.
This second attack seems weird to me. One reason is that in attack one we're brushing aside the traffic analysis as hard, whereas in attack two we're assuming it's trivial and perfect. But the simpler reason is: if your guard is going to participate in a traffic correlation attack against you, then it could just as easily team up with some other relay that the adversary picked. That is, avoiding reusing your guard on the other end of the circuit isn't going to save you if your guard is out to get you.
Part of why it's hard to compare these two attacks directly is that one is a client-side-observer adversary and the other is a relay-level adversary.
Let's look at "attack one" from a relay-level-adversary perspective: if your first guard is bad, you're screwed already. But if that second guard might be bad, you really want to do anything you can do to not reach out to it even once.
And "attack two" from the client-side-observer-level-adversary perspective: well, if the attacker is watching the *client*, there's no visible hint that it's reusing its guard later in the path -- and that's the whole point. But if the attacker is watching the *relay*, then suddenly we don't have as much diversity of traffic location as we thought we had. That is, even if your relay is nice, somebody watching the relay's network could do the pairwise correlation attacks we described earlier.
Another part of what bothers me about attack two -- the one where the adversary gives you your fourth hop -- is that the adversary has *other* hops in their side of the circuit, and you don't even know about them. What if they chose your guard for their middle hop? Or for *their* guard? There's nothing you can do about those cases, because you can't know that they're happening. My conclusion is that if we can't solve significant instances of this attack, we should be wary of paying a large price to solve only a piece of it.
If Tor decided to stop enforcing the /16 and node family restrictions, and also allowed the guard node to be chosen twice in the path, then under normal conditions it should retain the use of its primary guard.
To be clear, the design I've been considering here is simply allowing reuse between the guard hop and the final hop, when it can't be avoided. I don't mean to allow the guard (or its family) to show up as all four hops in the path. Is that the same as what you meant, or did you mean something more thorough?
I think "can't be avoided" means HSDir, IP, RP -- which I note are all onion service related circuits.
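The narrower relaxation described above can be sketched as a pairwise path check that skips only the (guard, final hop) pair. This is a simplified, hypothetical model of Tor's path restrictions: relays are keyed by nickname rather than fingerprint, family requires mutual declaration, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Relay:
    nickname: str
    address: IPv4Address
    # Nicknames this relay lists in MyFamily (real Tor uses fingerprints).
    family: frozenset = field(default_factory=frozenset)

    def subnet16(self) -> int:
        return int(self.address) >> 16


def related(a: Relay, b: Relay) -> bool:
    """Same relay, same /16, or a mutual family declaration."""
    return (a.nickname == b.nickname
            or a.subnet16() == b.subnet16()
            or (b.nickname in a.family and a.nickname in b.family))


def path_ok(path, reuse_guard_at_end):
    """Check every pair of hops; path[0] is the guard, path[-1] the final hop.

    The only relaxation is the (guard, final hop) pair, so the guard still
    can never appear in, or share a /16 or family with, a middle hop.
    """
    last = len(path) - 1
    for i in range(len(path)):
        for j in range(i + 1, len(path)):
            if reuse_guard_at_end and (i, j) == (0, last):
                continue
            if related(path[i], path[j]):
                return False
    return True


guard = Relay("guard", IPv4Address("10.0.1.1"))
middle = Relay("middle", IPv4Address("192.0.2.5"))
print(path_ok([guard, middle, guard], reuse_guard_at_end=False))  # False
print(path_ok([guard, middle, guard], reuse_guard_at_end=True))   # True
print(path_ok([guard, guard, middle], reuse_guard_at_end=True))   # False
```

The third case shows the intended limit: reuse is only allowed at the adversary-chosen final hop (HSDir, IP, or RP), never as a middle hop.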
I'd like to hear more about the "cleverly crafted exit policy" attack, and I wonder if we can't solve that differently. For example, if it's about making you do a request to a port that only one exit relay allows, and ha ha whoops your guard was on the same /16 as that exit relay... maybe it's time for the dir auths to not advertise super rare ports? This was one of the topics in the users-get-routed paper too.
One non-starter idea would be to move onion-service-related Tors to two guards, and leave other Tors at one guard. It's a non-starter because of course advertising which you are to your local network is no good. But that idea gave me a different perspective on this discussion: I wonder how much this design decision comes down to making all Tors use two guards in order to protect the onion-service-related Tors, which are the only ones who actually need it?
This approach is not as extreme as it seems on its face. In fact, it is hard to come up with arguments against removing these restrictions. Tor's /16 restriction is of questionable utility against monitoring, and it can be argued that since only good actors use node family, it gives bad actors influence over path selection in ways that are worse than the benefit it provides to paths through good actors[10,11].
Yep.
One remaining feature for MyFamily though is that relay operators can say "No, even though I run these eight relays, I'm not in a position to do traffic correlation attacks on users, because I told the users to not put me in that position." This angle of the feature is about protecting relays, not about protecting clients.
However, while removing path restrictions will solve the immediate problem, it will not address other instances where Tor temporarily opts to use a second guard due to congestion, OOM, or failure of its primary guard, and we're still running into bugs where this can be adversarially controlled or just happen randomly[5].
I continue to think we need to fix these. I'm glad to see that George has been putting some energy into looking more at them. The bugs that we don't understand are especially worrying, since it's hard to know how bad they are. Moving to two guards might put a bit of a bandaid on the issues, but it can't be our long-term plan for fixing them.
Note that for this analysis to hold, we have to ensure that nodes that are at RESOURCELIMIT or otherwise temporarily unresponsive do not cause us to consider other primary guards beyond the two we have chosen. This is accomplished by setting guard-n-primary-guards to 2 (in addition to setting guard-n-primary-guards-to-use to 2). With this parameter set, the proposal 271 algorithm will avoid considering more than our two guards, unless *both* are down at once.
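The effect of the two parameters can be illustrated with a heavily simplified model of primary guard selection. In this sketch `n_primary` stands in for guard-n-primary-guards and `n_use` for guard-n-primary-guards-to-use; the function itself is hypothetical, and the fallback rule (take the next reachable sampled guard) is a simplification of what proposal 271 actually specifies.

```python
def select_guards(sampled, reachable, n_primary, n_use):
    """Return the guards a client would build circuits through.

    Simplified model: the first `n_primary` sampled guards are primary.
    Use up to `n_use` reachable primaries, and only look past the
    primaries when none of them is reachable.
    """
    primary = sampled[:n_primary]
    usable = [g for g in primary if g in reachable][:n_use]
    if usable:
        return usable
    # Every primary guard is down: fall back to the next reachable guard.
    return [g for g in sampled[n_primary:] if g in reachable][:1]


sampled = ["A", "B", "C", "D"]
b_limited = {"A", "C", "D"}          # B is at RESOURCELIMIT
print(select_guards(sampled, b_limited, n_primary=3, n_use=2))   # ['A', 'C']
print(select_guards(sampled, b_limited, n_primary=2, n_use=2))   # ['A']
print(select_guards(sampled, {"C", "D"}, n_primary=2, n_use=2))  # ['C']
```

With three primaries, a temporarily overloaded B exposes the client to a third guard C; with two, the client stays on A alone until both A and B are down.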
I like this general idea of not immediately replacing guards so long as you have a working one. In fact, we used to do something similar back in the day: https://blog.torproject.org/improving-tors-anonymity-changing-guard-paramete... says (emphasis mine) """ Tor 0.2.3's entry guard behavior is "choose three guards, ***adding another one if two of those three go down*** but going back to the original ones if they come back up, and also throw out (aka rotate) a guard 4-8 weeks after you chose it." """
There are still some fiddly decisions to make here. For example, as you say, we probably shouldn't replace a guard just because we failed to connect to it once. We might decide that it's time to add a new second guard if the consensus tells us that one of them is down (so we have confirmation that it isn't down for just us, it's down for everybody). Or we might decide to wait on adding a new one even if it really is down, because maybe it'll come back soon. But how long do we wait? And if, while we're down to one, we encounter one of these situations where the requested fourth hop overlaps with our remaining guard, what do we do?
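One of the policies floated above, "only replace a guard once the consensus says it is down, and only after waiting a while", could be sketched like this. The names are hypothetical and `grace_hours` is a free parameter, not a recommendation: the "how long do we wait?" question stays open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardState:
    name: str
    # Hour at which the consensus first listed this guard as not running,
    # or None while it is listed as running.
    down_in_consensus_since: Optional[float] = None
    # Our own failed connection attempts; deliberately ignored below.
    local_failures: int = 0


def should_add_second_guard(guards, now_hours, grace_hours):
    """Hypothetical policy: only the consensus can retire a guard, and only
    after it has been listed as down for at least `grace_hours`."""
    still_ours = [g for g in guards
                  if g.down_in_consensus_since is None
                  or now_hours - g.down_in_consensus_since < grace_hours]
    return len(still_ours) < 2


a = GuardState("A")
b = GuardState("B", local_failures=3)
print(should_add_second_guard([a, b], now_hours=0, grace_hours=24))   # False
b.down_in_consensus_since = 10
print(should_add_second_guard([a, b], now_hours=20, grace_hours=24))  # False
print(should_add_second_guard([a, b], now_hours=40, grace_hours=24))  # True
```

Ignoring local failures is what keeps a non-local attacker from triggering a transition just by making one connection fail; the cost is a window where we are down to one guard.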
In fact, here's a hopefully useful insight that I've just realized: you're not concerned about one guard vs two guards, you're concerned about *transitioning* between guards. It's that moment when you're starting to use a new guard, if the attacker can observe that you're doing it, and especially if the attacker can make you do it, that is vulnerable. And starting with two guards can help, in that it postpones the time until you're forced to transition, and maybe also because if we do it right it can make the transition less visible.
But I wonder if we're looking at this backwards, and the primary question we should be asking is "How can we protect the transition between guards?" Then one of the potential answers to consider is "Maybe we should start out with two guards rather than just one." Framing it that way, are there more options that we should consider too? For example, removing the ability of the non-local attacker to trigger a transition? Then there would still be visibility of a transition, but the (non-local) attacker can't impact the timing of the transition. How much does that solve? Need to think more.
3.2. No Guard-flagged nodes as exit, RP, IP, or HSDIRs
Similar to 3.1, we could instead forbid the use of Guard-flagged nodes for the exit, IP, RP, and HSDIR positions.
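As a rough illustration, option 3.2 amounts to a client-side filter on candidate relays for the restricted positions. This is a hypothetical sketch: the position labels, class, and function names are invented, and real per-position requirements (exit policies, the HSDir flag, bandwidth weighting) are omitted.

```python
from dataclasses import dataclass

RESTRICTED_POSITIONS = {"exit", "ip", "rp", "hsdir"}


@dataclass(frozen=True)
class RouterStatus:
    nickname: str
    flags: frozenset


def candidates_for(position, relays):
    """Option 3.2 sketch: Guard-flagged relays never fill exit/IP/RP/HSDir.

    Other per-position requirements are left out to keep this small.
    """
    if position in RESTRICTED_POSITIONS:
        return [r for r in relays if "Guard" not in r.flags]
    return list(relays)


relays = [
    RouterStatus("g1", frozenset({"Guard", "Fast", "Stable"})),
    RouterStatus("x1", frozenset({"Exit", "Fast"})),
    RouterStatus("gx", frozenset({"Guard", "Exit", "Fast"})),
]
print([r.nickname for r in candidates_for("rp", relays)])      # ['x1']
print([r.nickname for r in candidates_for("middle", relays)])  # ['g1', 'x1', 'gx']
```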
This solution has two problems: First, like 3.1, it also does not handle the case where resource exhaustion could force the use of a second guard. Second, it requires clients to upgrade to the new behavior and stop using Guard-flagged nodes before it can be deployed.
I'm not much of a fan of this approach (it seems so inelegant!), but I find the two problems that you identified unconvincing as reasons for ruling it out. I wonder if we can find some stronger arguments against this approach?
Otherwise I might find myself starting to like it. :)
One stronger argument might be: "the attacker can always use Guard-flagged nodes for other hops on its half of the circuit, and you wouldn't even be able to know that it's doing it, so if the goal is to never have a circuit with your guard both at your end and also reused elsewhere in the circuit, sorry you can't achieve that goal, so stop messing stuff up while trying to achieve what can only ever be a partial solution."
- The future is confluxed
An additional benefit of using a second guard is that it enables us to eventually use conflux[6].
I think the performance benefits are the main arguments in favor of doing two guards. In fact, I still think that it's mainly a performance-vs-safety tradeoff.
I agree with George that moving to two guards now so that we can maybe do Conflux later is doing it the wrong way round. Since it's so easy to switch to two guards, that should be one of the very easy steps in moving to Conflux when we do, and taking the safety hit now in exchange for the potential performance benefit later doesn't seem best.
But there's another performance argument we shouldn't forget: if you have two guards, you're much more likely to have at least one guard that's adequately fast. Right now some of the guards are fast (relative to others), and some are slow (relative to others). If you get one of the lower-end guards, your Tor performance is sad -- for months! We tried to mitigate that issue when we switched to one guard, by raising the required bandwidth to get the Guard flag, so there would be no truly terrible guards. But still, some guards are more equal than others.
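The "at least one adequately fast guard" argument is the same independence arithmetic run the other way: with n guards, you are stuck only if every pick is slow. A sketch, assuming bandwidth-weighted picks (a simplification of Tor's actual position weighting); the bandwidth list and the slowness threshold are made-up illustration values, not measurements.

```python
def slow_pick_probability(bandwidths, threshold):
    """Chance that one bandwidth-weighted guard pick is below `threshold`."""
    total = sum(bandwidths)
    slow = sum(bw for bw in bandwidths if bw < threshold)
    return slow / total


def p_all_guards_slow(bandwidths, threshold, num_guards):
    """Chance that every one of `num_guards` picks is slow.

    Treats picks as independent, ignoring that Tor never picks the same
    guard twice, so it overstates the risk somewhat.
    """
    return slow_pick_probability(bandwidths, threshold) ** num_guards


bandwidths = [1, 1, 2, 4, 8, 16]   # hypothetical guard capacities
for n in (1, 2):
    print(n, p_all_guards_slow(bandwidths, threshold=3, num_guards=n))
```

Here one guard leaves you stuck with a slow one 12.5% of the time, while two guards drop that to about 1.6%.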
This issue came up especially in the context of the December/January CPU overload attacks, where some guards were overwhelmed by circuit creation requests, and if you had a happy guard, lucky you, but if you had a sad guard, you might as well delete your Tor Browser and try again.
Now, in an ideal world we should come up with fixes for all of those other issues, for example by taking the Guard flag away from relays that can't be great guards. But in the world we live in right now, we can relieve some of that pressure-to-be-perfect by giving people two guards.
But if we're only going on a performance vs safety basis, I don't see a huge rush to trade off safety until we have a better handle on what sort of performance benefits we'd actually get, and until we've compared to other low-hanging performance fruit.
In summary:
(1) I think we should fix the bug from #14917 where the attacker can push us off our guard just by naming our guard as the HSDir/IP/RP, and I think we should fix it by being willing to reuse our guard when it can't be avoided. That step will resolve some, but not all, of the pressure about moving to two guards. Then
(2) Hopefully the above discussion has helped us move forward on the remaining reasons for switching to two guards. To me the two biggest questions left to resolve are (a) how best to protect the vulnerable transition to a new guard, and whether two guards is the best idea we've got for that, and (b) how big an issue it really is that having only one guard can sometimes give you a low-performance guard, and whether two guards is the best idea we've got for that one too.
--Roger