This is a good first iteration! There are some more topics that we should take into account too:
1) If the relay loses the Guard or Running flag, maybe that's ok and it shouldn't count towards our limits? The goal here, after all, is to recognize when *targeted* changes happen (targeted to a given user), not to recognize when guard churn is high in general.
(Imagine we later start experimenting with other ways to give out the Guard flag, and that results in giving the Guard flag to some relays and then taking it away -- we wouldn't want our protections here to kick in and lock users out.)
If we go with this change, it will introduce some tricky edge cases: do we remove the relay from our threshold calculations when it loses the flags, but then add it back in when it regains them? Does that enable new attacks, or change the math on the current ones?
2) It's been mentioned before, but it's worth emphasizing the "firewalled ports" case. Maybe we should be explicit and say that a certain fraction of the guards we try need to be listening on 443, and a certain fraction need to not be?
The tradeoff is that we'd be modifying the guard selection algorithm and thus the security of guards.
I guess another point here is that we need to make a decision, as with the 'tricky edge cases' paragraph above, about how to count "guards that we've tried recently but that are disallowed by our current value of ReachableAddresses" in our thresholds.
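To make the two counting questions concrete (flag loss in point 1, ReachableAddresses in point 2), here is a rough sketch of what the decision looks like once it is written down. Everything in it is hypothetical: the TriedGuard record, the count_toward_threshold name, and the two skip_* switches are illustration only, not anything in the Tor code, and reachability is reduced to a plain set of allowed ports.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class TriedGuard:
    """Hypothetical record of a guard we have tried recently."""
    name: str
    has_guard_flag: bool
    is_running: bool
    port: int


def count_toward_threshold(
    tried: Iterable[TriedGuard],
    reachable_ports: Optional[Set[int]] = None,
    skip_flagless: bool = True,
    skip_unreachable: bool = True,
) -> int:
    """Count the tried guards that should count against the threshold.

    skip_flagless: ignore guards that currently lack Guard or Running.
    skip_unreachable: ignore guards whose port our (simplified)
    reachability policy does not allow. None means no restriction.
    """
    count = 0
    for guard in tried:
        if skip_flagless and not (guard.has_guard_flag and guard.is_running):
            continue
        if (skip_unreachable and reachable_ports is not None
                and guard.port not in reachable_ports):
            continue
        count += 1
    return count


if __name__ == "__main__":
    tried = [
        TriedGuard("a", True, True, 443),
        TriedGuard("b", False, True, 443),   # lost the Guard flag
        TriedGuard("c", True, True, 9001),   # blocked by a 443-only policy
    ]
    print(count_toward_threshold(tried, reachable_ports={443}))
```

Note that the count is recomputed from current flags each time, so a relay that regains its flags silently re-enters the total. That is exactly the behavior we would need to decide whether we want.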
3) For the parameters we've picked, we want the numbers high enough that they don't trigger by accident, but low enough that an attack is not too damaging before it gets noticed. A) That means we're going to want to make them consensus params, since their value should be a function of the current network, and we wouldn't want older clients partitioned from newer ones. And B) I wonder if there is a sweet spot here or not.
Say an adversary runs a guard that is 1% of the consensus weight. The naive and wrong math says that there are an expected 100 guard picks before we pick this one. The right math considers not only 'without replacement', but also the fact that some guards are quite large, and the large ones are more likely to be picked, and if you can knock them out of the user's list, it brings you a lot closer to winning at the attack. If we're looking at thresholds of 8 and 20 guard picks, what is the chance, given the current set of guards and weights, that the adversary gets his guard chosen somewhere in that mix? I think it's uncomfortably high, right?
If indeed Aaron does the numbers and they look bad, I think that argues for much lower thresholds here, plus a lot more work on fixing the edge cases (better termed 'vulnerabilities' when they're used in an attack) that cause users to switch away from a guard when actually they shouldn't.
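One way to get a feel for the question before the real numbers arrive is a toy simulation. The sketch below is only that. The weight lists are made up (not taken from any consensus) and the function names are hypothetical. It also models guard choice as plain weighted sampling without replacement, which ignores everything else the real selection code does. It uses the standard trick that ordering relays by independent exponential "arrival times" with rate equal to their weight gives the same order as picking by weight without replacement.

```python
import random


def adversary_in_first_picks(weights, adversary_index, picks, rng):
    """Return True if the adversary's guard is among the first `picks`
    guards drawn by weight, without replacement.

    Each relay gets an exponential arrival time with rate = its weight;
    sorting by arrival time reproduces weighted sampling without
    replacement, so we only need to count who arrives before the adversary.
    """
    times = [rng.expovariate(w) for w in weights]
    t_adv = times[adversary_index]
    earlier = sum(1 for t in times if t < t_adv)
    return earlier < picks


def estimate_probability(weights, adversary_index, picks, trials=10000, seed=0):
    """Monte Carlo estimate of the chance the adversary is picked in time."""
    rng = random.Random(seed)
    hits = sum(
        adversary_in_first_picks(weights, adversary_index, picks, rng)
        for _ in range(trials)
    )
    return hits / trials


# Toy networks (assumptions, not real data). In both, the adversary holds
# 1% of the total weight and sits at the last index.
UNIFORM = [1.0] * 100                 # 100 equal guards
SKEWED = [5.0] * 10 + [1.0] * 50      # 10 big guards hold half the weight

if __name__ == "__main__":
    for picks in (8, 20):
        u = estimate_probability(UNIFORM, len(UNIFORM) - 1, picks)
        s = estimate_probability(SKEWED, len(SKEWED) - 1, picks)
        print(f"picks={picks}: uniform~{u:.3f} skewed~{s:.3f}")
```

In the uniform toy case the answer is just picks/100, which is the "without replacement" half of the correction. The skewed case should come out higher for the same 1% adversary, because the big guards get used up early and the adversary's share of what remains grows. Swapping real consensus weights in for the toy lists would give the number we actually care about.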
> If we could tie guard choice to location, that would help a great deal, but we'd need to answer the question, "Where am I on the network", which is not so easy to do passively if you're behind a NAT.
Agreed -- this sounds both really useful and also really hard to do automagically.
I wonder if we could approximate it, in the meantime, with a torrc option which basically says "I move around a lot and my network is flaky, so I'm going to need larger thresholds". For ordinary users, we could even imagine a "what kind of user are you" question in the Tor Browser dialogs.
--Roger