Filename: 207-directory-guards.txt
Title: Directory guards
Author: Nick Mathewson
Created: 10-Oct-2012
Status: Open
Target: 0.2.4.x
Motivation:
When we added guard nodes to resist profiling attacks, we made it so that clients won't build general-purpose circuits through just any node. But clients don't use their guard nodes when downloading general-purpose directory information from the Tor network. This allows a directory cache, over time, to learn a large number of IPs for non-bridge-using users of the Tor network.
Proposal:
In the same way as they currently pick guard nodes as needed, adding more guards as those nodes are down, clients should also pick a small-ish set of directory guard nodes, to persist in Tor's state file.
Clients should not pick their own guards as directory guards, or pick their directory guards as regular guards.
When downloading a regular directory object (that is, not a hidden service descriptor), clients should prefer their directory guards first. Then they should try more directories from a recent consensus (if they have one) and pick one of those as a new guard if the existing guards are down and a new one is up. Failing that, they should fall back to a directory authority (or a directory source, if those get implemented -- see proposal 206).
If a client has only one directory guard running, they should add new guards and try them, and then use their directory guards to fetch multiple descriptors in parallel.
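For concreteness, here is a minimal sketch of that fetch logic in Python; every name in it is illustrative, not Tor's actual code:

    import random

    def pick_directory_source(dir_guards, consensus_caches, regular_guards,
                              authorities, is_up):
        """Choose a relay to fetch a regular directory object from.

        dir_guards       -- directory guards persisted in Tor's state file
        consensus_caches -- directory caches listed in a recent consensus
        regular_guards   -- ordinary entry guards (kept disjoint from
                            dir_guards)
        is_up            -- predicate: do we believe this relay is running?
        """
        running = [g for g in dir_guards if is_up(g)]
        if running:
            return random.choice(running)
        # Existing directory guards are down: adopt a new one from the
        # consensus, keeping the two guard sets disjoint.
        candidates = [c for c in consensus_caches
                      if is_up(c)
                      and c not in regular_guards
                      and c not in dir_guards]
        if candidates:
            new_guard = random.choice(candidates)
            dir_guards.append(new_guard)   # written back to the state file
            return new_guard
        # Last resort: a directory authority (or a directory source, if
        # proposal 206 gets implemented).
        return random.choice(authorities)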
Discussion:
The rule that the set of guards and the set of directory guards need to be disjoint, and the rule that multiple directory guards need to be providing descriptors, are both attempts to make it harder for a single node to capture a route.
Open questions and notes:
What properties does a node need to be a suitable directory guard? If we require that it have the Guard flag, we'll lose some nodes: only 74% of the directory caches have it (weighted by bandwidth).
We may want to tune the algorithm used to update guards.
For future-proofing, we may want to make the DirCache flag from proposal 185 the one that nodes must have in order to be directory guards. For now, we could have authorities set it to Guard && DirPort != 0, with a better algorithm to follow. Authorities should never get the DirCache flag.
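As a sketch, that interim rule is just the following (the field names are hypothetical stand-ins for an authority's view of a relay):

    def should_get_dircache_flag(relay, is_authority):
        # Interim rule: Guard && DirPort != 0; authorities never qualify.
        return (not is_authority) and relay.has_guard_flag and relay.dir_port != 0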
Thus spake Nick Mathewson (nickm@torproject.org):
> [...]
> Discussion:
>
> The rule that the set of guards and the set of directory guards need to be disjoint, and the rule that multiple directory guards need to be providing descriptors, are both attempts to make it harder for a single node to capture a route.
Can you explain the route capture opportunities available to directory guards? Is it #5343/#5956?
And how does the attack work? Can directory mirrors simply say "Sorry man, that descriptor doesn't exist", even though the client sees it listed in the consensus? Shouldn't clients just try another directory source in this case?
The reason I'm asking is because if we use the same Guard nodes for both directory and normal traffic, this adds additional traffic patterns to the set of things that Website Traffic Fingerprinting attacks must classify, which further reduces the accuracy of that attack.
On Fri, Oct 12, 2012 at 3:17 PM, Mike Perry <mikeperry@torproject.org> wrote:
> Thus spake Nick Mathewson (nickm@torproject.org):
>> [...]
> Can you explain the route capture opportunities available to directory guards? Is it #5343/#5956?
Like that general class, yes. It worries me to have too few sources of directory info; with bridges we have no choice, but with directory guards, we can make sure that we have multiple sources.
In particular, it's a little obnoxious for the same party to be both the first hop of your circuit, *and* to know exactly what you know about possible candidates for hop 2 and hop 3.
> And how does the attack work? Can directory mirrors simply say "Sorry man, that descriptor doesn't exist", even though the client sees it listed in the consensus?
No, but they can say "Sorry, I don't have that descriptor." (Same thing actually, but not totally suspicious. But maybe let's analyze it and figure out how much it really happens in practice for an honest guard.)
> Shouldn't clients just try another directory source in this case?
Maaybe. If all their directory guards but *one* are down, my claim is that they should not rely on just that guard. There are alternative designs where you don't add directory guards unless all your guards are down, and I don't think those are right.
> The reason I'm asking is because if we use the same Guard nodes for both directory and normal traffic, this adds additional traffic patterns to the set of things that Website Traffic Fingerprinting attacks must classify, which further reduces the accuracy of that attack.
Hm. An interesting thought.
My first inclination here is to ask, "Can we analyze this to figure out the benefit/risk of each approach and somehow make a mathy/quantitative argument about which is better?" I don't know that we'll come up with a final answer, but I think we could do well to try to figure out how large/small benefits are likely to be.
Thus spake Nick Mathewson (nickm@alum.mit.edu):
> [...]
>> Can you explain the route capture opportunities available to directory guards? Is it #5343/#5956?
> Like that general class, yes. It worries me to have too few sources of directory info; with bridges we have no choice, but with directory guards, we can make sure that we have multiple sources.
>
> In particular, it's a little obnoxious for the same party to be both the first hop of your circuit, *and* to know exactly what you know about possible candidates for hop 2 and hop 3.
Ok, so it sounds like this is more the second rule than the first rule?
>> And how does the attack work? Can directory mirrors simply say "Sorry man, that descriptor doesn't exist", even though the client sees it listed in the consensus?
> No, but they can say "Sorry, I don't have that descriptor." (Same thing actually, but not totally suspicious. But maybe let's analyze it and figure out how much it really happens in practice for an honest guard.)
>> Shouldn't clients just try another directory source in this case?
> Maaybe. If all their directory guards but *one* are down, my claim is that they should not rely on just that guard. There are alternative designs where you don't add directory guards unless all your guards are down, and I don't think those are right.
Ok, this makes sense. Also second rule?
>> The reason I'm asking is because if we use the same Guard nodes for both directory and normal traffic, this adds additional traffic patterns to the set of things that Website Traffic Fingerprinting attacks must classify, which further reduces the accuracy of that attack.
> Hm. An interesting thought.
>
> My first inclination here is to ask, "Can we analyze this to figure out the benefit/risk of each approach and somehow make a mathy/quantitative argument about which is better?" I don't know that we'll come up with a final answer, but I think we could do well to try to figure out how large/small benefits are likely to be.
My favorite work in the Tor Website Traffic Fingerprinting space[1] actually measures this effect quite well. Have a look at Figure 4 in section 5.2.2 in the "Open World" dataset (page 8). As we add more background noise to the "Open World" of things that are fetched through Tor Guard nodes, the true positive accuracy of the attack drops off.
In general, with more objects to classify and few features to extract, either true positive accuracy goes down, or false positive rate goes up. Especially when the objects are relatively low-resolution in terms of additional reliable features to extract.
Further, because of the base rate fallacy[2], the adversary needs to make heavy, heavy tradeoffs to ensure their false positive rate stays way, way down. This means any objects we add to the "world" of Tor Guard traffic pretty much are guaranteed to decrease true positive accuracy of the attack in terms of webpages they can reliably recognize.
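To make the base rate point concrete, here is a toy Bayes calculation with invented numbers: even a classifier with a 90% true positive rate and only a 0.5% false positive rate is wrong about 98% of the time when the target page is 1 in 10,000 loads.

    # Invented numbers, purely to illustrate the base rate fallacy.
    p_target = 1.0 / 10000   # prior: fraction of page loads that are the target
    tpr = 0.90               # classifier true positive rate
    fpr = 0.005              # classifier false positive rate

    p_flagged = tpr * p_target + fpr * (1 - p_target)
    precision = (tpr * p_target) / p_flagged
    print("P(actually the target | flagged) = %.3f" % precision)   # ~0.018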
(Incidentally, I believe the authors of [1] understood the danger of false positives, and that's why their graphs look the way they do. It's not clear other traffic fingerprinting authors understand this concept. In fact, for many of them, it's quite clear they do not.)
So, any games we can play to make directory activity look like client web activity (especially different types and sizes of web activity) are a bonus win against the attack that costs us no traffic overhead.
[1]. http://lorre.uni.lu/~andriy/papers/acmccs-wpes11-fingerprinting.pdf
[2]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.8982
On Fri, Oct 12, 2012 at 10:53 PM, Mike Perry <mikeperry@torproject.org> wrote:
> Thus spake Nick Mathewson (nickm@alum.mit.edu):
>> [...]
> Ok, so it sounds like this is more the second rule than the first rule?
I think it's both, perhaps. If only one source is providing you with directory info, you're in trouble either way. But if that source is also your first hop, it is farther along in its attempts to manipulate you than it would be otherwise, and has an easier time taking advantage of them. It can also put its knowledge to use a little better.
In particular, if I'm your guard, and you ask me for descriptors for some nodes including node X, and you then immediately build a circuit through me before I tell you about node X, I know you didn't know about node X when you built that circuit. Contrast that with the case where I'm only a guard -- I don't know what you're downloading. And contrast that with the case where I'm only a directory -- I don't know when, exactly, you're building circuits.
Even if you *do* have multiple working guards, the issue still exists. Once I see that you're building circuits for traffic, I know that any descriptor I give you *after* that point wasn't used for those circuits. This lets me narrow down the set of circuits you might have built.
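The inference in the last two paragraphs amounts to something like this sketch (names invented):

    def possible_circuit_members(delivery_time, circuit_built_at):
        """delivery_time: when this dirguard served each descriptor to
        the client. Relays whose descriptors went out only after the
        circuit was built can be ruled out of that circuit."""
        return {relay for relay, t in delivery_time.items()
                if t <= circuit_built_at}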
(Incidentally, a directory guard can probably tell how many other functional directory guards you have based on what fraction of the descriptors it serves you. It can probably even tell when one of your other dirguards is down, based on when it gets asked for more descriptors on a timeframe that implies that this is a retry. Not sure the best way to build an attack out of that.)
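For illustration, if the client spread descriptor fetches uniformly over its k running directory guards (an assumption; real clients may not balance evenly), the fraction one guard serves estimates 1/k:

    def estimate_other_dirguards(descs_served, descs_client_needed):
        # If requests are split evenly over k running dirguards, this
        # guard sees roughly a 1/k fraction of them.
        fraction = descs_served / float(descs_client_needed)
        k = int(round(1.0 / fraction))
        return k - 1   # running dirguards besides this one

    print(estimate_other_dirguards(34, 100))   # -> 2, i.e. k is about 3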
[...]
> So, any games we can play to make directory activity look like client web activity (especially different types and sizes of web activity) are a bonus win against the attack that costs us no traffic overhead.
Hm. I think you make an okay argument that doing directory fetches over the same connections as web traffic *might* make fingerprinting harder, especially if the directory fetches happen roughly concurrently with the web traffic.[1] I don't think we can upgrade this "might" into a "will" without actual experimentation here.
But the analysis I was hoping we could think about was the good old one about tradeoffs between the two designs here (design A: disjoint guards and dirguards; design B: dirguards are guards). In your message, you make a case that there could be benefit to B. I think you're right, but that's only half the analysis we need. We need to know whether the benefit from B is likely to be greater than the benefit from A. To do that, we also need a way to examine both and compare them.
yrs,
Thus spake Nick Mathewson (nickm@alum.mit.edu):
> On Fri, Oct 12, 2012 at 10:53 PM, Mike Perry <mikeperry@torproject.org> wrote:
>> [...]
> I think it's both, perhaps. If only one source is providing you with directory info, you're in trouble either way. But if that source is also your first hop, it is farther along in its attempts to manipulate you than it would be otherwise, and has an easier time taking advantage of them. It can also put its knowledge to use a little better.
>
> In particular, if I'm your guard, and you ask me for descriptors for some nodes including node X, and you then immediately build a circuit through me before I tell you about node X, I know you didn't know about node X when you built that circuit. Contrast that with the case where I'm only a guard -- I don't know what you're downloading. And contrast that with the case where I'm only a directory -- I don't know when, exactly, you're building circuits.
>
> Even if you *do* have multiple working guards, the issue still exists. Once I see that you're building circuits for traffic, I know that any descriptor I give you *after* that point wasn't used for those circuits. This lets me narrow down the set of circuits you might have built.
If we require clients to hold descriptors for a large fraction of the consensus for each position (for example, 75% of the consensus bandwidth for that position) before building circuits, it seems that we can put whatever bounds on this attack we choose...

It's also an attack that can only happen for a very small window of time, in contrast to the benefit against the traffic fingerprinting attack, which is time-invariant (if we do it right -- see below).
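A sketch of that kind of pre-circuit check (the 75% threshold and all names are illustrative):

    CIRCUIT_BUILD_THRESHOLD = 0.75   # fraction of consensus bw we must cover

    def can_build_circuits(relays_by_position, have_descriptor):
        """relays_by_position maps "guard"/"middle"/"exit" to lists of
        (relay_id, consensus_bandwidth) pairs; have_descriptor says
        whether we already hold a relay's descriptor."""
        for relays in relays_by_position.values():
            total = sum(bw for _, bw in relays)
            have = sum(bw for rid, bw in relays if have_descriptor(rid))
            if total and have < CIRCUIT_BUILD_THRESHOLD * total:
                return False
        return True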
>> So, any games we can play to make directory activity look like client web activity (especially different types and sizes of web activity) are a bonus win against the attack that costs us no traffic overhead.
> Hm. I think you make an okay argument that doing directory fetches over the same connections as web traffic *might* make fingerprinting harder, especially if the directory fetches happen roughly concurrently with the web traffic.[1] I don't think we can upgrade this "might" into a "will" without actual experimentation here.
Again, this experimentation is already done. It's quite clear that adding more objects to the world of Guard activity reduces traffic fingerprinting accuracy, regardless of whether that activity is concurrent with client traffic or not.

The only thing that would change this is if the adversary could somehow detect your directory activity through some information channel other than the actual traffic patterns to specific Guards. If such a side channel exists, then yes, we would likely only experience the benefit during concurrent activity (due to feature resolution degradation).
Unfortunately, it would seem that to a local observer, any directory guards that are not also Guards would provide this information channel, since all directory activity happens at roughly the same time, right?
On Mon, Oct 15, 2012 at 2:48 PM, Mike Perry <mikeperry@torproject.org> wrote:
> [...]
> Again, this experimentation is already done. It's quite clear that adding more objects to the world of Guard activity reduces traffic fingerprinting accuracy, regardless of whether that activity is concurrent with client traffic or not.
If that's the case, then it would amount to, what? The equivalent of every user visiting one additional website on a regular basis? Every user visiting approximately the same website (since everybody downloads the same directory info)?
My understanding is that while users *would* resist fingerprinting better if everybody picked a random website off the internet and visited it periodically, it wouldn't help much if (say) we told everybody to visit CNN once a day. Gotta reread that paper and see if it says differently.
> The only thing that would change this is if the adversary could somehow detect your directory activity through some information channel other than the actual traffic patterns to specific Guards. If such a side channel exists, then yes, we would likely only experience the benefit during concurrent activity (due to feature resolution degradation).
Huh. If they're observing you, I bet directory traffic would be relatively easy to note. It's going to happen periodically whenever consensuses become unfresh; and it's going to involve simultaneous requests to (approximately) all your guards; and it has a characteristic "make one request for the consensus, then make a lot of requests to everybody for the descriptors" pattern; and it has a characteristic pattern of retries that probably doesn't look the same as retrying a failed circuit.

Further, the observer *knows* that the client is going to be making directory requests periodically: part of their algorithm is now going to be identifying which requests are directory requests, so that they can be ignored.
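As a crude sketch of what that observer's heuristic might look like (every threshold here is invented):

    def looks_like_directory_fetch(events, first_hops, window=60.0,
                                   min_fraction=0.9):
        """events: time-sorted (timestamp, destination) connection starts
        seen from the client; first_hops: the client's usual first hops.
        Flag any window in which (nearly) all first hops are contacted
        at once, as in a periodic consensus/descriptor fetch."""
        for i, (t0, _) in enumerate(events):
            hit = set(dst for (t, dst) in events[i:]
                      if t - t0 <= window and dst in first_hops)
            if len(hit) >= min_fraction * len(first_hops):
                return True
        return False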
> Unfortunately, it would seem that to a local observer, any directory guards that are not also Guards would provide this information channel, since all directory activity happens at roughly the same time, right?
That seems to be the case too.