Thus spake Nick Mathewson (nickm@alum.mit.edu):
On Fri, Oct 12, 2012 at 3:17 PM, Mike Perry <mikeperry@torproject.org> wrote:
Thus spake Nick Mathewson (nickm@torproject.org):
Discussion:
The rule that the set of guards and the set of directory guards need to be disjoint, and the rule that multiple directory guards need to be providing descriptors, are both attempts to make it harder for a single node to capture a route.
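(For concreteness, here is a minimal Python sketch of what enforcing those two rules could look like; the function and the MIN_DIR_GUARDS threshold are illustrative only, not Tor's actual implementation.)

  import random

  MIN_DIR_GUARDS = 3  # illustrative threshold, not a real Tor constant

  def pick_directory_guards(entry_guards, dir_caches, num=MIN_DIR_GUARDS):
      """Pick directory guards for a client (illustrative sketch).

      entry_guards: set of relay fingerprints used as circuit guards
      dir_caches:   relay fingerprints that serve descriptors
      """
      # Rule 1: keep the sets disjoint, so no single relay is both your
      # first hop and your source of directory information.
      candidates = [r for r in dir_caches if r not in entry_guards]
      # Rule 2: require several descriptor sources, not just one.
      if len(candidates) < num:
          raise RuntimeError("not enough disjoint directory caches")
      return random.sample(candidates, num)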
Can you explain the route capture opportunities available to directory guards? Is it #5343/#5956?
Like that general class, yes. It worries me to have too few sources of directory info; with bridges we have no choice, but with directory guards, we can make sure that we have multiple sources.
In particular, it's a little obnoxious for the same party to be both the first hop of your circuit, *and* to know exactly what you know about possible candidates for hop 2 and hop 3.
Ok, so it sounds like this is more about the second rule than the first?
And how does the attack work? Can directory mirrors simply say "Sorry man, that descriptor doesn't exist", even though the client sees it listed in the consensus?
No, but they can say "Sorry, I don't have that descriptor." (Which amounts to the same thing, but isn't overtly suspicious. Maybe we should analyze it and figure out how often that really happens in practice for an honest guard.)
Shouldn't clients just try another directory source in this case?
Maaybe. If all their directory guards but *one* are down, my claim is that they should not rely on just that guard. There are alternative designs where you don't add directory guards unless all your guards are down, and I don't think those are right.
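Concretely, the policy I have in mind looks something like this sketch (Python, illustrative names only, not Tor's actual code): rather than falling back on a lone surviving directory guard, the client tops its set back up from other directory caches.

  import random

  def usable_dir_sources(dir_guards, entry_guards, dir_caches, is_up, minimum=2):
      """Return the directory sources a client should query (sketch)."""
      live = [g for g in dir_guards if is_up(g)]
      # Never rely on a single surviving directory guard: extend the set
      # with other reachable caches (still disjoint from the entry guards)
      # until we have at least `minimum` sources.
      while len(live) < minimum:
          extra = [r for r in dir_caches
                   if r not in live and r not in entry_guards and is_up(r)]
          if not extra:
              break  # nothing else reachable; use what we have
          live.append(random.choice(extra))
      return live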
Ok, this makes sense. Also second rule?
The reason I'm asking is because if we use the same Guard nodes for both directory and normal traffic, this adds additional traffic patterns to the set of things that Website Traffic Fingerprinting attacks must classify, which further reduces the accuracy of that attack.
Hm. An interesting thought.
My first inclination here is to ask, "Can we analyze this to figure out the benefit/risk of each approach and somehow make a mathy/quantitative argument about which is better?" I don't know that we'll come up with a final answer, but I think we could do well to try to figure out how large/small benefits are likely to be.
My favorite work in the Tor Website Traffic Fingerprinting space[1] actually measures this effect quite well. Have a look at Figure 4 in section 5.2.2 in the "Open World" dataset (page 8). As we add more background noise to the "Open World" of things that are fetched through Tor Guard nodes, the true positive accuracy of the attack drops off.
In general, with more objects to classify and only a few features to extract, either the true positive accuracy goes down or the false positive rate goes up, especially when the objects are relatively low-resolution in terms of additional reliable features to extract.
Further, because of the base rate fallacy[2], the adversary needs to make heavy, heavy tradeoffs to keep their false positive rate way, way down. This means any objects we add to the "world" of Tor Guard traffic are pretty much guaranteed to decrease the attack's true positive accuracy in terms of the webpages it can reliably recognize.
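To make that concrete, here is a back-of-the-envelope Bayes calculation (the numbers are made up purely for illustration): even with a 90% true positive rate and a 1% false positive rate, a target page that accounts for only 1 in 1000 of the traces a guard observes yields mostly false alarms.

  def precision(tpr, fpr, base_rate):
      """P(trace really is the target page | classifier says it is),
      by Bayes' theorem."""
      hits = tpr * base_rate
      false_alarms = fpr * (1.0 - base_rate)
      return hits / (hits + false_alarms)

  # Illustrative numbers only: 90% TPR, 1% FPR, target page is 0.1% of traces.
  print(precision(0.90, 0.01, 0.001))  # ~0.08: over 90% of alerts are false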
(Incidentally, I believe the authors of [1] understood the danger of false positives, and that's why their graphs look the way they do. It's not clear other traffic fingerprinting authors understand this concept. In fact, for many of them, it's quite clear they do not.)
So, any games we can play to make directory activity look like client web activity (especially different types and sizes of web activity) are a bonus win against the attack that costs us no traffic overhead.
[1]. http://lorre.uni.lu/~andriy/papers/acmccs-wpes11-fingerprinting.pdf
[2]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.8982