[tor-talk] End-to-end correlation for fun and profit

Mike Perry mikeperry at torproject.org
Sun Aug 26 01:11:46 UTC 2012


Thus spake Maxim Kammerer (mk at dee.su):

> On Sat, Aug 25, 2012 at 1:12 AM, Mike Perry <mikeperry at torproject.org> wrote:
> > The Raccoon has made a believer out of me, but there are some limits to
> > both of his/her proofs.. The full proofs can still be found here:
> > http://web.archive.org/web/20100416150300/http://archives.seul.org/or/dev/Sep-2008/msg00016.html
> 
> Wrt. the first proof, it seems to me that the assumed correlation
> accuracy rate of 99.9% is incredibly low, and I think that the Raccoon
> recognized that by referring to sampling and retention at the end of
> his post. With the targeted attack that's similar to “Example 3” in
> Raccoon's post that I described in my previous comment here, where one
> analyzes all exit traffic without missing packets, I would expect the
> correlation accuracy (and as a result, match confidence) to
> exponentially approach 100% very quickly with the number of relevant
> packets seen, and extremely quickly if the traffic is interactive
> (i.e., browsing).
> 
> Actually, c/n of 30% in “Example 3” is close to the 25% that's
> discussed in the OP here, so let's redo the example with c/n=25% and
> different correlation accuracies (leaving the other numbers intact):
> 
> (using “bc -l”)
> ca  = 0.999
> pm  = (1/5000) * (0.25)^2
> ca*pm / (pm*ca + (1-pm)*(1-ca))
> 
> ca  = 0.999
> .01233363786760166917
> ca = 0.9999
> .11110246894375430565
> ca = 0.99999
> .55555617284636495961
> ca = 0.999999
> .92592671467910125759
> ca = 0.9999999
> .99206358969515668554
> ca = 0.99999999
> .99920064946444143613
> ca = 0.999999999
> .99992000739924807495
> 
> So reducing correlation accuracy error to 10^-9 will give you 99.99%
> confidence in end-to-end correlation match. I suspect that a few
> seconds of interactive traffic will give you a correlation accuracy
> that's much better than a 10^-9 error.

Well, the argument over correlation accuracy comes down to observation
resolution, feature extraction ability, and academic lab conditions
versus reality. For an example, let's assume that the adversary cannot
see inside of Guard TLS connections. With this assumption: if at any
point there's concurrent Guard TLS activity from a single client (either
other circuit activity, directory fetch activity, or circuit
pre-building activity), then some or all of your fine-grained timing and
size information features go out the window.

To see the effects of this currently, consider: Is it *really* the case
that only one connection in *a billion* experiences incidental
concurrent activity that interferes with or obliterates high-resolution
feature extraction? I think the actual rate of random (or deliberate)
concurrent activity is much higher than that, especially for heavily
used tor clients, and even more so if they are serving as bridges or
relays. 

But, against high-resolution adversaries, the really interesting
question is: How little real cover traffic is actually needed to obscure
timing and size information to the point where the remaining features
are insufficient for high rates of correlation success? And over how
many observations can such activity be expected to survive for a given
base rate of similar activity?

I suspect that for relatively short-lived bursts of traffic like web
site views and random webapp AJAX activity, we can actually do pretty
well with very little effort and overhead. Especially against the
one-ended version of the correlation attack: the website fingerprinting
attack, but probably against both.

But for long-lived or otherwise atypical connections, you're absolutely
right. There's just a whole lot of information encoded there.. Almost
any level of observation will likely be able to correlate such flows
eventually, and it's also hard to imagine generalized padding techniques
that could blend these flows with web traffic.

Unfortunately, because academia has mostly concluded that this work is
uninteresting and that all forms of this problem are generally
"impossible", we have no solid answers to these types of questions wrt
what can be done in practice. Perhaps it is merely because defense
work is less sexy than attack work when it comes to getting
publications. I don't know for sure. I haven't yet figured out exactly
why CS academia is broken. There's a whole lot of symptoms, though...

But anyway, failing real research, there's always the botnets, the drug
war, and the aliens to guide us... Can I get three cheers for Big Data?
After all, I'm sure we can trust Them to tell us how the science shakes
out in the end, amirite? ;).


-- 
Mike Perry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.torproject.org/pipermail/tor-talk/attachments/20120825/fbc2540b/attachment.pgp>


More information about the tor-talk mailing list