[tor-talk] Tor and HTTPS graphic

Sat Mar 10 05:55:18 UTC 2012

On Fri, Mar 9, 2012 at 8:52 PM, Paul Syverson <syverson at itd.nrl.navy.mil> wrote:
> On Thu, Mar 08, 2012 at 06:41:25AM +0000, The23rd Raccoon wrote:
>> has blinded the tor devs to a very serious type of active attack
>> that actually will: the crypto-tagging attack.
>
> Nobody's blinded to the possibility. Many of us knew long ago that
> several things like this are easy to do.

I meant blinded to the severity.

>> In 2009, the devs dismissed a version of the crypto-tagging attack
>> presented by Xinwen Fu as being equivalent to correlation back when
>> the "One Cell is Enough to Break Tor's Anonymity" attack came out[1].
>
> As noted earlier in the post and in many other places,
> it's trivial to put in active timing signatures if they are needed.

Only if you have enough data to encode a time signature into. One cell
is not very much data. You'll see why this matters in a few
paragraphs.

> So, to convince me that your analysis shows we should revisit tagging for Tor you
> would have to show three things:

Your requirements don't seem to match the goal of revisiting tagging,
so I bent them slightly. Your requirements seem instead to invite
revisiting correlation entirely. But that's OK. I want to deal with
tagging, too, so I'll deal with it first. It's along the way, as they
say.

As we'll see, tagging allows a type of amplification attack that can
be *simulated* with a timing attack, but I'll argue it is simulated
poorly. I have not yet provided full Bayesian analysis of the bounds
of the accuracy of simulation, but I have written the dominating
components, and I'll finish it if you like.

If you want me to analyze active timing attacks using similar Bayesian
analysis, that might be a taller order. I'd need to scavenge the local
dumpster archives for a while to collect a representative sample of
attacks and pore over how to interpret their (very likely
misrepresented or at least embellished) results. If you could select
your favorites, it might speed things along.

Either way, just let me know.

> (1) Convince me that a truly global adversary is realistically worth worrying about

Intuitively, tagging attacks create a "half-duplex global" adversary
in places where there was no adversary before, because the
non-colluding entrances and exits of the network start working for
you. You get to automatically boost your attack resource utilization
by causing any uncorrelated activity you see to immediately fail, so
you don't even have to worry about it. This effect is by virtue of the
tag being destructive to the circuit if the cell is not untagged, and
also being destructive when a cell is "untagged" on a non-tagged
circuit.

In other words: in the EFF's graphic, tagging attacks create a second
translucent NSA dude everywhere in the world *for free*. This
translucent NSA dude is effectively closing circuits that the real NSA
dude didn't want to go there in the first place. He makes sure that
your circuits only go through another NSA dude.

So to answer your question: because of this "half-duplex global"
property, the tagging attack does not require you to worry about a
true global adversary to see that it is worse than
correlation (active or passive).

Any amount of resources (global or local) that you devote to tagging
automatically get amplified for free by the global translucent NSA
dude.
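
As a back-of-the-envelope sketch of that amplification (not a full
analysis): assume the adversary holds the same fraction f of both entry
and exit selection probability, that the two ends are picked
independently, and that every circuit with exactly one hostile end fails
because of the tag. The function names and the values of f below are
made up for illustration.

```python
def hostile_entry_utilization(f, tagging):
    """Of the surviving circuits that enter through a hostile node,
    the share that also leave through a hostile exit.

    f is the adversary's share of entry and of exit selection
    (ASSUMPTION: equal at both ends, ends chosen independently).
    """
    if tagging:
        # ASSUMPTION: a tagged cell reaching an honest exit kills the
        # circuit, so only hostile-to-hostile circuits survive.
        return 1.0
    return f  # nothing is killed; the exit is hostile with probability f


def surviving_compromised_share(f, tagging):
    """Share of all surviving circuits with both ends hostile."""
    both = f * f                  # hostile entry, hostile exit
    neither = (1 - f) * (1 - f)   # honest entry, honest exit
    if tagging:
        # Mixed circuits (exactly one hostile end) fail in both
        # directions, leaving only "both" and "neither".
        return both / (both + neither)
    return both


for f in (0.01, 0.05, 0.10, 0.25):  # hypothetical adversary sizes
    print(
        f"f={f:.2f}",
        f"utilization {hostile_entry_utilization(f, False):.2f}",
        f"-> {hostile_entry_utilization(f, True):.2f},",
        f"compromised share {surviving_compromised_share(f, False):.4f}",
        f"-> {surviving_compromised_share(f, True):.4f}",
    )
```

Under these assumptions the adversary's entry nodes stop carrying any
traffic it cannot also see at the exit, which is the "free" boost
described above.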

How well you are able to correlate afterward depends on a secondary
attack. Depending upon the nature of the tagging vulnerability you
find, you might be able to encode an arbitrary bitstring to uniquely
identify the user, eliminating the need for any subsequent
correlation. In fact, I'm pretty sure this is possible.

> (2) convince me that an adversary that does active timing correlation would not
> remain a significant threat even if tagging were no longer possible

I'm going to bend the rules again and instead try to convince you that
an attacker who tags can observe more compromised traffic than an
active timing attacker who attempts to simulate his attack, making
tagging qualify as an amplification attack in a separate class
entirely.

To simulate the same amplification attack with correlation (active or
passive), you have to correlate every circuit at your first NSA dude
to every other circuit at your second NSA dude, and kill the circuits
that don't have a match on both sides.

You also have the added challenge of doing the initial correlation
with few enough cells to kill the circuit before any streams are
attached (so users don't notice). The need for early detection rules
out virtually all of the benefits of active timing attacks for this
step, which require quite a lot of data to encode their fingerprints
(especially when making them provably effective or practically
invisible).

Therefore, we are back to analysis dominated by passive correlation
for the circuit killing step (the crux of the simulation).

In order to kill the circuits that don't match, NSAdude1 has to ask
NSAdude2 out of band if NSAdude2 has seen a match for each circuit
that NSAdude1 sees, and vice versa. The probability P(M|C) of the NSA
dudes seeing a true match given their correlator predicted one trends
down in proportion to P(M) = (c/n)^2 * (1/M)^2, similar to my Example
3 but with an extra 1/M factor in there, since we're talking about a
fully correlating adversary.

Note that M doesn't change (from 5000 in my examples) just
because you see fewer streams locally. Your probability of seeing a
match is pretty low compared to all the other things you see. (This
piece will also be key for any later analysis of active timing
attacks, which still will be dominated by 1/M^2).

Therefore, even if (or perhaps *especially if*) you don't devote
global-scale resources to the attack, you're going to be crushed by
the base rate.
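
To put rough numbers on the base-rate effect, here is a small sketch
using Bayes' rule with the prior P(M) = (c/n)^2 * (1/M)^2 from above.
M = 5000 is the figure from my examples; the value of c/n and the
correlator's 99% true-positive and 1% false-positive rates are
assumptions picked only for illustration, and the function names are
hypothetical.

```python
def match_prior(c_over_n, concurrent_streams):
    """P(M) = (c/n)^2 * (1/M)^2, the prior that a given pair of
    observations is a true match (M here is the stream count)."""
    return (c_over_n ** 2) * (1.0 / concurrent_streams) ** 2


def posterior_match(prior, true_positive_rate, false_positive_rate):
    """Bayes' rule: P(M|C), the probability of a true match given that
    the correlator reported one."""
    hit = true_positive_rate * prior
    false_alarm = false_positive_rate * (1.0 - prior)
    return hit / (hit + false_alarm)


CONCURRENT_STREAMS = 5000   # M, from the earlier examples
C_OVER_N = 0.3              # ASSUMPTION: adversary's share of the network
TPR, FPR = 0.99, 0.01       # ASSUMPTION: correlator accuracy

prior = match_prior(C_OVER_N, CONCURRENT_STREAMS)
posterior = posterior_match(prior, TPR, FPR)
print(f"P(M)   = {prior:.3e}")
print(f"P(M|C) = {posterior:.3e}")
```

Even with a correlator this accurate, the posterior stays tiny because
the false-positive term swamps the prior.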

To complete the simulation, at the circuit killing stage your choice
is either to over-estimate, take the union of the first pass
mismatches, and waste resources, or to pick only the intersection of
1:1 matches on the first pass and kill off quite a few actual matches.

Therefore, you lose either resource amplification or the omnipresent
"half-duplex" translucent NSA dude that the tagger gets for free, and
depending on implementation choice you might even end up doing worse
than the active attack by itself without attempting the
circuit-closing amplification. The exact amount of tradeoff depends on
how global vs. how local you are, and whether you choose to be lenient or
aggressive in your uncorrelated killing.
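
One possible reading of that lenient vs. aggressive choice, as a toy
sketch: the correlator emits candidate (entry circuit, exit circuit)
pairs, and the attacker either keeps every circuit that has any
candidate partner, or keeps only circuits in mutually unique 1:1 pairs.
The circuit names and candidate list are invented, and real correlators
would output scores rather than a flat list.

```python
from collections import Counter


def lenient_keep(candidates):
    """Keep every circuit with at least one candidate partner
    (kills only circuits with no candidates at all)."""
    entries = {e for e, _ in candidates}
    exits = {x for _, x in candidates}
    return entries, exits


def aggressive_keep(candidates):
    """Keep only circuits whose candidate partner is unique in both
    directions; everything ambiguous gets killed."""
    entry_count = Counter(e for e, _ in candidates)
    exit_count = Counter(x for _, x in candidates)
    pairs = [
        (e, x) for e, x in candidates
        if entry_count[e] == 1 and exit_count[x] == 1
    ]
    return {e for e, _ in pairs}, {x for _, x in pairs}


# HYPOTHETICAL data: the true matches are (e1, x1) and (e2, x2);
# (e2, x3) and (e3, x3) are false positives from the correlator.
candidates = [("e1", "x1"), ("e2", "x2"), ("e2", "x3"), ("e3", "x3")]

print("lenient keeps:   ", lenient_keep(candidates))
print("aggressive keeps:", aggressive_keep(candidates))
```

In this toy case the lenient rule keeps the unmatched e3 and x3 alive
(wasted resources), while the aggressive rule kills the real match
(e2, x2) because e2 was ambiguous.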

I conclude that the superiority of true tagging over simulated tagging
clearly makes true tagging qualify as a resource amplification attack,
which is indeed considered a different class of attack than
correlation alone.

Would you like a Bayesian proof with some real numbers, or do you
concede we should move on to active timing attacks?

> (3) convince me that your numbers correspond to reality and that the results are robust to
> intersection attacks and ancillary information.

Is this a trick question? Dude, you realize I'm a Raccoon, right?...

Nothing is robust to intersection attacks. If you add up enough pieces
of info over time, you deanonymize someone. The game's all about
collecting enough bits from wherever you can (or about scattering
those bits to the wind, if you're on the other side of the line).

> I also want to comment on your consideration of an adversary looking
> for the clients visiting a given website. Let's accept for the moment
> the idea of full GPA and accept your numbers. Even if we accept your
> EER that is at least an order of magnitude worse than experiments have
> found (i.e., 99%) you come up with initial anonymity sets of who is
> visiting a particular website (respectively which destinations a given
> client is visiting) of around 50. That is essentially zero for a big
> and powerful adversary. Then add in any ancillary information:
> geographic location of IP addresses, prior knowledge about the
> clients, nature of the destination servers, etc., not to mention
> intersections over time. Rather than undermine the adequacy of passive
> correlation, you have supported its effectiveness.

You (and others in this thread) misunderstand me. I'm not saying that
correlation never works, or that all three of my examples are safe
places to be if you want anonymity from the tor network as it is
currently deployed and used.

I'm merely saying that sweeping all types of end to end attacks under
the rug blinds you to the very real effect that adding more concurrent
users to the network has on correlation, and the difference is in fact
substantial enough to alter at least some aspects of the threat model
to take user base size and activity into account before evaluating
attacks.