[tor-talk] Tor and HTTPS graphic

Paul Syverson syverson at itd.nrl.navy.mil
Fri Mar 9 20:52:38 UTC 2012

On Thu, Mar 08, 2012 at 06:41:25AM +0000, The23rd Raccoon wrote:
> On Thu, Mar 8, 2012 at 1:39 AM, Mansour Moufid <mansourmoufid at gmail.com> wrote:
> > On Tue, Mar 6, 2012 at 11:55 PM, The23rd Raccoon
> > <the.raccoon23 at gmail.com> wrote:
> >> Now bear in mind that I'm just a Raccoon, but some time ago I scrawled
> >> a proof out that showed that the correlation accuracy of a "dragnet
> >> GPA" goes down in proportion to the square of the number of concurrent
> >> users using an anonymization service:
> >> http://archives.seul.org/or/dev/Sep-2008/msg00016.html
> >
> > Are we so sure there are no methods of correlation with zero false
> > positive rate [P(C|~M) = 0]?
> For passive correlation attacks, I have not seen any in
> dumpster-accessible research literature.
> For active attacks, there are varying classes that can achieve 0
> error. In general, 0-error success depends upon how much information
> you are able to encode into the stream, how quickly you are able to do
> it, and how reliably you are able to extract it.
> In fact, I think the research community's insistence that passive
> correlation can always succeed 

You misunderstand or at least misrepresent what is being argued
here. There does not even have to be anything incompatible in what you
are saying and this "insistence" as you put it.  The difference lies
entirely in the threat model. So we need to get more precise about
that (below).

> has blinded the tor devs to a very serious type of active attack
> that actually will: the crypto-tagging attack.

Nobody's blinded to the possibility. Many of us knew long ago that
several things like this are easy to do. It's even easier to just do
bitsquashing, as we noted in the first onion routing paper in 1996
(there are tradeoffs and may be times when other tagging attacks are
preferable, that's not the point). As a more directly connected
indicator of prior awareness, Mixminion was designed by some of the
main research people who also worked on Tor, specifically Roger and
Nick together with George Danezis. They spent a significant part of
the research paper that sets out the design talking about tagging
attacks and countermeasures to them.

We're all well aware of many tagging variants here. What we're saying
about them is that (1) identifying another specific example of tagging
attack without other significant contribution is not a publishable
research contribution and (2) designing in countermeasures against
such attacks (such as the Mixminion paper and some of the subsequent
formatting work in that vein did) is not worth it because it's so
easy to attack Tor whether it's made resistant to this kind of
tagging or not. (I know you don't agree with that---yet. I'm coming to
that.)

> The crypto-tagging attack performs an operation on a cell at the entry
> to the network that will cause an error upon exit of the network,
> *unless* a party at the exit of the network is able to undo it. It
> ensures a node will only carry compromised traffic.
> In 2009, the devs dismissed a version of the crypto-tagging attack
> presented by Xinwen Fu as being equivalent to correlation back when
> the "One Cell is Enough to Break Tor's Anonymity" attack came out[1].
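
As a rough illustration of the mechanism described in the quoted
paragraph, here is a minimal toy sketch in Python. It assumes a
malleable XOR-style layered cipher; the hash-based keystream, the
32-byte cell, the 4-byte digest, and every function name are
hypothetical stand-ins, not Tor's actual relay crypto or cell format.

```python
import hashlib

CELL_LEN = 32  # toy cell size in bytes (assumption; not Tor's real cell size)


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key||counter, standing in for a stream cipher."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_cell(payload: bytes) -> bytes:
    """Pad the payload and append a 4-byte digest that the receiver checks."""
    body = payload.ljust(CELL_LEN - 4, b"\0")
    return body + hashlib.sha256(body).digest()[:4]


def cell_ok(cell: bytes) -> bool:
    """True if the trailing digest matches the body (the cell is 'recognized')."""
    body, digest = cell[:-4], cell[-4:]
    return hashlib.sha256(body).digest()[:4] == digest


def relay_through_circuit(cell, keys, entry_tag=None, exit_tag=None):
    """Client adds one XOR layer per hop; each hop strips its own layer.

    A malicious entry XORs `entry_tag` into the cell after stripping its
    layer. A colluding exit XORs `exit_tag` after stripping the last layer.
    """
    entry_key, middle_key, exit_key = keys
    wire = cell
    for key in keys:  # client-side onion encryption
        wire = xor(wire, keystream(key, CELL_LEN))
    wire = xor(wire, keystream(entry_key, CELL_LEN))  # entry hop
    if entry_tag is not None:
        wire = xor(wire, entry_tag)  # the tag is applied here
    wire = xor(wire, keystream(middle_key, CELL_LEN))  # middle hop sees noise
    wire = xor(wire, keystream(exit_key, CELL_LEN))  # exit hop
    if exit_tag is not None:
        wire = xor(wire, exit_tag)  # a colluding exit undoes the tag
    return wire


if __name__ == "__main__":
    keys = [b"entry-key", b"middle-key", b"exit-key"]
    cell = make_cell(b"GET / HTTP/1.0")
    tag = b"\xff" + bytes(CELL_LEN - 1)  # flip the bits of the first byte
    honest_exit = relay_through_circuit(cell, keys, entry_tag=tag)
    colluding_exit = relay_through_circuit(cell, keys, entry_tag=tag, exit_tag=tag)
    print("honest exit accepts cell:   ", cell_ok(honest_exit))
    print("colluding exit accepts cell:", cell_ok(colluding_exit))
```

In this toy model the tag survives the middle hop because XOR layers
commute, so only an exit that knows the tag can restore a cell that
passes the digest check.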

Nobody said they were equivalent. What is actually said in [1] is

   "One of the unknowns in the research world is exactly how quickly
   the timing attack succeeds. How many seconds of traffic (and/or
   packets) do you need to achieve a certain level of confidence? I'll
   grant that if you run the entry and exit, tagging is a very simple
   attack to carry out both conceptually and in practice. But I think
   Fu underestimates how simple the timing attack can be also. That's
   probably the fundamental disagreement here."

And in that passage, they're only talking about the passive timing
attack. As noted earlier in the post and in many other places,
it's trivial to put in active timing signatures if they are needed.

> They dismissed Fu's comments about false positives by quoting
> researchers claiming that a false positive rate of 0.0006 "is just a
> nonissue". But if you do the math in my Example 1, a 0.0006 false
> positive rate is more than enough to prevent dragnet analysis of a
> heavily used network.

Actually, the post notes that this was the maximum false positive rate
achieved in the cited simulation. In the analysis on the live Tor
network also cited, there were zero false positives in thousands of
runs of the experiment (not thousands of circuits, there were also
thousands of circuits in each run of the experiment). Nonetheless, you
are right to ask about scale and base rate, but I don't think
they undermine the effective adequacy of timing attacks
in ways that ultimately matter.
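
The scale and base-rate question can be made concrete with a plain
Bayes' rule calculation in the P(C|M) notation used earlier in the
thread. This is only a sketch: it is not the Example 1 from the linked
post, and the true positive rate and the stream counts below are
illustrative assumptions. Only the 0.0006 figure comes from the
discussion above.

```python
def posterior_match(tpr: float, fpr: float, n_streams: int) -> float:
    """P(M|C): probability that a flagged pair is a real match.

    tpr       = P(C|M), chance the correlator fires on a true match
    fpr       = P(C|~M), chance it fires on a non-match
    n_streams = number of concurrent candidate streams, so the prior
                P(M) for any one candidate is taken as 1/n_streams
                (a simplifying assumption)
    """
    prior = 1.0 / n_streams
    hit = tpr * prior
    false_alarm = fpr * (1.0 - prior)
    return hit / (hit + false_alarm)


if __name__ == "__main__":
    # tpr = 0.9994 is an assumed value chosen to mirror the 0.0006 error rate.
    for n in (100, 10_000, 1_000_000):
        p = posterior_match(tpr=0.9994, fpr=0.0006, n_streams=n)
        print(f"{n:>9} concurrent streams -> P(M|C) = {p:.4f}")
```

The posterior falls as the number of candidate streams grows, which is
the base-rate effect in question. It says nothing yet about what
happens once ancillary information or repeated observations are added.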

> In [1], the devs offered to work towards fixing the issue if someone
> could show that it was indeed worse than passive correlation.  I
> believe I have done so. Is there anything that can be done? I'm not
> sure at the moment. Probably a conversation for another thread.

Again, you overstate. What it actually says is

     "If somebody can show us that tagging attacks are actually much
     more effective than their passive timing counterparts, we should
     work harder to fix them. If somebody can come up with a cheap way
     to make them harder, we're all ears. But while they remain on par
     with passive attacks and remain expensive to fix, then it doesn't
     seem like a good move to slow down Tor even more without actually
     making it safer."

More importantly, here's where we come to the crux of the
biscuit. What do we mean by "actually much more effective"?  You seem
primarily focused on a global passive hoovering adversary, perhaps at
a limited sample rate a la Murdoch and Zielinski. You seem to want to
show that a timing correlation attack by such an adversary is not so
bad, but that tagging would be effective. I think you are a little quick in
the practical conclusions you draw from your analysis (ignoring
intersections and ancillary information) and how you come up with the
numbers on which you base your analysis (what you do with them seems
fine), but I won't debate those points because I don't care even if
you turned out to be right in those aspects: I'm not very worried
about that adversary because I don't think it's a realistic threat to be
that global or that passive (large is fine, even multijurisdictional,
and only making small delay patterns in passing traffic, hmmm OK
maybe. But GPA I just don't buy). In any case, even if we were worried
about the global hoover, you don't want to limit to a passive attacker
as your focus on tagging illustrates.  But once an adversary can be
active there are all kinds of active timing techniques that padding
can't address, ranging from the provably secure, provably undetectable
to the merely highly effective and practical. So the usability costs
and network overhead that countering tagging would imply would not
even help much.  So, to convince me that your analysis shows we should
revisit tagging for Tor you would have to show three things: 
(1) Convince me that a truly global adversary is realistically worth
worrying about, (2) convince me that an adversary that does active
timing correlation would not remain a significant threat even if
tagging were no longer possible, and (3) convince me that your numbers
correspond to reality and that the results are robust to intersection
attacks and ancillary information. (No need to bother with (3) until
(1) and (2) are established.)

I also want to comment on your consideration of an adversary looking
for the clients visiting a given website. Let's accept for the moment
the idea of full GPA and accept your numbers. Even if we accept your
EER that is at least an order of magnitude worse than experiments have
found (i.e., 99%), you come up with initial anonymity sets of who is
visiting a particular website (respectively which destinations a given
client is visiting) of around 50. That is essentially zero for a big
and powerful adversary. Then add in any ancillary information
(geographic location of IP addresses, prior knowledge about the
clients, nature of the destination servers, etc.), not to mention
intersections over time. Rather than undermine the adequacy of passive
correlation, you have supported its effectiveness.
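
To see why an initial anonymity set of around 50 offers so little,
here is a small simulation sketch of intersection over time. The
population size, the 1% false match rate, the assumption that the true
client is always matched, and the assumption that false matches are
independent across rounds are all illustrative choices, not
measurements. The function names are hypothetical.

```python
import random


def observe_round(rng, population: int, target: int, fpr: float) -> set:
    """One observation: the true client always matches (assumption);
    every other client is falsely matched independently with prob. fpr."""
    return {c for c in range(population) if c == target or rng.random() < fpr}


def intersect_rounds(population: int, target: int, fpr: float,
                     rounds: int, seed: int = 0):
    """Intersect the candidate sets from repeated observations.

    Returns the final candidate set and the set size after each round.
    """
    rng = random.Random(seed)
    candidates = set(range(population))
    sizes = []
    for _ in range(rounds):
        candidates &= observe_round(rng, population, target, fpr)
        sizes.append(len(candidates))
    return candidates, sizes


if __name__ == "__main__":
    # 5000 clients at a 1% false match rate gives roughly 50 false
    # candidates in the first round, in line with the figure above.
    final, sizes = intersect_rounds(population=5000, target=42,
                                    fpr=0.01, rounds=4)
    print("candidate set size after each round:", sizes)
    print("true client still in the set:", 42 in final)
```

Under these assumptions each extra round multiplies the expected
number of false candidates by the false match rate, so the set
shrinks toward the true client within a few observations.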

