Link padding and the intersection attack

Roger Dingledine arma at mit.edu
Thu Aug 8 07:00:38 UTC 2002


I've been out of touch quite a while now; I'm still digesting the mails
that were written while I was gone. But clearly I should do my part to
generate even more text first.

I've cc'ed David and Carl on this, because I also discussed this with
David a few days ago. (David, Carl, you should sub to or-dev before
responding, else it will bounce.)

I talked to iang, adam, and jbash about link padding during Black Hat. I
asked how to do less expensive link padding than n^2 while still getting
some security. Ian said that anything less than n^2 is equivalent to
no padding. Adam went a step further, saying that n^2 really isn't any
good either.

The basic argument is because of traffic confirmation. I've written some
discussion/analysis below.

To begin, treat the onion routing network as an opaque cloud. Alice makes
connections to it, and Bob gets connections from it.

Clearly Bob can't do link padding, because he's a webserver and doesn't
know what that means. Let's think about whether Alice can do link padding.

If there are lots of Alices compared to n (the number of routers), then
it's not practical for Alice to link-pad to the cloud. Each Alice will
want to link-pad a lot so she can get good throughput when she's actually
using the system; but it's easy for lots of Alices to overload a node. We
might strike an ok balance between throughput from each Alice and number
of Alices -- but each Alice must become less happy as we add more of
them. That's a bad trend. We can help a bit by time-sharing the links
(so Alice_1 can only send between 0 and 30 seconds after the minute,
and Alice_2 can only send between 30 and 60, etc), but that doesn't
lessen the pain significantly. (It also partitions anonymity sets quite
nicely. I mention the time-sharing thing mainly because every person
I've told this problem to proposes it as an answer.)

If the number of Alices is near n (equivalently, if many Alices run their
own nodes), we may be in better shape. Welcome to p2p onion routing. More
on that down the road.

"If there are lots of users, padding from each Alice to the cloud is
not practical."

Assume the adversary is the sum of all links and nodes that he is able
to compromise. That is, assume there are some links and nodes which the
adversary cannot compromise. (Our adversary is thus a static adversary,
using our previous terminology.)

"If this adversary can see the Alice connection and the Bob connection,
he wins. If he cannot see both, he does not win."

This is a strong statement. Am I trying to claim that if he can't see
one of them, he doesn't know they're participating, so he can't tie them
to the other one? Is there more to it?

Let's shift gears, and examine this word 'win' a little bit more. In
some cases it might be quite easy to win (the adversary is monitoring
all links all the time), but in others we might clarify my adversary
such that he doesn't always monitor all the links which are considered
'his' -- perhaps he needs to issue a subpoena to see each one, and he
has a limited (but renewable) number of subpoenas.

Even regardless of link padding from Alice to the cloud, there will be
times when Alice is simply not online. Link padding, at the edges or
inside the cloud, does not help for this.

In general, I'm talking about a time constant which describes how long
it takes to launch a long-term intersection attack. If Alice and Bob
have a certain behavior pattern (when they're up, when they're down,
when Alice makes requests to Bob) and we have some number of other users,
each with activity following some distribution (eg poisson), how long
does it take until the global passive adversary can be sure that Alice
is talking to Bob? In particular, there are several factors which could
improve the global intersection attack interval:

* If there are more users, it may take longer.
* If Alice's behavior isn't very odd (that is, if she behaves similarly
to other users), it may take longer.
* If other users are online more often, or Alice is online more often,
or Bob is online more often, it may take longer.
* If Alice sends requests to a bunch of people besides Bob, it may take
longer (or it may not improve anything at all -- wouldn't it be neat to
be able to show that.)
* If Alice refrains from talking to Bob as often, then it may take longer.
* If the other users send lots of requests, then they're more likely to
hit Bob, which is good. If they send lots of requests but they don't
hit Bob, do they help at all? (say, because they collide with Alice
more often)
* If Alice can convince somebody to send requests to Bob while she's
offline, it may take longer. Does it help more to get random users to
send the requests, or to get a small group to do it? What properties of
the group help most?

If the adversary can't see all the users, but has an idea of how many
users there are and an idea of the distribution of the unseen users,
intuitively it seems it should take him longer to become certain. So we
shift from the question "How long until the intersection of the suspect
sets is just {Alice}?" to the question "How long until we are more than
p certain that it's Alice?" (I guess "how long until we're more than 1/n
certain?" is another good question. Is that trivial though, because we
suspect Alice+Bob more as soon as Alice makes her first request to Bob?)

What if we're not wondering about a specific Bob, but instead we just
want to grab any pair of people. Are the above defenses more or less
effective then?

And finally, tying it all together, can we get Alice to do some link
padding before her request, so the-time-in-which-she-does-her-request
overlaps with other times when other people do link-padded-requests --
and then we can examine having just a bit of extra padding along with
her request vs having quite a bit of padding. Does it matter that she
does padding on both sides of the request (I would guess yes), or is one
side sufficient? How much impact can she have (that is, by how much can
she improve the intersection attack interval) with this approach?

Adding complexity, even if Alice and Alice_2 are both link-padding to
the same node (and both online during a request-to-Bob), our adversary
may for one reason or another be able to distinguish them farther into
the cloud. What additional assumptions do we need to make to give Alice
protection, or take away her protection?

If link padding is only between Alices and the cloud, the adversary
can observe the node they talk to and watch for actual traffic exiting
it. But if he only watches the links, he cannot know which Alice generated
the traffic. On the third hand, if he watches the node that Alice_3 is
padding to and sees that no traffic exits, he can remove Alice_3 from
the set of suspects during that request.

How much more protection does Alice get if she is always up? Is "Alice
runs a node herself" equivalent to "Alice is always up", or does she
get some additional protections due to having other traffic entering
at her? (Eg, can she keep an anonymity set of herself plus the people
link-padding to her -- or does she lose them quickly because sometimes
they're not connected?)

What approaches can we use to quantify the amount of bandwidth used vs
the amount of good it does, so we can make some decisions about whether
link padding is "worth it" and where it does the most good relative to
bandwidth used?

Who wants to do some math and tell me the answers to all this? :)

--Roger



More information about the tor-dev mailing list