A cleaner MorphMix / A P2P Tor

Roger Dingledine arma at mit.edu
Thu Mar 4 12:23:26 UTC 2004


Morphmix assumptions:
(Marc, correct me if these are wrong or misleading)

(1) We're willing to make multiple new TLS connections at each hop when
building the circuit -- we do that for witnesses.
(2) In order to have much security against collusion, we aim to have
a database of thousands of nodes.
(3) Alice will tolerate having the adversary link her to some of her Bobs,
as long as it doesn't happen "too often". (I'd like to do away with this
assumption; that's part of what this mail is about.)
(4) Alice has plenty of extra time (relative to her actual traffic)
to build circuits just to test the waters.
(5) Given a node, we are capable of picking a random node out of the
nodes we know that probably isn't colluding with that node. That's why
we pick the witness for extending.

My problems with Morphmix:

I worry that Morphmix's collusion detection system (aka reputation system)
isn't up to the task of detecting a smart adversary. Specifically,
an adversary attacking Alice should act normally until Alice connects
to one of his nodes; then he can show her only colluding nodes and win
that circuit. Morphmix has one last defense against that -- "but maybe
that was just a test circuit" -- but I'd like to do better.

Also, Morphmix doesn't do much to consider incentives against freeloading,
or selective service by adversarial nodes.

So here's a different design, with the beginnings of some analysis.

Each node keeps a list of other nodes it knows about. It periodically
asks nodes for other node lists, so it can keep roughly up to date. When
it wants to open a circuit, it picks a random set of nodes (if you like,
it can require that the nodes each have a different IP prefix or AS). It
asks each node to establish a TLS connection right now (if necessary)
to the next node.

Call a node a relay if it's willing to relay traffic for others, and a
non-relay if it's freeloading.

Assume there are n nodes in the system, that all nodes are relays, that
each node wants to hold three circuits open, and that circuits are length
5. So that's 15 circuits for each node. The expected number of links
used will remain O(n) even as the total number of links grows as n^2. At
small n, everybody just keeps track of everybody and it's like the current
Tor except without the directory servers (and slower circuit-building).

Now the way to attack the system by sheer numbers is to get a bunch
of nodes in different IP prefixes, and tell everybody about them. If
necessary, there are some additional defenses that could be adapted to
our needs. For example, http://dbpubs.stanford.edu:8090/pub/2003-51
describes a neat self-correction mechanism to prevent somebody from
advertising his nodes much more than average.

Now we don't have to deal with the morphmix collusion detection algorithms
(which probably don't work often enough to protect Alice enough from a
smart adversary anyway), and we can scale a lot better than Tor. Plus
we have the added benefit of deniability for circuit initiators (before,
if the circuit came from an OP, it was clear he was the originator).

Now assume not all the nodes are relays -- if only sqrt(n) of them are
(that is, there are sqrt(n) non-relays for every 1 relay), then every
non-relay is connected to 3-ish nodes, and every relay is connected to
O(sqrt(n)) nodes -- the sqrt(n) other relays, plus its portion of the
non-relays. Still quite manageable (and based on the Econymics paper,
relays *want* some non-relays, so they can aggregate more traffic). But
if there are too many non-relays, the remaining relays will have to talk
to an increasing number of non-relays each, which would get ugly.

How do the users know who's a relay? They build circuits through
everybody they hear about: some of them tend to work, and others
tend not to. Assuming *most* nodes either pretty-consistently relay or
pretty-consistently don't relay, then Alice can quickly find nodes which
will work, and she will be capable of building circuits that work.

More on the reputation system below; but first let's consider our
adversary. If he runs only non-relays, then he's going to have a heck
of a time getting Alice to use him for her first and last hop. So we're
providing an *incentive* for the adversary to run relays, else he won't
be able to attack Alice with it. Great.

Therefore the fraction of total peers the adversary owns is not relevant
for this attack: only his fraction of relays matters here. Specifically,
we've changed his chance of success from c^2/n^2 where c/n is the fraction
of Tor servers he owns, to c'^2/n'^2 where c'/n' is the fraction he owns
of the nodes that Alice thinks are relays.

(Overall fraction is still relevant in terms of how much influence he
has on Alice's list of known peers. Perhaps Alice should preferentially
believe the lists-of-peers from relays. Or perhaps precisely because
non-relays are not juicy targets, she should believe non-relays too --
so an adversary with a limited number of nodes cannot easily be both.)

Now consider a selective relay: a node that chooses whether to relay
based on some decision function. The simplest selective relay is a peer
that's not in the system all the time. So Alice's reputation system (to
keep it simple) should always make decisions based on recent behavior:
generally speaking, a node that was a relay just a moment ago will still
be a relay now.

A more complex selective relay might try to convince only Alice that it's
a relay. The first benefit to this is that it doesn't have to handle
non-Alice traffic, but the real benefit is that people making circuits
through it are more likely to be Alice (the attack here is that the
adversary only has to own the last hop Alice chooses -- he learns about
Bob and guesses that Alice was the initiator). Fortunately, if Alice does
her performance tests via other nodes also, building circuits as usual,
the node can't target Alice without performing for all users. And once
it's performing for everybody, it's in a position to attack everybody;
thus if the system has enough nodes, it's not appreciably easier for the
adversary to target specifically Alice than to target everybody. (There's
still the predecessor issue, where a given node making a request is more
likely to be the initiator...but so it goes.)

Another fine attack is for the selective relay to allow some incoming
circuits, but claim that certain other nodes are down. Certainly if
extending the circuit fails, Alice should give up on that circuit rather
than try to extend elsewhere (which would allow that node to dictate
her next node). But the more subtle issue is that the adversary can
influence the set of nodes that Alice thinks allow relaying. Assuming a
small fraction of adversary-owned nodes, we want a reputation system where
the adversary can't confuse Alice about too many honest relays, and can't
confuse Alice about too many dishonest non-relays. That's a hard problem
in the general case, but we may be able to get a workable simplified
answer. Part of the solution probably involves Alice doing some of her
queries directly-but-plausibly-for-others, because she knows who failed
in that case. Another part might be a witness mechanism a la mix-acc or
morphmix, if we can assume sometimes-relay-but-sometimes-suspiciously-not
is rare.

As usual, sadly, the reputation system is risky: the adversary can
manipulate Alice's behavior by feeding her information, and moreover
he can *recognize* Alice based on the behavior he induced. We must
limit the number of cases where he can make his target Alice different
from the other peers. However, even if we have the Morphmix adversary
(targetting Alice but not allowed to be very subtle about it), we're
still doing better, because at each step we choose from all the nodes we
know about, rather than the set offered by step i-1. (You should argue
with me about this.)

An adversary who can observe Alice can save some effort by accepting
relayed connections only while Alice is making circuits. An active
adversary can also blow away outgoing circuits that don't go to the right
node, thus forcing Alice onto circuits he owns if she actually wants to
communicate. That's a problem for Tor and Morphmix also, though.

And finally, it's not just a matter of whether the node accepts
connections. It's also a matter of providing reasonable-quality
service for those connections. Time for the requisite p2p totally
unanalyzable simple idea with complex emergent properties: we could give
preferential treatment to relays. You give priority to cells from nodes
that you believe are relays, or in the more complex variant, to relays
that have given you fast service recently (it would seem you need to
consider performance rather than just relay-ness, because if you incent
everybody to be a relay but nobody to perform well, then you're simply
making it hard to find a working relay). Just as Bittorrent's tit-for-tat
strategy provides better service for people who are providing you better
service, we can reward relays by passing their cells before others. Now,
there are a whole host of reputation-based anonymity attacks here (if
somebody's getting quick routing through node x, it means node x thinks
they're a relay, which means node x recently used them for something;
I can offer you fast service if I want you to go to me for fast service;
if I can make the other relays look crummy then I'll look even better;
etc). But on the other hand, incenting nodes to be relays is critical,
both so the network doesn't collapse under its own weight, and because
our security parameter above is based on the fraction of relays that
the adversary owns.

I've treated will-relay and will-exit as identical. That clearly needs
more thought.

So far so good. What have I left out? Am I crazy?

--Roger



More information about the tor-dev mailing list