A cleaner MorphMix / A P2P Tor

Mon Jul 26 08:50:16 UTC 2004

> Morphmix assumptions:
> (Marc, correct me if these are wrong or misleading)
>
> (1) We're willing to make multiple new TLS connections at each hop when
> building the circuit -- we do that for witnesses.
> (2) In order to have much security against collusion, we aim to have
> a database of thousands of nodes.
> (3) Alice will tolerate having the adversary link her to some of her Bobs,
> as long as it doesn't happen "too often". (I'd like to do away with this
> assumption; that's part of what this mail is about.)
> (4) Alice has plenty of extra time (relative to her actual traffic)
> to build circuits just to test the waters.
> (5) Given a node, we are capable of picking a random node out of the
> nodes we know that probably isn't colluding with that node. That's why
> we pick the witness for extending.

Correct.

> My problems with Morphmix:
> 
> I worry that Morphmix's collusion detection system (aka reputation system)
> isn't up to the task of detecting a smart adversary. Specifically,
> an adversary attacking Alice should act normally until Alice connects
> to one of his nodes; then he can show her only colluding nodes and win
> that circuit. Morphmix has one last defense against that -- "but maybe
> that was just a test circuit" -- but I'd like to do better.

True. The adversary wins for a couple of circuits until he'll be 
detected by Alice. From the adversary's point of view, it's like he has 
a few credits and can spend them over a time period (# of circuits). He 
can spend them "as he wishes", evenly distributed over time or "more 
aggressively" during a shorter time.

> Also, Morphmix doesn't do much to consider incentives against freeloading,
> or selective service by adversarial nodes.

The only incentive is that relaying gives better protection because it 
somewhat confuses a first intermediate malicious node. But that's rather 
weak.

> So here's a different design, with the beginnings of some analysis.
> 
> Each node keeps a list of other nodes it knows about. It periodically
> asks nodes for other node lists, so it can keep roughly up to date. When
> it wants to open a circuit, it picks a random set of nodes (if you like,
> it can require that the nodes each have a different IP prefix or AS). It
> asks each node to establish a TLS connection right now (if necessary)
> to the next node.
> 
> Call a node a relay if it's willing to relay traffic for others, and a
> non-relay if it's freeloading.
> 
> Assume there are n nodes in the system, that all nodes are relays, that
> each node wants to hold three circuits open, and that circuits are length
> 5. So that's 15 circuits for each node. The expected number of links
> used will remain O(n) even as the total number of links grows as n^2. At
> small n, everybody just keeps track of everybody and it's like the current
> Tor except without the directory servers (and slower circuit-building).
> 
> Now the way to attack the system by sheer numbers is to get a bunch
> of nodes in different IP prefixes, and tell everybody about them. If
> necessary, there are some additional defenses that could be adapted to
> our needs. For example, http://dbpubs.stanford.edu:8090/pub/2003-51
> describes a neat self-correction mechanism to prevent somebody from
> advertising his nodes much more than average.
> 
> Now we don't have to deal with the morphmix collusion detection algorithms
> (which probably don't work often enough to protect Alice enough from a
> smart adversary anyway), and we can scale a lot better than Tor. Plus
> we have the added benefit of deniability for circuit initiators (before,
> if the circuit came from an OP, it was clear he was the originator).

In principal, I agree, but there are two things that come to my mind 
right now: (1) path setup may often fail because one of the five nodes 
may have left. Not crucial because the initiator just tries again. (2) 
Local optimisation as in MorphMix is not possible, because the initiator 
does not know the state of the nodes along the tunnel. This is not easy 
to overcome (unless we have a lookup mechanism that keeps track of 
current node states, which is hard with a large number of nodes).

> Now assume not all the nodes are relays -- if only sqrt(n) of them are
> (that is, there are sqrt(n) non-relays for every 1 relay), then every
> non-relay is connected to 3-ish nodes, and every relay is connected to
> O(sqrt(n)) nodes -- the sqrt(n) other relays, plus its portion of the
> non-relays. Still quite manageable (and based on the Econymics paper,
> relays *want* some non-relays, so they can aggregate more traffic). But
> if there are too many non-relays, the remaining relays will have to talk
> to an increasing number of non-relays each, which would get ugly.

Yes. same problem with MorphMix here. If 95% are free riders, the system 
won't work well. Unless the 5% are really good bandwidth nodes, but 
essentially it would render the P2P system to a Tor-like system.

> How do the users know who's a relay? They build circuits through
> everybody they hear about: some of them tend to work, and others
> tend not to. Assuming *most* nodes either pretty-consistently relay or
> pretty-consistently don't relay, then Alice can quickly find nodes which
> will work, and she will be capable of building circuits that work.

Agreed. Are malicious nodes likely to advertise IP addresses of nodes 
that are non-relays (or non-nodes)? Like inserting bad mp3 files into 
file-sharing systems? Could that significantly disrupt the service? What 
if 90% of the nodes I know of are not usable to be used as realys? 
Hardly any path-setup will be successful. Of course I cannot test the 
nodes for availability before I use them for obvious reasons. Will such 
a system consist of 99% path setup traffic? MorphMix has advantages 
here, because every nodes just handles its local environment and 
non-nodes can easily be excluded.

> More on the reputation system below; but first let's consider our
> adversary. If he runs only non-relays, then he's going to have a heck
> of a time getting Alice to use him for her first and last hop. So we're
> providing an *incentive* for the adversary to run relays, else he won't
> be able to attack Alice with it. Great.

Indded. For an adversary in anonymity-systems, there are probably only 
two reasonable ways to play: offer great service to be in as many paths 
and collect as much information as possible, or drop any traffic to 
disrupt the service (if this is possible withour beeing detected).

> Therefore the fraction of total peers the adversary owns is not relevant
> for this attack: only his fraction of relays matters here. Specifically,
> we've changed his chance of success from c^2/n^2 where c/n is the fraction
> of Tor servers he owns, to c'^2/n'^2 where c'/n' is the fraction he owns
> of the nodes that Alice thinks are relays.
> 
> (Overall fraction is still relevant in terms of how much influence he
> has on Alice's list of known peers. Perhaps Alice should preferentially
> believe the lists-of-peers from relays. Or perhaps precisely because
> non-relays are not juicy targets, she should believe non-relays too --
> so an adversary with a limited number of nodes cannot easily be both.)
> 
> Now consider a selective relay: a node that chooses whether to relay
> based on some decision function. The simplest selective relay is a peer
> that's not in the system all the time. So Alice's reputation system (to
> keep it simple) should always make decisions based on recent behavior:
> generally speaking, a node that was a relay just a moment ago will still
> be a relay now.
> 
> A more complex selective relay might try to convince only Alice that it's
> a relay. The first benefit to this is that it doesn't have to handle
> non-Alice traffic, but the real benefit is that people making circuits
> through it are more likely to be Alice (the attack here is that the
> adversary only has to own the last hop Alice chooses -- he learns about
> Bob and guesses that Alice was the initiator). Fortunately, if Alice does
> her performance tests via other nodes also, building circuits as usual,
> the node can't target Alice without performing for all users. And once
> it's performing for everybody, it's in a position to attack everybody;
> thus if the system has enough nodes, it's not appreciably easier for the
> adversary to target specifically Alice than to target everybody. (There's
> still the predecessor issue, where a given node making a request is more
> likely to be the initiator...but so it goes.)

Another issue here is, as you mentioned above, that nodes ask other 
nodes about nodes, so Alice - having learned that this malicious node 
targeting her is a good one - would tell others about this cool node. 
The problem for the malicious node is that he can hardly control which 
nodes know about the malicious one. This is an attack I analysed in my 
thesis which in theory is possible, but very hard to carry out in practice.

> Another fine attack is for the selective relay to allow some incoming
> circuits, but claim that certain other nodes are down. Certainly if
> extending the circuit fails, Alice should give up on that circuit rather
> than try to extend elsewhere (which would allow that node to dictate
> her next node). But the more subtle issue is that the adversary can
> influence the set of nodes that Alice thinks allow relaying. Assuming a
> small fraction of adversary-owned nodes, we want a reputation system where
> the adversary can't confuse Alice about too many honest relays, and can't
> confuse Alice about too many dishonest non-relays. That's a hard problem
> in the general case, but we may be able to get a workable simplified
> answer. Part of the solution probably involves Alice doing some of her
> queries directly-but-plausibly-for-others, because she knows who failed
> in that case. Another part might be a witness mechanism a la mix-acc or
> morphmix, if we can assume sometimes-relay-but-sometimes-suspiciously-not
> is rare.

Maybe this is too easy but assuming a node n_1 claims the next hop n_2 
is not reachable, Alice simply tries to contact n_2 directly (e.g. 
picking n_2 as the first hop in a dummy circuit of length 1 that will be 
torn down right away). If n_2 is indeed down, n_1 "wins" and gets a 
better reputation from Alice's point of view and n_2 "loses" or vice 
versa. Alice does not give away much information since the circuit won't 
be used anyway although, if n_2 was indeed reachable and n_1 cheated, 
Alice maybe shouldn't use n_2 in her circuits for a while. Is this too 
simple?

> As usual, sadly, the reputation system is risky: the adversary can
> manipulate Alice's behavior by feeding her information, and moreover
> he can *recognize* Alice based on the behavior he induced. We must
> limit the number of cases where he can make his target Alice different
> from the other peers. However, even if we have the Morphmix adversary
> (targetting Alice but not allowed to be very subtle about it), we're
> still doing better, because at each step we choose from all the nodes we
> know about, rather than the set offered by step i-1. (You should argue
> with me about this.)
> 
> An adversary who can observe Alice can save some effort by accepting
> relayed connections only while Alice is making circuits. An active
> adversary can also blow away outgoing circuits that don't go to the right
> node, thus forcing Alice onto circuits he owns if she actually wants to
> communicate. That's a problem for Tor and Morphmix also, though.

A very hard problem. One crucial property of any such systems must be 
that no set of nodes can get much more than it's fair share to be picked 
as first hop. Maybe having any kond of reputation scheme is a bad idea 
if we assume acitive attackers? The reputation scheme will always help 
such an attacker: he simply disrupt the service offered by honest nodes 
that are willing to relay to esclude them as relays. But again, a 
10%-of-the-Internet active attacker can on average exclude 10% of the 
honest relays (OK, probably its the 1 - 0.9^2 case again, so its more 
like 20%) which is not terrible. Which brings us back to the discussion 
of a reasonable threat model...

> And finally, it's not just a matter of whether the node accepts
> connections. It's also a matter of providing reasonable-quality
> service for those connections. Time for the requisite p2p totally
> unanalyzable simple idea with complex emergent properties: we could give
> preferential treatment to relays. You give priority to cells from nodes
> that you believe are relays, or in the more complex variant, to relays
> that have given you fast service recently (it would seem you need to
> consider performance rather than just relay-ness, because if you incent
> everybody to be a relay but nobody to perform well, then you're simply
> making it hard to find a working relay). Just as Bittorrent's tit-for-tat
> strategy provides better service for people who are providing you better
> service, we can reward relays by passing their cells before others. Now,
> there are a whole host of reputation-based anonymity attacks here (if
> somebody's getting quick routing through node x, it means node x thinks
> they're a relay, which means node x recently used them for something;
> I can offer you fast service if I want you to go to me for fast service;
> if I can make the other relays look crummy then I'll look even better;
> etc). But on the other hand, incenting nodes to be relays is critical,
> both so the network doesn't collapse under its own weight, and because
> our security parameter above is based on the fraction of relays that
> the adversary owns.

I'm pretty sure that in a practical P2P system, the only way to get more 
than 10% or so non-free riders is to include a mechanism that gives 
better service to the relays and not so much better anonymity. Maybe I'm 
pessimistic, but I don't believe that 50% of all nodes will relay 
traffic to offer "privacy for the world". Relaying means giving away 
parts of your bandwidth (CPU is probably neglectable) and this it what 
counts. Offering better service to relays should compensate for giving 
away bandwidth in first place. Why not using better anonymity as an 
incetive? Beacause unlike Tor, a node in a P2P system usually gets 
relatively little traffic to handle, the traffic is much less "dense" 
than in Tor. A single node has only little traffic to blend in the own 
traffic, which - from a traffic analysis point of view - is porbably 
neglectible. Unless we employ other measures (e.g. cover traffic... not 
that I believe in its practical value) does simply not buy you much 
anonymity in a P2P system. Or am I wrong?

> --Roger

Cheers,
Marc
-- 
Dr. Marc Rennhard                Swiss Federal Institute of Technology
Office: ETZ G61.1          Computer Engineering and Networks Lab (TIK)
Fon: +41 1 632-7005    ETH Zentrum / Gloriastrasse 35 / CH-8092 Zurich
Fax: +41 1 632-1035    PGP-KeyID: 84AEB193, PGP encrypted mail welcome