[tor-dev] Proposal: The move to two guard nodes

Mike Perry mikeperry at torproject.org
Wed Apr 11 11:15:44 UTC 2018


Roger Dingledine:
> On Sat, Mar 31, 2018 at 06:52:51AM +0000, Mike Perry wrote:
> > 3.1. Eliminate path restrictions entirely
> > 
> I'm increasingly a fan of this option, the more I read these threads.
> 
> Let's examine the two attacker assumptions behind two of the attacks
> we're worried about.
> 
> Attack one: the client's local ISP collects coarse netflow logs, and these
> logs aren't detailed enough to allow a traffic volume detection attack on
> an existing long-lived TLS flow, so the connection to that first guard
> is safe; but a connection to that second guard will be unusual and not
> multiplexed and at exactly the time of the adversary-controlled circuit
> that triggered it, so that second guard, because it is used so rarely,
> is dangerous to use.
> 
> Attack two: if the client uses its guard as the first hop of its circuit
> and also the adversary-requested fourth hop, then the guard can do
> pairwise traffic correlation attacks on all of its circuits and realize
> that these two circuits it has are really two pieces of the same circuit.
> 
> This second attack seems weird to me. One reason is because in attack
> one we're brushing aside the traffic analysis as hard, whereas in attack
> two we're assuming it's trivial and perfect. But the simpler reason is:
> if your guard is going to participate in a traffic correlation attack
> against you, then it could just as easily team up with some other relay
> that the adversary picked. That is, avoiding reusing your guard on the
> other end of the circuit isn't going to save you if your guard is out
> to get you.

I agree. I am not concerned about attack two. But we're not choosing
between just these two attacks.
 
> To be clear, the design I've been considering here is simply allowing
> reuse between the guard hop and the final hop, when it can't be avoided. I
> don't mean to allow the guard (or its family) to show up as all four
> hops in the path. Is that the same as what you meant, or did you mean
> something more thorough?

By "all path restrictions" I mean the ones that apply to the first and
last hops of the circuit (though vanguards would be simpler if we got rid
of them for other hops, too). And I do mean all such restrictions, not
just guard node choice.
The adversary also gets to force you to use a second network path
whenever they want via the /16 and node family restrictions. And it
happens naturally all the time.

We're not using one guard in the current Tor. We're using two, and the
second one is only used for unmultiplexed activity. That is one property
I don't like about our "let's pretend to use one guard" status quo.

The second thing I don't like is that one guard is fragile, which
enables confirmation attacks when it can be made to go down.

> I think "can't be avoided" means HSDir, IP, RP -- which I note are all
> onion service related circuits.
> 
> I'd like to hear more about the "cleverly crafted exit policy" attack, and
> I wonder if we can't solve that differently. For example, if it's about
> making you do a request to a port that only one exit relay allows, and
> ha ha whoops your guard was on the same /16 as that exit relay... maybe
> it's time for the dir auths to not advertise super rare ports? This was
> one of the topics in the users-get-routed paper too.

Yes, that is the one I was talking about.

However, another way to do this type of exit rotation attack is to cause
a client to look up a DNS name where you control the resolver, and keep
timing out on the DNS response. The client will then retry the stream
request with a new exit. The same thing can also be done by timing out
the TCP handshake to a server you control. Both of these attacks can be
done with only the ability to inject an img tag into a page.

You repeat this until an exit is chosen that is in the same /16 or
family as the guard, and then the client uses a second network path for
an unmultiplexed request at a time you control.
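
As a rough illustration of why this retry loop eventually succeeds, here
is a minimal Python sketch of the conflict check and of the expected
number of forced retries. It is not Tor's code: the relay dictionaries,
the single "family" label (real families are declared lists), and the
function names are all hypothetical, and the estimate assumes each retry
picks an exit independently with the same conflict probability.

```python
import ipaddress


def same_slash16(addr_a: str, addr_b: str) -> bool:
    """Return True when two IPv4 addresses fall in the same /16."""
    net_a = ipaddress.ip_network(f"{addr_a}/16", strict=False)
    return ipaddress.ip_address(addr_b) in net_a


def conflicts_with_guard(guard: dict, exit_relay: dict) -> bool:
    """Hypothetical path-restriction check: same /16 or same family label."""
    if same_slash16(guard["addr"], exit_relay["addr"]):
        return True
    # "family" is a simplified single label, an assumption of this sketch.
    return guard["family"] is not None and guard["family"] == exit_relay["family"]


def expected_forced_retries(conflict_fraction: float) -> float:
    """Mean number of exit picks until one conflicts (geometric distribution)."""
    if not 0.0 < conflict_fraction <= 1.0:
        raise ValueError("conflict_fraction must be in (0, 1]")
    return 1.0 / conflict_fraction
```

With a purely illustrative conflict fraction of 0.02 (2% of the weighted
exit selection probability conflicting with the guard), the adversary
would expect to need about 50 timed-out stream attempts.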

> One non-starter idea would be to move onion-service-related Tors to two
> guards, and leave other Tors at one guard. It's a non-starter because of
> course advertising which you are to your local network is no good. But
> that idea gave me a different perspective on this discussion: I wonder
> how much this design decision comes down to making all Tors use two
> guards in order to protect the onion-service-related Tors, which are
> the only ones who actually need it?

Our path restrictions also cause normal exiting clients to use a second
guard for unmultiplexed activity, at adversary-controlled times, or just
periodically at random.
 
> >   However, while removing path restrictions will solve the immediate
> >   problem, it will not address other instances where Tor temporarily opts
> >   to use a second guard due to congestion, OOM, or failure of its primary
> >   guard, and we're still running into bugs where this can be adversarially
> >   controlled or just happen randomly[5].
> 
> I continue to think we need to fix these. I'm glad to see that George
> has been putting some energy into looking more at them. The bugs that
> we don't understand are especially worrying, since it's hard to know
> how bad they are. Moving to two guards might put a bit of a bandaid on
> the issues, but it can't be our long-term plan for fixing them.

We're choosing fixes for these bugs that enable an adversary to deny
service to clients at a particular guard, *without* letting those
clients move to a second guard. This enables confirmation attacks, and
these confirmation attacks can be extended to guard discovery attacks by
DoSing guards one at a time until an onion service fails.

Bringing back CREATE_FAST could help with this piece, I suppose, but it
doesn't solve OOM attacks...

> >   Note that for this analysis to hold, we have to ensure that nodes that
> >   are at RESOURCELIMIT or otherwise temporarily unresponsive do not cause
> >   us to consider other primary guards beyond the two we have chosen.
> >   This is accomplished by setting guard-n-primary-guards to 2 (in addition
> >   to setting guard-n-primary-guards-to-use to 2). With this parameter
> >   set, the proposal 271 algorithm will avoid considering more than our two
> >   guards, unless *both* are down at once.
> 
> I like this general idea of not immediately replacing guards so long as
> you have a working one. In fact, we used to do something similar back
> in the day:
> https://blog.torproject.org/improving-tors-anonymity-changing-guard-parameters
> says (emphasis mine)
> """
> Tor 0.2.3's entry guard behavior is "choose three guards, ***adding
> another one if two of those three go down*** but going back to the
> original ones if they come back up, and also throw out (aka rotate)
> a guard 4-8 weeks after you chose it."
> """
> 
> There are still some fiddly decisions to make here. For example, as you
> say we probably shouldn't replace a guard just because we failed to
> connect to one of our guards once. We might decide that it's time to add
> a new second guard if the consensus tells us that one of them is down
> (so we have confirmation that it isn't down for just us, it's down for
> everybody). Or we might decide to wait on adding a new one even if it
> really is down, because maybe it'll come back soon. But how long do
> we wait? And if, while we're down to one, we encounter one of these
> situations where the requested fourth hop overlaps with our remaining
> guard, what do we do?

If I were to drop everything to build the Tor I think should exist, I
would do the following:

1. Use two guards, replacing them only when both are unreachable, or
   when one leaves the consensus.
2. Make path restrictions not as strict (for cases like the one above).
3. Use conflux (which also needs less strict/no path restrictions)
4. Build it on QUIC.

I would do them in that order because I think we get the most benefit
from #1, and we get some benefit from #2 still (as you point out above). 
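
To make the rule in #1 concrete, here is a minimal Python sketch of one
way it could be read. It is not the proposal 271 algorithm: the function
name, the inputs, and the choice to discard both guards at once when
neither is reachable are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def update_guards(guards: List[str], reachable: Set[str],
                  consensus: Set[str], candidates: Iterable[str]) -> List[str]:
    """Return the guard list after applying the two-guard replacement rule."""
    # Guards that left the consensus are always dropped.
    kept = [g for g in guards if g in consensus]
    # A single unreachable guard is tolerated; only when none of the
    # remaining guards is reachable do we give up on all of them.
    if kept and not any(g in reachable for g in kept):
        kept = []
    # Refill to two guards from the ordered candidate list.
    for c in candidates:
        if len(kept) >= 2:
            break
        if c in consensus and c not in guards and c not in kept:
            kept.append(c)
    return kept
```

In this reading, a client with guards A and B keeps both while only B is
unreachable, so a DoS against one guard does not by itself trigger a
visible transition.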

You keep focusing on the performance aspects of conflux, but that is not
the argument I am making. My arguments for conflux in Section 4 are
about resilience to congestion, downtime, circuit killing, and DoS, as
well as traffic analysis resistance. I see the performance benefits as
secondary. 

(I think the best arguments for QUIC are also in the reliability
direction, because fixed queues mean no adversary-provoked OOMing.)

> In fact, here's a hopefully useful insight that I've just realized:
> you're not concerned about one guard vs two guards, you're concerned
> about *transitioning* between guards. It's that moment when you're
> starting to use a new guard, if the attacker can observe that you're
> doing it, and especially if the attacker can make you do it, that is
> vulnerable. And starting with two guards can help, in that it postpones
> the time until you're forced to transition, and maybe also because if
> we do it right it can make the transition less visible.

The transition aspect is a big piece of it, but I think we're also
running into a fragility problem, which makes the transition signal very
loud in many cases.

> But I wonder if we're looking at this backwards, and the primary
> question we should be asking is "How can we protect the transition between
> guards?" Then one of the potential answers to consider is "Maybe we should
> start out with two guards rather than just one." Framing it that way,
> are there more options that we should consider too? For example, removing
> the ability of the non-local attacker to trigger a transition? Then
> there would still be visibility of a transition, but the (non-local)
> attacker can't impact the timing of the transition. How much does that
> solve? Need to think more.

One guard is inherently more fragile than two, and no matter what we do,
it means that there will be a risk of attacks that can confirm guard
choice, because the downtime during this transition can never be hidden
without at least some redundancy.

> In summary:
> 
> (1) I think we should fix the bug from #14917 where the attacker can
> push us off our guard just by naming our guard as the HSDir/IP/RP,
> and I think we should fix it by being willing to reuse our guard when
> it can't be avoided. That step will resolve some, but not all, of the
> pressure about moving to two guards. Then

Without removing all path restrictions that apply to first and last hop,
we're still actually using two guards, and using them at times that the
adversary gets to control if they want, or just randomly otherwise.

> (2) Hopefully the above discussion has helped us move forward on the
> remaining reasons for switching to two guards. To me the two biggest
> questions left to resolve are (a) how best to protect the vulnerable
> transition to a new guard, and if two guards is the best idea we've got
> for that, and (b) how big an issue is it really that having only one
> guard can sometimes give you a low-performance guard, and if two guards
> is the best idea we've got for that one too.

Transitions will always be noisy with one guard, because it is fragile
to DoS, congestion, OOM, circuit failure, onionskin overload, etc etc
etc. How can you provide resiliency under arbitrary and partial failure
without any redundancy?



-- 
Mike Perry