[tor-dev] Proposal 247: Alternate Path Lengths

Mike Perry mikeperry at torproject.org
Thu Jan 21 17:00:10 UTC 2016


George Kadianakis:
> Mike Perry <mikeperry at torproject.org> writes:
> 
> > George Kadianakis:
> > > I have mixed feelings about this.
> > >
> > > - If client guard discovery is the main reason we are doing this,
> > >   I think we should first look into these guard discovery vectors
> > >   individually and figure out how concerning they are and if there
> > >   is anything else we can do to block them,
> >
> > I agree this is worthwhile, if only to better understand the design
> > space.  However, I think we're going to find that most applications we
> > envision can be induced into violating many of the ad-hoc mitigations
> > we try to bake in.
> 
> OK. Let's see. I feel that these guard discovery attacks can be blocked
> with:
>
> a) If an IP listed on an HS descriptor tells you that it doesn't know
>    the HS, then ignore it for this hidden service today.
> 
> b) If an HSDir that should have an HS descriptor tells you that it
>    doesn't have it, then don't ask it again this hour.
>
> I think we do both checks right now in the Tor codebase and we also
> have caches so that we don't retry the same nodes. If we are serious,
> we could even write those caches on disk.
> 
> I feel that if an application restarts Tor or flushes those caches
> because a hidden service does not work, then the application is doing
> it wrong.

Ok, well consider the browser. All that has to happen for Guard
discovery is a bunch of nested iframes for many different hidden
services, perhaps injected by the exit. We protect against this to some
degree for non-HS traffic by using SOCKS u+p isolation in combination
with keeping circuits open as long as they are used (#15482). But for HS
traffic, new circuits will be built for each new HS address that is
accessed, so we don't have the same ability to limit circuit creation.

For other things, like Ricochet, subtler failure modes can be introduced
to cause circuit churn without repeated hsdir/IP activity, once you
bring the full application layer into scope. Say I'm a
compromised/malicious Ricochet user looking to track down activists,
marginalized folks, whistleblowers, etc. I could rig my Ricochet to fail
the rend circuit periodically, waiting for them to reconnect to me as an
HS client over and over, until my malicious middle was chosen next to
the target's guard. Ricochet (indeed, many P2P protocols) will keep
reconnecting in this case.
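
To put rough numbers on this kind of churn attack, here is a minimal
sketch. It assumes (a simplification, not a measurement) that each rebuilt
circuit picks the hop next to the guard independently, and that an
adversary relay is picked with a fixed probability f per circuit. The
function names and the 1% figure are purely illustrative.

```python
import math


def p_adjacent(f: float, n: int) -> float:
    """Chance that at least one of n independently built circuits places
    an adversary relay next to the target's guard.

    f is the ASSUMED per-circuit probability of picking an adversary relay.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must be strictly between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - f) ** n


def circuits_needed(f: float, target: float) -> int:
    """Smallest number of forced reconnects for which p_adjacent >= target."""
    if not 0.0 < f < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("f and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - f))


if __name__ == "__main__":
    f = 0.01  # hypothetical adversary share of middle selection
    for n in (10, 100, 1000):
        print(n, round(p_adjacent(f, n), 3))
    print("reconnects for a 50% chance:", circuits_needed(f, 0.5))
```

Under those assumptions the attacker only has to keep failing the rend
circuit; the cost is time, not bandwidth.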

Maybe this means that Ricochet made a mistake in using HS circuits in
"full duplex" mode, where the application is agnostic wrt who initiates
the connection, and both sides keep retrying. However, I suspect that
all P2P protocols are going to make this mistake. If we manage to get HS
endpoints working as WebRTC endpoints, then WebRTC calls/connections
will also naturally end up with this problem as well. Probably just
about anything designed for symmetric P2P Internet connections will also
make this mistake.
 
> Also even with client vanguards I think the checks above will still have to be
> implemented. I could imagine an application that flushes all the DataDirectory
> if the hidden service stops working, and then even vanguards won't save them.
> 
> In general, I'm not sure how much sanity we can assume from third-party
> applications.

I think even our own applications are going to surprise us. One of the
things I had to repeatedly argue years ago was "Kill .exit notation:
Path selection must not be capable of being influenced by untrusted
content from the application layer." People whined and cried and whined
and cried when .exit finally vanished from TBB, but it was really necessary
to prevent all sorts of path manipulation+capture attacks.

Any time the application can be induced into making new paths
through the Tor network, that is vulnerability surface. Some
applications actually *must* be allowed to make new circuits based
on untrusted/semitrusted input, so the only thing we can do at the
Tor network layer is to restrict the paths of those circuits to limit
exposure.
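
A rough way to see why restricting paths limits exposure: the sketch
below bounds the chance that an adversary relay ends up in the second-hop
position, first when every new circuit may pick a fresh second hop, then
when the second hop is pinned to a small fixed set for the period in
question. The independence assumption, the parameter names, and the
numbers are illustrative assumptions, not taken from Proposal 247.

```python
from typing import Optional


def exposure(f: float, circuits: int,
             pinned_set_size: Optional[int] = None) -> float:
    """Upper bound on the chance that at least one circuit uses an
    adversary relay as its second hop.

    f               -- ASSUMED chance that any one sampled relay is adversarial
    circuits        -- number of circuits the application is induced to build
    pinned_set_size -- if given, second hops come only from a fixed set of
                       this size, so at most that many distinct relays are
                       ever sampled no matter how many circuits are built
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    samples = circuits
    if pinned_set_size is not None:
        samples = min(circuits, pinned_set_size)
    return 1.0 - (1.0 - f) ** samples


if __name__ == "__main__":
    f = 0.01  # hypothetical adversary share
    print("unrestricted, 1000 circuits:", round(exposure(f, 1000), 4))
    print("pinned set of 4, 1000 circuits:", round(exposure(f, 1000, 4), 4))
```

With a pinned set, inducing more circuits stops buying the attacker
anything once every member of the set has been used.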

My current thinking is that long-term, I still like "virtual circuits"
for client exit traffic
(https://trac.torproject.org/projects/tor/ticket/15458).  Maybe that can
be used for HS clients, too, but it kinda gets messy in that we'll want
to keep re-using HS paths for different HS addrs with the same SOCKS
u+p, which may have other problems. I could be talked into it instead of
client vanguards, though.

> > > before complicating path selection even more.
> >
> > I feel like you're actually going to end up complicating the
> > implementation more with this position. If we have to have separate path
> > selection modes for service side and client side, we then have to
> > maintain three different path selection mechanisms in Tor: normal exit,
> > onion services, and onion clients.
> >
> > If we gave the same options for both hidden services and clients, we are
> > at least down to two systems (exit vs non-exit), with some minor options
> > for each.
> >
> 
> Hmmm, maybe. But onion client circuits would look very much like normal exit
> circuits, except that they would connect to RPs/IPs instead of exits. Just like
> the code is now.
> 
> Also, with vanguards if we end up doing something like:
> 
>         HSDir: C - L - S - E - HSDir
>            IP: C - L - S - E - IP
>          Rend: C - L - M - RP -- S - M - L - HS
> 
> we have three different path types here. We would need to write very beautiful
> interfaces if we want this to be done by the same code.
> 
> > > - Also, I like symmetry myself, but I wouldn't change path selection and
> > >   security just for that _if I can help it_.
> > >
> > > <snip>
> > >
> > > >
> > > >
> > > > Hsdir post/fetch:
> > > >   1. C - L - M - S - E - H
> > > >   2. C - L - S - E - H
> > > >   3. C - L - S - H
> > > >
> > > > Intro:
> > > >   1. C - L - M - S - E -- I   - S - M - L - H
> > > >   2. C - L - S - E     -- I   - S - L - H
> > > >  *3. C - L - S         -- I&S - L - H     (* IP Intersection attack!)
> > > >
> > > > Rend:
> > > >   1. C - L - M - S - R -- E - S - M - L - H
> > > >   2. C - L - S - R     -- E - S - L - H
> > > >   3. C - L - R&S       -- S - L - H
> > > >
> > >
> > > What is R&S here? Clients use static short-lifespan rendezvous points?
> >
> > Yes. Similarly for I&S (which we should not do - it's bad in every
> > variation of Vanguards).
> >
> > I don't see any such problems with R&S, though. Since R is not associated
> > with any publicly viewable information, I don't think it is as big of a
> > problem. At most it's a linkability risk for the client. But maybe I
> > missed something.
> >
> 
> Hmm, the only problem I can see here is that the R&S can link clients based on
> the L node. So for example, in the crazy edge case where only one client
> connects to hidden services through R&S over L, then R&S could count "Ah, this
> client has done 42 rendezvous through me in the past 5 hours". And if that's a
> ricochet client with 42 contacts, maybe it's a selector. But I think this is a
> pretty far-fetched example...
> 
> Another _big_ gotcha here is that let's say we end up doing:
> 
>         HSDir: C - L - M - S - E - HSDir
>            IP: C - L - M - S - E - IP
>          Rend: C - L - S - RP -- S - M - L - HS
> 
> and all the 'S' nodes are taken from the same pool, then the 'L' node will be
> able to learn 'M' by looking at the IP circuits, and learn 'S' by looking at
> the rend circuit. So it will basically be able to derive the full circuit.
> 
> We need to be very careful about which paths we pick, and which "guardsets" we
> get the nodes from.
>
> > > > Looking at these, we can see that we sacrifice the middle guards in the
> > > > second option, which will come at the cost of one less compromise attack
> > > > (but still the need to compromise the long-lived guard). We also lose
> > > > the unlinkability in the third option, and this actually bites us in
> > > > Intro 3: the hidden service L guard can perform a long-term intersection
> > > > attack, watching for published intro points and matching that to the
> > > > circuits that H makes to them. So that path length probably should not
> > > > be used.
> >
> > <snip>
> >
> > > However, I still have mixed feelings about changing client path
> > > selection as part of proposal 247:
> > >
> > > - My main issue is that I think figuring out the right client path
> > >   selection will require a _heavy_ amount of security analysis that
> > >   will delay prop247 even more.  I was hoping that we could treat the
> > >   client-side as an orthogonal problem and tackle it in the future
> > >   separately. But maybe I'm totally wrong and should be more patient
> > >   and these two problems should be handled together.
> >
> > I think patience is best, because if we don't understand this problem
> > really well, we're liable to miss something. Or cement ourselves off
> > from a potential future of interactive HS voice+video. Neither one is a
> > great failure mode.
> >
> 
> Agreed.
> 
> > I think for many applications (esp the browser and ricochet), we're
> > going to find that we need to protect the client just as much as the
> > server.
> >
> > > - If the above changes only happen to HS circuits, we make it harder to
> > >   make HS circuits indistinguishable from normal circuits on the face of
> > >   traffic analysis. But maybe we have already lost this game.
> >
> > We already lost that game until we have multihop padding. Proposal
> > 247 already outlines how to use it in section 4.1 to help conceal
> > vanguard usage.
> >
> > It is also worth pointing out that if we fail to conceal the HS vanguard
> > fingerprint entirely with padding, it will be especially valuable to
> > have more than just 30k service-side instances with the vanguard
> > fingerprint. Far better to have all the clients in that anonymity set,
> > too, I think.
> >
> 
> Yes that's true. This seems to be the main argument for doing client vanguards
> right now for me.
> 
> However, to actually achieve any sort of confusion here, we need to ensure that
> the paths between clients and HSes are symmetric. So for example if we end up
> doing:
> 
>     C - L - S - E -- IP  - S - M - L - H
> 
> then the L guard could distinguish clients from HSes by looking at whether the
> second hop is short lived ('S') or medium lived ('M').

Ok, I think this, together with your complexity argument earlier, is a
great reason not to mix and match strategy #1 with #2 or #3. If we do
provide security vs latency tradeoff options, I'm now convinced that the
tradeoff should be consistent across all paths that an HS uses for all of
its circuits.
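
The asymmetry check described above can be done mechanically. Below is a
minimal sketch that parses a path written in the notation used in this
thread and compares the lifetime class of the hop next to L on the client
side with the one on the service side. The helper names are hypothetical,
and it assumes that lifetime class (L/M/S) is the only thing the L guard
can use to tell the two sides apart, with "R&S" and "I&S" counted as
short-lived ('S') nodes.

```python
import re
from typing import List, Tuple


def hops(half: str) -> List[str]:
    """Split one side of a path such as 'C - L - S - E' into hop labels."""
    return re.split(r"\s+-\s+", half.strip())


def lifetime_class(label: str) -> str:
    """Map a hop label to L, M or S; 'R&S' / 'I&S' count as 'S'."""
    cls = label.split("&")[-1]
    if cls not in ("L", "M", "S"):
        raise ValueError(f"{label!r} has no lifetime class in this notation")
    return cls


def guard_neighbors(path: str) -> Tuple[str, str]:
    """Return (client-side, service-side) class of the hop adjacent to L."""
    client_half, service_half = path.split("--")
    client, service = hops(client_half), hops(service_half)
    # The client side is written outward from C, so the neighbor follows L.
    client_next = client[client.index("L") + 1]
    # The service side is written toward H, so the neighbor precedes L.
    service_next = service[service.index("L") - 1]
    return lifetime_class(client_next), lifetime_class(service_next)


def guard_can_tell_sides_apart(path: str) -> bool:
    """True if the hop next to L differs in class between the two sides."""
    client_cls, service_cls = guard_neighbors(path)
    return client_cls != service_cls


if __name__ == "__main__":
    for p in ("C - L - S - E -- IP  - S - M - L - H",      # mixed example
              "C - L - M - S - E -- I   - S - M - L - H",  # Intro 1
              "C - L - R&S       -- S - L - H"):           # Rend 3
        print(guard_neighbors(p), guard_can_tell_sides_apart(p))
```

This only looks at the hop adjacent to L; it says nothing about other
distinguishers such as path length or timing.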

If we only offered two security level options, I currently like
HSDir#1+IP#1+Rend#1 for high security and HSDir#2+IP#2+Rend#3 for low
security.

For the low security case, can we think of any reasons to decouple R&S
in Rend#3, or to use Rend#2?


-- 
Mike Perry