[tor-dev] Improving the structure of indirect-connection PTs (meek/flashproxy)

Tue Apr 15 13:03:43 UTC 2014

## Background

Pluggable Transports are proxy programs that help users bypass censorship.

[App client] -> XXX EVIL CENSOR HAS YOU XXX ACCESS DENIED XXX
[App client] -> [PT client] -> (the cloud!) -> [PT server] -> [App server]

The structural design, on the client side, is roughly:

1. App client specifies an endpoint to reach
2. PT client receives an instruction, via SOCKS, to connect to this endpoint
3. PT client does its thing, magic happens (intentionally vague)

## In Tor

Each endpoint is specified by a Bridge line, in the form of an IP address and an
optional fingerprint (for authentication).

This point is not emphasised in existing docs, but it is important for
the topic of this email: both the IP address and the fingerprint are potential
*identifiers* of the endpoint. The former is an impure name, the latter a pure
name.

Currently, we have two main types of PT:

- direct PTs - connect to the endpoint directly via a TCP connection
  - these PTs don't try to hide the fact that you're contacting X on addrX.
  - instead, they usually transform the traffic so it's not identifiable
  - e.g. obfs3, fte, scramblesuit

- indirect PTs - connect to the endpoint indirectly, via special means
  - flashproxy - connects via an ephemeral browser proxy
  - meek - connects via an online web service

I will now argue that indirect PTs should do things in a specific way, which is
*not* the way meek and flashproxy currently do things.

## Meek and flashproxy

Meek and flashproxy provide an indirect way of accessing Tor. Instead of
connecting directly to a Bridge (which might be blocked), the client connects via a
midpoint that is harder to block. Very very roughly,

                    (meek/fp controller)
[meek/fp client] -> [meek/fp midpoint] -> [freedom!]

The Bridge line in the user's torrc is completely ignored; we use a dummy
value, like:

Bridge flashproxy 0.0.1.0:1

Instead, it is the controller that decides which endpoint (which Bridge) the
midpoint should connect to.

(In meek, the controller is the same entity as the midpoint, but it helps our
analysis to consider the two functions separately.)

## The problem

The problem with the above structure is that it is incompatible with the
metaphor of connecting to a specific endpoint. This is what the PT spec is
about, even though it does not explicitly mention this viewpoint. Instead,
meek and flashproxy provide the metaphor of connecting to a global
homogeneous service.

This has positive consequences, such as the user no longer having to bother
to find Bridges, but also has several negative consequences:

1. The Tor client can no longer authenticate the endpoint. Although
currently Tor makes this optional, it is strongly recommended, to prevent a
MitM between the client and the server. Even if the midpoint does this, it
is not the end-to-end authentication that we would require for strong security.

2. Since the endpoints are not chosen by the user, this may have
consequences for anonymity. IANAAR, but this has not yet been looked into.

3. The Tor client (and other applications that use the PT spec) internally use
the endpoints metaphor. They may make performance assumptions based on endpoints
being configured with different addresses. (Perhaps also security assumptions,
although perhaps not due to having to defend against the sybil attack anyway.)
Breaking this metaphor is not a good design principle.

4. For an application like i2p, where each peer cares much more about *exactly
which* endpoint it connects to (e.g. because e2e fingerprint authentication
is mandatory), the metaphor of endpoints is even more important. Such
applications will not be able to take advantage of these indirect-connection PTs.

5. Chaining a PT that *requires* strong identification (e.g. scramblesuit,
for c2s auth) is impossible under this scheme, since the end client cannot
select the right server to authenticate against.

## The solution

The solution is simple: the indirect PT client simply has to actually *make
use* of the Bridge line, instead of totally dropping this information.

The meek/flashproxy controllers offer service to a finite set of Bridges. [A]

The client should be able to select one of these, specify its fingerprint
and any other shared secrets, on their torrc Bridge line, and the
indirect-PT will tell the controller to connect *to specifically this Bridge*.

The controller should honour this request. If it doesn't and the fingerprint
is specified, it will be caught out by the Tor client.
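
A minimal sketch of why a misbehaving controller gets caught: the client
compares the identity key the endpoint presents against the fingerprint on the
Bridge line. This assumes the fingerprint is the hex SHA-1 digest of the
DER-encoded identity key; `endpoint_matches` and its parameters are
hypothetical names for illustration, not Tor's actual code.

```python
import hashlib
import hmac
from typing import Optional


def endpoint_matches(identity_key_der: bytes, expected_fpr: Optional[str]) -> bool:
    """Return True if the key presented by the endpoint matches the Bridge line.

    Assumption: the fingerprint is the hex SHA-1 digest of the DER-encoded
    identity key. The function name and signature are illustrative only.
    """
    if expected_fpr is None:
        # No fingerprint on the Bridge line: there is nothing to check,
        # since authentication is optional.
        return True
    actual = hashlib.sha1(identity_key_der).hexdigest().upper()
    # Normalise case and whitespace, then compare in constant time.
    return hmac.compare_digest(actual, expected_fpr.strip().upper())
```

If the controller routes the client to a different Bridge, that Bridge's key
hashes to a different value and the check fails, regardless of what the
midpoint claims.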

So instead of having, as currently:

(old, hacky) Bridge flashproxy (dummy addr)

We would have the following cases:

(1) Bridge flashproxy (real addr)
(2) Bridge flashproxy (real addr) (fingerprint)
(3, not-ideal) Bridge flashproxy (dummy addr) (fingerprint)

Option (3) is quite nice, since in indirect PTs the actual address is
irrelevant - the Tor client never tries to connect to it. I suggest that we
have a special syntax for it though, to explicitly discourage hacks that {use
dummy addresses but which are treated as real addresses by the underlying
application}, since this breaks assumptions of the PT spec.

For example,

(3, better) Bridge flashproxy - (fingerprint)

We would add to the PT spec, something like:

  "-" is a special hostname syntax in Bridge lines. It means that the
  address of this Bridge does not concern the underlying application (e.g.
  Tor), since it will be indirectly reached by the PT client. (If a
  fingerprint is given, it will still be checked by Tor.)
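
To make the proposed syntax concrete, here is a rough sketch of a parser that
accepts "-" in the address position. It assumes a simplified Bridge line of
the form `Bridge <transport> <addr|-> [fingerprint]` with a 40-hex-digit
fingerprint, and ignores any further arguments; `BridgeLine` and
`parse_bridge_line` are hypothetical names, not part of Tor or the PT spec.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: fingerprints are written as 40 hex digits.
FPR_RE = re.compile(r"^[0-9A-Fa-f]{40}$")


@dataclass(frozen=True)
class BridgeLine:
    transport: str
    address: Optional[str]      # None when the proposed "-" syntax is used
    fingerprint: Optional[str]  # upper-case hex, or None if not given


def parse_bridge_line(line: str) -> BridgeLine:
    """Parse a simplified 'Bridge <transport> <addr|-> [fingerprint]' line."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "Bridge":
        raise ValueError("not a Bridge line with a transport: %r" % line)
    transport, addr = parts[1], parts[2]
    fingerprint = None
    if len(parts) > 3 and FPR_RE.match(parts[3]):
        fingerprint = parts[3].upper()
    # "-" means the address does not concern the application; record that
    # explicitly instead of carrying a dummy address around.
    return BridgeLine(transport, None if addr == "-" else addr, fingerprint)
```

Representing "-" as `None` rather than as a placeholder address means no
later code can mistake it for something to connect to.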

Using this syntax, we would have conscious application-level awareness for
the current behaviour:

(old, hacky) Bridge flashproxy (dummy address)
(4) Bridge flashproxy -

for clients that really don't care about the exact endpoint or about strong
e2e authentication. This can be taken by the controller to mean "give me any
endpoint, I really don't care".

However, we should distinguish this from the error cases:

(5) Bridge flashproxy (real addr, but not in whitelist) (*)
(6) Bridge flashproxy (*) (fpr, but not in whitelist)

In these cases, the user is asking for something the controller cannot give
them. Instead of falling back to the "give-me-any" behaviour of (4), the PT
client should raise an error. (However, returning an error from the
controller is not possible with flashproxy's design; it's not clear what the
ideal behaviour in this case would be.)
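
The cases above can be sketched as a single selection function. This is only
an illustration under stated assumptions: the whitelist is modelled as a dict
from fingerprint to address, an omitted address ("-") or fingerprint is passed
as `None`, and `select_endpoint` and `EndpointUnavailable` are hypothetical
names. It also treats a request whose address and fingerprint point at
different whitelist entries as an error, which the cases above do not cover.

```python
from typing import Dict, Optional, Tuple


class EndpointUnavailable(Exception):
    """The requested Bridge is not one the controller offers service to."""


def select_endpoint(
    whitelist: Dict[str, str],
    address: Optional[str],
    fingerprint: Optional[str],
) -> Tuple[str, str]:
    """Pick a (fingerprint, address) pair that satisfies the client's request.

    A field set to None is unconstrained, so (None, None) is the
    "give me any endpoint" request of case (4).
    """
    matches = [
        (fpr, addr)
        for fpr, addr in whitelist.items()
        if (fingerprint is None or fpr == fingerprint)
        and (address is None or addr == address)
    ]
    if not matches:
        # Cases (5) and (6): raise an error instead of silently falling
        # back to the give-me-any behaviour.
        raise EndpointUnavailable("requested Bridge is not in the whitelist")
    # Any match satisfies the request; this sketch returns the first one.
    return matches[0]
```

Note that under this sketch a dummy address is treated like any other address
that is not in the whitelist, so it produces an error; only an explicitly
omitted address leaves the address unconstrained.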

X

Note: I had previously filed a ticket for this, though it is only recently that
I realised that it had many more consequences (the topic of this email):

https://trac.torproject.org/projects/tor/ticket/10196

It currently misses out some of the more advanced solutions I presented.

[A] Currently these are finite sets pre-selected by the controller. There is a
security issue with simply allowing users to specify *any* IP address for the
midpoint to connect to. [1] So whitelisting is the simplest approach, for now.
In the future we may think about ways to allow access to "any Tor Bridge", but
there are security implications here as well.

[1] https://trac.torproject.org/projects/tor/ticket/10196#comment:1

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git
