## Background
Pluggable Transports are proxy programs that help users bypass censorship.
[App client] -> XXX EVIL CENSOR HAS YOU XXX ACCESS DENIED XXX
[App client] -> [PT client] -> (the cloud!) -> [PT server] -> [App server]
The structural design, on the client side, is roughly:
1. App client specifies an endpoint to reach
2. PT client receives an instruction, via SOCKS, to connect to this endpoint
3. PT client does its thing, magic happens (intentionally vague)
## In Tor
Each endpoint is specified by a Bridge line, in the form of an IP address and an optional fingerprint (for authentication).
This point is not emphasised in existing docs, but it is important for the topic of this email: both the IP address and the fingerprint are potential *identifiers* of the endpoint. The former is an impure name, the latter a pure name.
Currently, we have two main types of PT:
- direct PTs - connect to the endpoint directly via a TCP connection
  - these PTs don't try to hide the fact that you're contacting X on addrX.
  - instead, they usually transform the traffic so it's not identifiable
  - e.g. obfs3, fte, scramblesuit
- indirect PTs - connect to the endpoint indirectly, via special means
  - flashproxy - connects via an ephemeral browser proxy
  - meek - connects via an online web service
I will now argue that indirect PTs should do things in a specific way, which is *not* the way meek and flashproxy currently do things.
## Meek and flashproxy
Meek and flashproxy provide an indirect way of accessing Tor. Instead of connecting directly to a Bridge (which might be blocked), the client connects via a midpoint that is harder to block. Very very roughly,
(meek/fp controller)
[meek/fp client] -> [meek/fp midpoint] -> [freedom!]
The Bridge line in the user's torrc is completely ignored; we use a dummy value, like:
Bridge flashproxy 0.0.1.0:1
Instead, it is the controller that decides which endpoint (which Bridge) the midpoint should connect to.
(In meek, the controller is the same entity as the midpoint, but it helps our analysis to consider the two functions separately.)
## The problem
The problem with the above structure is that it is incompatible with the metaphor of connecting to a specific endpoint. This is what the PT spec is about, even though it does not explicitly mention this viewpoint. Instead, meek and flashproxy provide the metaphor of connecting to a global homogeneous service.
This has positive consequences, such as the user no longer having to bother to find Bridges, but also has several negative consequences:
1. The Tor client can no longer authenticate the endpoint. Although currently Tor makes this optional, it is strongly recommended, to prevent a MitM between the client and the server. Even if the midpoint does this, this is not end-to-end authentication that we would require for strong security.
2. Since the endpoints are not chosen by the user, this may have consequences for anonymity. IANAAR, but this has not yet been looked into.
3. The Tor client (and other applications that use the PT spec) internally use the endpoints metaphor. They may make performance assumptions based on endpoints being configured with different addresses. (Perhaps also security assumptions, although perhaps not due to having to defend against the sybil attack anyway.) Breaking this metaphor is not a good design principle.
4. For an application like i2p, where each peer cares much more about *exactly which* endpoint it connects to (e.g. because e2e fingerprint authentication is mandatory), the metaphor of endpoints is even more important. Such applications will not be able to take advantage of these indirect-connection PTs.
5. Chaining a PT that *requires* strong identification (e.g. scramblesuit, for c2s auth) is impossible under this scheme, since the end client cannot select the right server to authenticate against.
## The solution
The solution is simple: the indirect PT client simply has to actually *make use* of the Bridge line, instead of totally dropping this information.
The meek/flashproxy controllers offer service to a finite set of Bridges. [A]
The client should be able to select one of these, specify its fingerprint and any other shared secrets, on their torrc Bridge line, and the indirect-PT will tell the controller to connect *to specifically this Bridge*.
The controller should honour this request. If it doesn't and the fingerprint is specified, it will be caught out by the Tor client.
So instead of having, as currently:
(old, hacky) Bridge flashproxy (dummy addr)
We would have the following cases:
(1) Bridge flashproxy (real addr)
(2) Bridge flashproxy (real addr) (fingerprint)
(3, not-ideal) Bridge flashproxy (dummy addr) (fingerprint)
Option (3) is quite nice, since in indirect PTs the actual address is irrelevant - the Tor client never tries to connect to it. I suggest that we have a special syntax for it though, to explicitly discourage hacks that {use dummy addresses but which are treated as real addresses by the underlying application}, since this breaks assumptions of the PT spec.
For example,
(3, better) Bridge flashproxy - (fingerprint)
We would add to the PT spec, something like:
"-" is a special hostname syntax in Bridge lines. It means that the address of this Bridge does not concern the underlying application (e.g. Tor), since it will be indirectly reached by the PT client. (If a fingerprint is given, it will still be checked by Tor.)
Using this syntax, we would have conscious application-level awareness for the current behaviour:
(old, hacky) Bridge flashproxy (dummy address)
(4) Bridge flashproxy -
for clients that really don't care about the exact endpoint, nor strong e2e authentication. This can be taken by the controller to mean "give me any endpoint, I really don't care".
However, we should distinguish this from the error case:
(5) Bridge flashproxy (real addr, but not in whitelist) (*)
(6) Bridge flashproxy (*) (fpr, but not in whitelist)
In these cases, the user is asking for something the controller cannot give them. Instead of falling back to the "give-me-any" behaviour of (4), the PT client should raise an error. (However, returning an error from the controller is not possible with flashproxy's design; it's not clear what the ideal behaviour in this case would be.)
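The cases above can be sketched as a small classifier. This is only an illustration of the proposal, not flashproxy or Tor code: the `BridgeLine` type, the `parse_bridge_line` and `classify` functions, the whitelist contents, and the example address are all assumptions made up for this sketch, and `-` is the proposed (not yet specified) hostname syntax.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical whitelist of (address, fingerprint) pairs that a controller serves.
WHITELIST = {
    ("192.0.2.10:9000", "09F911029D74E35BD84156C5635688C009F909F9"),
}


@dataclass(frozen=True)
class BridgeLine:
    transport: str
    addr: Optional[str]          # None stands for the proposed "-" syntax
    fingerprint: Optional[str]   # None when no fingerprint was given


def parse_bridge_line(line: str) -> BridgeLine:
    """Parse 'Bridge <transport> <addr|-> [fingerprint] [k=v ...]'."""
    parts = line.split()
    if len(parts) < 3 or parts[0].lower() != "bridge":
        raise ValueError("not a Bridge line: %r" % line)
    transport, addr = parts[1], parts[2]
    # A fourth field without '=' is taken to be the fingerprint.
    fpr = parts[3].upper() if len(parts) > 3 and "=" not in parts[3] else None
    return BridgeLine(transport, None if addr == "-" else addr, fpr)


def classify(bridge: BridgeLine, whitelist=WHITELIST) -> str:
    """Return how the endpoint is selected, or raise for cases (5) and (6)."""
    if bridge.addr is None and bridge.fingerprint is None:
        return "any"  # case (4): "give me any endpoint"
    matches = [
        (a, f) for a, f in whitelist
        if (bridge.addr is None or a == bridge.addr)
        and (bridge.fingerprint is None or f == bridge.fingerprint)
    ]
    if not matches:
        # Cases (5) and (6): do not silently fall back to "any".
        raise LookupError("requested Bridge is not served by this controller")
    if bridge.addr is None:
        return "by-fingerprint"              # case (3, better)
    if bridge.fingerprint is None:
        return "by-address"                  # case (1)
    return "by-address-and-fingerprint"      # case (2)
```

The point of raising in the last-resort branch is that a request for a specific but unavailable Bridge stays distinguishable from the "I really don't care" request.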
X
Note: I had previously filed a ticket for this, though it is only recently that I realised that it had many more consequences (the topic of this email):
https://trac.torproject.org/projects/tor/ticket/10196
It currently misses out some of the more advanced solutions I presented.
[A] Currently these are finite sets pre-selected by the controller. There is a security issue with simply allowing users to specify *any* IP address for the midpoint to connect to. [1] So whitelisting is the simplest approach, for now. In the future we may think about ways to allow access to "any Tor Bridge", but there are security implications here as well.
[1] https://trac.torproject.org/projects/tor/ticket/10196#comment:1
On 15/04/14 14:03, Ximin Luo wrote:
(3, not-ideal) Bridge flashproxy (dummy addr) (fingerprint)
Option (3) is quite nice, since in indirect PTs the actual address is irrelevant - the Tor client never tries to connect to it. I suggest that we have a special syntax for it though, to explicitly discourage hacks that {use dummy addresses but which are treated as real addresses by the underlying application}, since this breaks assumptions of the PT spec.
For example,
(3, better) Bridge flashproxy - (fingerprint)
We would add to the PT spec, something like:
"-" is a special hostname syntax in Bridge lines. It means that the address of this Bridge does not concern the underlying application (e.g. Tor), since it will be indirectly reached by the PT client. (If a fingerprint is given, it will still be checked by Tor.)
Hmm, for this to work (select the endpoint by fingerprint only), tor will need to pass the fingerprint to the PT client during the SOCKS connection as well. It seems this is not the case from pt-spec.txt:
Example: if the bridge line is "bridge trebuchet www.example.com:3333 09F911029D74E35BD84156C5635688C009F909F9 rocks=20 height=5.6m" AND if the Tor client knows that the 'trebuchet' method is supported, the client should connect to the proxy that provides the 'trebuchet' method, ask it to connect to www.example.com, and provide the string "rocks=20;height=5.6m" as the username, the password, or split across the username and password.
Perhaps we can add the fingerprint to this, as part of Yawning's SOCKS5 extensions.
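As a rough sketch of that idea, the function below takes a Bridge line and builds the `k=v;k=v` argument string described in the pt-spec.txt example, optionally appending the fingerprint. The `fingerprint` key name and the function names are purely hypothetical (they are not part of pt-spec.txt or of any SOCKS5 extension), and the backslash-escaping of `;` and `\` reflects my understanding of the spec's encoding rather than anything quoted above.

```python
def escape(text: str) -> str:
    """Backslash-escape the characters that delimit SOCKS args (assumed rule)."""
    return text.replace("\\", "\\\\").replace(";", "\\;")


def socks_args_from_bridge_line(line: str, include_fingerprint: bool = True) -> str:
    """Build the argument string a client could hand to the PT proxy."""
    parts = line.split()
    if len(parts) < 3 or parts[0].lower() != "bridge":
        raise ValueError("not a Bridge line: %r" % line)
    rest = parts[3:]

    # An optional fingerprint comes before the k=v items and contains no '='.
    fingerprint = None
    if rest and "=" not in rest[0]:
        fingerprint, rest = rest[0], rest[1:]

    pairs = []
    for item in rest:
        key, _, value = item.partition("=")
        pairs.append((key, value))

    # Hypothetical extension: also pass the fingerprint to the PT client.
    if include_fingerprint and fingerprint is not None:
        pairs.append(("fingerprint", fingerprint))

    return ";".join("%s=%s" % (escape(k), escape(v)) for k, v in pairs)
```

With `include_fingerprint=False` this reproduces the string from the spec example; with the default it appends one extra item, which an indirect PT client could forward to its controller.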
X
On Tue, Apr 15, 2014 at 02:03:43PM +0100, Ximin Luo wrote:
## The problem
The problem with the above structure is that it is incompatible with the metaphor of connecting to a specific endpoint. This is what the PT spec is about, even though it does not explicitly mention this viewpoint. Instead, meek and flashproxy provide the metaphor of connecting to a global homogeneous service.
This has positive consequences, such as the user no longer having to bother to find Bridges, but also has several negative consequences:
- The Tor client can no longer authenticate the endpoint. Although currently Tor makes this optional, it is strongly recommended, to prevent a MitM between the client and the server. Even if the midpoint does this, this is not end-to-end authentication that we would require for strong security.
I see this somewhat differently. You still choose and authenticate the second and third hops. I heard from Roger that it is a sort of accident that bridge-using circuits use three hops, anyway. It should be that there are four: the first hop is your untrusted bridge address you got from wherever, and the second is your guard. Would a design like that make most of these issues go away?
There's an old ticket here, "Let bridge users specify that they don't care if their bridge changes fingerprint." https://trac.torproject.org/projects/tor/ticket/3292 which also ties with this blog post "Different Ways to Use a Bridge." https://blog.torproject.org/blog/different-ways-use-bridge Completion of #3292 would be a beautiful thing, I think, for flash proxy, as it would allow us easily to round-robin multiple websocket bridges. (Currently you can't do that because the tor client freaks out; see https://trac.torproject.org/projects/tor/ticket/7153#comment:5.)
Some other relevant tickets about non-authentication of bridges:
"analyze security tradeoffs from using a socks proxy vs a bridge to reach the Tor network" https://trac.torproject.org/projects/tor/ticket/2764 For "socks proxy", substitute "indirect proxy", and it works the same. I think of indirect proxies like flash proxy as untrusted unauthenticated things that just get you to the Tor network, which you then authenticate, the same as a socks proxy. The quotes there that I agree with are "from a *security* perspective (for a broad definition of security), is there really any difference between a socks proxy and a bridge relay?" and "I don't see any huge roadblocks to having bridges that are just vanilla proxies. We should deploy them if we can make them usable, and maybe someday somebody will show us it was a bad idea."
"Tor build variant to support lightweight socks bridge" https://trac.torproject.org/projects/tor/ticket/3466
David Fifield
On 15/04/14 19:36, David Fifield wrote:
On Tue, Apr 15, 2014 at 02:03:43PM +0100, Ximin Luo wrote:
## The problem
The problem with the above structure is that it is incompatible with the metaphor of connecting to a specific endpoint. This is what the PT spec is about, even though it does not explicitly mention this viewpoint. Instead, meek and flashproxy provide the metaphor of connecting to a global homogeneous service.
This has positive consequences, such as the user no longer having to bother to find Bridges, but also has several negative consequences:
- The Tor client can no longer authenticate the endpoint. Although currently Tor makes this optional, it is strongly recommended, to prevent a MitM between the client and the server. Even if the midpoint does this, this is not end-to-end authentication that we would require for strong security.
I see this somewhat differently. You still choose and authenticate the second and third hops. I heard from Roger that it is a sort of accident that bridge-using circuits use three hops, anyway. It should be that there are four: the first hop is your untrusted bridge address you got from wherever, and the second is your guard. Would a design like that make most of these issues go away?
I think this would be OK conceptually, but it would extend the circuit by one hop, to 5 total hops. Currently, we have (with meek/fp):
PT client -> midpoint -> untrusted bridge -> tor relay -> tor exit
My proposed fix would turn it into:
PT client -> midpoint -> trusted bridge -> tor relay -> tor exit
Your suggestion would be analogous to:
PT client -> midpoint -> untrusted bridge -> trusted guard -> tor relay -> tor exit
It also confuses the model a little, since the untrusted bridge does not help toward anonymity (since it can be MitMd), but is still running Tor, solely to bypass censorship.
There's an old ticket here, "Let bridge users specify that they don't care if their bridge changes fingerprint." https://trac.torproject.org/projects/tor/ticket/3292 which also ties with this blog post "Different Ways to Use a Bridge." https://blog.torproject.org/blog/different-ways-use-bridge Completion of #3292 would be a beautiful thing, I think, for flash proxy, as it would allow us easily to round-robin multiple websocket bridges. (Currently you can't do that because the tor client freaks out; see https://trac.torproject.org/projects/tor/ticket/7153#comment:5.)
If by "bridge" you mean "authenticated relay, that is 2 hops before the exit", then I'm not sure how useful round-robin between multiple untrusted bridges really is, since this opens you up to MitM at that point.
"What exactly are we protecting against by refusing to use the network when A's fingerprint changes?" - MitM on A, and revelation of my first-hop OR traffic to the attacker? Am I wrong here? Or is this not a big deal for anonymity?
One can tweak #3292 to prevent MitM - instead of allowing *any* fingerprint, one would be able to specify multiple fingerprints for the same IP address, and the Tor client would treat these as separate Bridges (since they are separate).
I believe this model is clearer and closer "to reality", namely the endpoints metaphor. It's also similar to my (3) suggestion from before.
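A minimal sketch of this tweak, assuming a hypothetical `BridgeSet` container (nothing here is tor's actual data model): Bridges are identified by the (address, fingerprint) pair, so several fingerprints may share one address, and a handshake that presents an unlisted fingerprint is still rejected.

```python
from collections import defaultdict


class BridgeSet:
    """Bridges keyed by address, each with a set of acceptable fingerprints."""

    def __init__(self):
        self._by_addr = defaultdict(set)

    def add(self, addr: str, fingerprint: str) -> None:
        # Each (addr, fingerprint) pair is a separate Bridge.
        self._by_addr[addr].add(fingerprint.upper())

    def bridges(self):
        """List every distinct Bridge as an (addr, fingerprint) pair."""
        return sorted((a, f) for a, fs in self._by_addr.items() for f in fs)

    def accept(self, addr: str, presented_fingerprint: str) -> bool:
        """Accept only fingerprints explicitly configured for this address."""
        return presented_fingerprint.upper() in self._by_addr.get(addr, set())
```

Unlike allowing *any* fingerprint, this keeps the MitM check: round-robin behind one address works only across fingerprints the user has listed.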
Some other relevant tickets about non-authentication of bridges:
"analyze security tradeoffs from using a socks proxy vs a bridge to reach the Tor network" https://trac.torproject.org/projects/tor/ticket/2764 For "socks proxy", substitute "indirect proxy", and it works the same. I think of indirect proxies like flash proxy as untrusted unauthenticated things that just get you to the Tor network, which you then authenticate, the same as a socks proxy. The quotes there that I agree with are "from a *security* perspective (for a broad definition of security), is there really any difference between a socks proxy and a bridge relay?" and "I don't see any huge roadblocks to having bridges that are just vanilla proxies. We should deploy them if we can make them usable, and maybe someday somebody will show us it was a bad idea."
"Tor build variant to support lightweight socks bridge" https://trac.torproject.org/projects/tor/ticket/3466
I largely agree with these quotes, but this would be assuming the socks proxy is authenticated (or, can be authenticated) *and* the end-client can completely control the second hop after it. Neither of these properties is true for the indirect proxying of meek/flashproxy.
X
Ximin Luo infinity0@torproject.org writes:
<snip>
So instead of having, as currently:
(old, hacky) Bridge flashproxy (dummy addr)
We would have the following cases:
(1) Bridge flashproxy (real addr)
(2) Bridge flashproxy (real addr) (fingerprint)
(3, not-ideal) Bridge flashproxy (dummy addr) (fingerprint)
Option (3) is quite nice, since in indirect PTs the actual address is irrelevant - the Tor client never tries to connect to it. I suggest that we have a special syntax for it though, to explicitly discourage hacks that {use dummy addresses but which are treated as real addresses by the underlying application}, since this breaks assumptions of the PT spec.
Hm, but this kind of kills the magic of indirect PTs, right? That is, users who want to use flashproxy in the way above, will have to know an address or a fingerprint of the bridge beforehand. What is the use case? Advanced users? I guess most users (people who use the TBB) will still need to use the current scheme, right?
Also, if all traffic goes over the midpoint, how can we make sure that the midpoint will connect us to the bridge requested with:
(1) Bridge flashproxy (real addr)
?
FWIW, I liked your argument with regards to authentication, and David's reply citing a few tickets that detail the (lack of) threat model for Tor bridges...
On 16/04/14 15:56, George Kadianakis wrote:
Ximin Luo infinity0@torproject.org writes:
<snip>
So instead of having, as currently:
(old, hacky) Bridge flashproxy (dummy addr)
We would have the following cases:
(1) Bridge flashproxy (real addr)
(2) Bridge flashproxy (real addr) (fingerprint)
(3, not-ideal) Bridge flashproxy (dummy addr) (fingerprint)
Option (3) is quite nice, since in indirect PTs the actual address is irrelevant - the Tor client never tries to connect to it. I suggest that we have a special syntax for it though, to explicitly discourage hacks that {use dummy addresses but which are treated as real addresses by the underlying application}, since this breaks assumptions of the PT spec.
Hm, but this kind of kills the magic of indirect PTs, right? That is, users who want to use flashproxy in the way above, will have to know an address or a fingerprint of the bridge beforehand. What is the use case? Advanced users? I guess most users (people who use the TBB) will still need to use the current scheme, right?
We can distribute the fingerprints of the default meek/fp Bridges in the default torrc, just like we distribute non-authenticated defaults currently. If we introduce new ones (e.g. if the old defaults are blocked or need to be shut down, or just to increase capacity), BridgeDB can distribute these new ones with new fingerprints. (But indirect PTs should be harder to block anyways.)
Also, if all traffic goes over the midpoint, how can we make sure that the midpoint will connect us to the bridge requested with:
(1) Bridge flashproxy (real addr)
?
Yes, this by itself probably doesn't gain that much, I just included it for completeness. (If we imagine a PT that has a non-secret but parameterised obfuscation method (bananaphone?), then we would need this sort of thing if we wanted to use a controller to multiplex between multiple of those Bridges. But ideally everything would have a fingerprint and be strongly authenticated.)
X
FWIW, I liked your argument with regards to authentication, and David's reply citing a few tickets that detail the (lack of) threat model for Tor bridges...
On 16/04/14 16:11, Ximin Luo wrote:
On 16/04/14 15:56, George Kadianakis wrote:
Ximin Luo infinity0@torproject.org writes:
Hm, but this kind of kills the magic of indirect PTs, right? That is, users who want to use flashproxy in the way above, will have to know an address or a fingerprint of the bridge beforehand. What is the use case? Advanced users? I guess most users (people who use the TBB) will still need to use the current scheme, right?
We can distribute the fingerprints of the default meek/fp Bridges in the default torrc, just like we distribute non-authenticated defaults currently. If we introduce new ones (e.g. if the old defaults are blocked or need to be shutdown, or just to increase capacity), BridgeDB can distribute these new ones with new fingerprints. (But indirect PTs should be harder to block anyways.)
I suppose that, with an indirect PT, it is no longer necessary to connect to a Bridge - we should be able to connect, via the midpoint, directly to a normal public entry node. Then its fingerprint would be available in the consensus. Then as you say, the user would not need to bother with fingerprints (or Bridge lines at all), and I can definitely see why this was a strong motivator in the current design of meek/fp.
(This would be similar to the 5-hop separate "untrusted bridges" vs "trusted guard" suggestion in David's post, but cutting out the "untrusted bridge" part.)
So, I'd support an effort to move in this direction as well. However, it would take more changes and more thought than my original proposal, though it's also strictly better than it, I think - i.e. more flexibility, more usability, no less security.
X
On 16/04/14 16:24, Ximin Luo wrote:
On 16/04/14 16:11, Ximin Luo wrote:
On 16/04/14 15:56, George Kadianakis wrote:
Ximin Luo infinity0@torproject.org writes:
Hm, but this kind of kills the magic of indirect PTs, right? That is, users who want to use flashproxy in the way above, will have to know an address or a fingerprint of the bridge beforehand. What is the use case? Advanced users? I guess most users (people who use the TBB) will still need to use the current scheme, right?
We can distribute the fingerprints of the default meek/fp Bridges in the default torrc, just like we distribute non-authenticated defaults currently. If we introduce new ones (e.g. if the old defaults are blocked or need to be shutdown, or just to increase capacity), BridgeDB can distribute these new ones with new fingerprints. (But indirect PTs should be harder to block anyways.)
I suppose that, with an indirect PT, it is no longer necessary to connect to a Bridge - we should be able to connect, via the midpoint, directly to a normal public entry node. Then its fingerprint would be available in the consensus. Then as you say, the user would not need to bother with fingerprints (or Bridge lines at all), and I can definitely see why this was a strong motivator in the current design of meek/fp.
(This would be similar to the 5-hop separate "untrusted bridges" vs "trusted guard" suggestion in David's post, but cutting out the "untrusted bridge" part.)
So, I'd support an effort to move in this direction as well. However, it would take more changes and more thought than my original proposal, though it's also strictly better than it, I think - i.e. more flexibility, more usability, no less security.
Whoops, I was a little hasty here. The above design can already be implemented as a normal HTTP/SOCKS proxy (that goes through a midpoint), which tor can use via a HTTPProxy/Socks5Proxy torrc option instead of a ClientTransportPlugin option. The reason we use Bridges is that they give us a bit more flexibility in terms of the protocol that the entry node can accept, in case the midpoint can't speak OR, which is the case in both meek and fp.
TL;DR: followers of this thread can ignore this and my previous email, sorry for the noise.
As a side point though, I realised another issue for flashproxy. At the moment, the PT client sets the facilitator (the controller) on the command line. This means we can't use Bridges that come from different facilitators. Meek seems to support taking params from the SOCKS args, though. The general point for indirect PTs, then, is that information that helps to determine which bridge is used should be given on the Bridge line.
X
On Wed, Apr 16, 2014 at 04:51:20PM +0100, Ximin Luo wrote:
As a side point though, I realised another issue for flashproxy. At the moment, the PT client sets the facilitator (the controller) on the command line. This means we can't use Bridges that come from different facilitators. Meek seems to support taking params from the SOCKS args, though. The general point for indirect PTs then, is that information that helps to determine which bridge is used, should be given on the Bridge line.
This is a good observation. Flash proxy (I think) predates the ability to pass parameters in the Bridge line. I think it's something we should support though. meek-client can take parameters on a Bridge line, but I had to make it possible to do also through the command line, because the version of tor shipped in the browser bundle doesn't support passing those parameters.
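The precedence described here (per-Bridge parameters first, a command-line value as fallback) can be sketched as follows. This is not flashproxy or meek-client code: the `facilitator` key, the function names, and the example URLs are hypothetical, and the `k=v;k=v` format with backslash escapes is assumed from the pt-spec.txt example quoted earlier in the thread.

```python
from typing import Dict, Optional


def parse_socks_args(encoded: str) -> Dict[str, str]:
    """Decode 'k=v;k=v' with backslash escapes into a dict (assumed format)."""
    items, buf, escaped = [], [], False
    for ch in encoded:
        if escaped:
            buf.append(ch)       # take the escaped character literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ";":
            items.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    items.append("".join(buf))

    args = {}
    for item in items:
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("malformed argument: %r" % item)
        args[key] = value
    return args


def resolve_facilitator(socks_args: str, cli_default: Optional[str] = None) -> str:
    """Prefer the per-Bridge value; fall back to the command-line default."""
    params = parse_socks_args(socks_args)
    if "facilitator" in params:
        return params["facilitator"]
    if cli_default is not None:
        return cli_default
    raise ValueError("no facilitator given on the Bridge line or command line")
```

With this ordering, two Bridge lines can name different facilitators, while a tor that cannot pass Bridge-line parameters still works through the command-line value.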
David Fifield