[tor-bugs] #29278 [Circumvention/Pluggable transport]: Assess HTTP proxy

Fri May 17 22:22:57 UTC 2019

#29278: Assess HTTP proxy
-----------------------------------------------+---------------------------
 Reporter:  cohosh                             |          Owner:  phw
     Type:  task                               |         Status:  assigned
 Priority:  Low                                |      Milestone:
Component:  Circumvention/Pluggable transport  |        Version:
 Severity:  Normal                             |     Resolution:
 Keywords:                                     |  Actual Points:
Parent ID:                                     |         Points:  2
 Reviewer:                                     |        Sponsor:  Sponsor19
-----------------------------------------------+---------------------------

Comment (by phw):

 Here are some general thoughts:

 * I quite like the concept. httpsproxy is the closest we've ever gotten to
 a transport that "looks like HTTP". It uses HTTP's CONNECT method
 (conceptually similar to SOCKS), which makes it flexible and low-overhead.
 It also means that anyone who runs a web server could turn on CONNECT
 (and, to prevent abuse, limit outgoing connections to IP addresses of
 guard relays), effectively turning the web server into a snowflake-like
 bridge that doesn't run a Tor client, which conveniently fixes #7349.
 This, however, requires non-trivial changes to BridgeDB as I explain
 below.

 * In my opinion, httpsproxy's biggest problem is that it still suffers
 from the proxy distribution problem. No matter how well httpsproxy can
 disguise Tor traffic, we still end up trying to distribute a small number
 of long-lived bridges while hoping that our adversaries are having a hard
 time collecting them all. We don't know how many of our bridges have been
 collected (#9316 may shed light on this), but collecting them is
 [https://censorbib.nymity.ch/pdf/Matic2017a.pdf certainly easier than we
 would like it to be].

 * I worry that the crowd that can run an httpsproxy bridge may be smaller
 than the crowd that can run an obfs4 bridge. httpsproxy supports two
 deployment scenarios:
 "[https://trac.torproject.org/projects/tor/ticket/26923#Naiveproxy naive
 proxy]" and
 "[https://trac.torproject.org/projects/tor/ticket/26923#FullBridge full
 bridge]". The "naive proxy" scenario is similar to snowflake and expects
 you to already be running a web server. We may have many motivated
 volunteers, but I'm afraid that only a small fraction runs their own web
 server. The "full bridge" scenario does not require an existing web
 server, but this comes at the cost of being less resistant to
 fingerprinting. In comparison, snowflake's barrier to entry is
 significantly lower, especially once we have a web extension (#23888).
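
 The abuse-prevention idea from the first point above (only allow CONNECT
 to the addresses of guard relays) could be sketched roughly as follows.
 This is a minimal illustration, not httpsproxy's actual code: the names
 normalize, guardAllowlist, and allows are hypothetical, the addresses are
 documentation placeholders, and a real deployment would have to keep the
 list in sync with the Tor consensus.

```go
package main

import (
	"fmt"
	"net"
)

// normalize turns an "ip:port" string into a canonical form.
// Hostnames are rejected so that DNS cannot be used to dodge the list.
func normalize(target string) (string, bool) {
	host, port, err := net.SplitHostPort(target)
	if err != nil {
		return "", false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return "", false
	}
	return net.JoinHostPort(ip.String(), port), true
}

// guardAllowlist is a hypothetical set of relay addresses that a
// CONNECT request may target.
type guardAllowlist map[string]struct{}

// newGuardAllowlist builds the set, silently skipping malformed entries.
func newGuardAllowlist(addrs []string) guardAllowlist {
	al := make(guardAllowlist)
	for _, a := range addrs {
		if key, ok := normalize(a); ok {
			al[key] = struct{}{}
		}
	}
	return al
}

// allows reports whether the CONNECT target is on the allowlist.
func (al guardAllowlist) allows(target string) bool {
	key, ok := normalize(target)
	if !ok {
		return false
	}
	_, found := al[key]
	return found
}

func main() {
	// Placeholder addresses from the RFC 5737 documentation ranges,
	// not real relays.
	al := newGuardAllowlist([]string{"192.0.2.10:9001", "198.51.100.7:443"})
	for _, t := range []string{"192.0.2.10:9001", "203.0.113.5:9001", "example.com:443"} {
		fmt.Printf("%s allowed=%v\n", t, al.allows(t))
	}
}
```

 A web server's CONNECT handler would call allows() on the request target
 and refuse anything that is not listed.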

 Here are my thoughts on what deployment would entail:

 * httpsproxy is written in Go. It's not a lot of code (the HTTP logic
 comes from the [https://github.com/mholt/caddy caddy module]) and the
 concept behind it is relatively simple, meaning that we would be able to
 maintain it even if the original author were to vanish.

 * The "naive proxy" deployment scenario won't work with our bridge
 authority and BridgeDB because they assume that a tor client and its
 pluggable transport run on the same machine. To make the "naive proxy"
 scenario work, we would probably have to come up with a new channel that
 allows tor-less httpsproxies to announce themselves to BridgeDB. Since
 this is similar to the way snowflake works, a snowflake-style broker
 mechanism may come in handy here. Unlike snowflake, however, httpsproxy
 is affected by the bridge distribution problem, so the broker would need
 to get some of the smartness that BridgeDB already has (see #29296).

 * Alternatively (or in parallel), we can deploy httpsproxy in the orthodox
 "full bridge" scenario, which is similar to obfs4. In this case, a tor
 client ships with a web server (currently [https://caddyserver.com
 caddy]). This will work out of the box with our bridge authority and
 BridgeDB, but we will have a number of additional issues:
   1. Bridges will expose a web server ''and'' an OR port. Because of
 #7349, this will enable confirmation attacks à la "Not sure if this web
 server runs a Tor bridge? Just port scan it and look for an OR port". This
 isn't a new problem but it somewhat defeats the purpose of shipping a
 well-designed pluggable transport.
   2. All bridges will run the same web server and if this web server isn't
 particularly popular on the Internet, censors could fingerprint and block
 them all. I don't know how popular caddy is, but I had never heard of it
 before I started learning about httpsproxy.
   3. The content hosted on the bridge's web server needs to look
 "natural". A web server that gives you a simple 404 or 403 for its landing
 page may look suspicious. Or maybe not? I don't think we can expect our
 bridge operators to be creative and serve "natural" content on their
 httpsproxy web servers.

   The benefit of this scenario is that it doesn't require architectural
 changes to BridgeDB. In fact, we could move forward with deploying the
 "full bridge" scenario and start supporting the "naive proxy" approach
 later on.

 * There are a bunch of fingerprinting issues that we would have to think
 about. Sergey, the author of httpsproxy, already did a great job
 discussing them
 [https://trac.torproject.org/projects/tor/ticket/26923#Fingerprinting over
 here]. I'm particularly worried about
 [https://trac.torproject.org/projects/tor/ticket/26923#Probingwebserverwithproxyrequestswithoutasecret
 an active probing attack] that allows a censor to confirm whether a web
 server supports CONNECT.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29278#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online