[tor-dev] Building better pluggable transports (Google Summer of Code)
Steven.Murdoch at cl.cam.ac.uk
Tue May 28 14:59:15 UTC 2013
We've been discussing how to build better pluggable transports for Tor as part of your application to Google Summer of Code. Now that you've been accepted, I thought it would be good to bring this discussion to tor-dev so that others can contribute.
The basic idea behind the project is to build a pluggable transport which is as unlikely as possible to be blocked. In particular, we are most interested in improving the situation in countries where obfs2/obfs3 are blocked or likely to be blocked in the near future.
There are two basic options, which would ideally be combined:
- Better camouflaging Tor traffic
- Scanning resistance
On better camouflaging Tor traffic, the benchmark we have is obfs3, which converts Tor traffic to data which is indistinguishable from random bytes (but timing and packet-size patterns are not disguised). As far as I'm aware, this is not being blocked anywhere, but it may be possible to block it based on the fact that there's not much truly random data on the Internet. Also, obfs3 will not get through a HTTP proxy, as it is clearly not HTTP.
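To make the "truly random data" point concrete, here is a minimal sketch of one statistic a censor could in principle compute: the Shannon entropy of the byte distribution in a flow. The function name `byte_entropy` and the sample inputs are made up for illustration; this is not a description of how any real censor works, and `os.urandom` merely stands in for obfs3-style output.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in for traffic that looks like uniformly random bytes.
random_like = os.urandom(4096)

# Stand-in for a plaintext protocol: a repeated, made-up HTTP request.
http_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 64

if __name__ == "__main__":
    print("random-like:", round(byte_entropy(random_like), 2), "bits/byte")
    print("http-like:  ", round(byte_entropy(http_like), 2), "bits/byte")
```

The random-like sample comes out close to the 8 bits/byte maximum, while the plaintext sample is far lower. Compressed or encrypted traffic from other protocols would also score high, so entropy alone is only a hint, not a classifier.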
So one option for the project is to impersonate HTTP. This is deceptively difficult because, although HTTP is transmitted over TCP, the guarantees it offers to higher layers are weaker than those of TCP (and weaker than Tor requires). For instance, individual HTTP requests may be re-ordered if they are sent over different TCP connections. Also, responses may be truncated without an error being reported to higher layers (which is why HTTP includes length fields as an option). HTTP doesn't give the same congestion avoidance as TCP, and proxies can both cache and modify the data they transmit. The HTTP specification is vague on some topics, and even when it specifies a particular behaviour, proxies frequently violate the specification.
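As a rough sketch of what coping with re-ordering and silent truncation could involve, the example below puts a sequence number and an explicit length in front of each chunk carried in an HTTP body, and reassembles chunks in order on the receiving side. The framing format and the names `encode_chunk`, `decode_chunk` and `Reassembler` are assumptions made for illustration, not part of any existing transport; caching, modification by proxies and congestion control are not addressed.

```python
import struct

# Hypothetical chunk header: 32-bit sequence number, 32-bit payload length.
HEADER = struct.Struct("!II")


def encode_chunk(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with its sequence number and length."""
    return HEADER.pack(seq, len(payload)) + payload


def decode_chunk(body: bytes) -> tuple[int, bytes]:
    """Parse one chunk; raise ValueError if the body was truncated or padded."""
    if len(body) < HEADER.size:
        raise ValueError("body shorter than chunk header")
    seq, length = HEADER.unpack_from(body)
    payload = body[HEADER.size:]
    if len(payload) != length:
        raise ValueError("payload length does not match header")
    return seq, payload


class Reassembler:
    """Releases payloads strictly in sequence order, buffering early arrivals."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: dict[int, bytes] = {}

    def push(self, seq: int, payload: bytes) -> bytes:
        # Chunks that were already delivered (duplicates) are ignored.
        if seq >= self.next_seq:
            self.pending.setdefault(seq, payload)
        out = bytearray()
        while self.next_seq in self.pending:
            out += self.pending.pop(self.next_seq)
            self.next_seq += 1
        return bytes(out)
```

For example, if chunk 1 arrives before chunk 0, `push(1, ...)` returns nothing and the later `push(0, ...)` returns both payloads in order. A truncated body fails in `decode_chunk` instead of being passed up to Tor as if it were complete.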
On the up-side of impersonating HTTP, it is probably one of the last protocols a country will block before they turn off the Internet completely, so it has a good chance of getting through. Also, in some scenarios the only way for traffic to get out is via a HTTP proxy. So I think there is significant usefulness in this option. Also, it is incredibly difficult to hide one protocol inside a different one, because just recording the approximate number of bytes sent vs bytes received can identify the protocol fairly well. Hiding HTTP-over-Tor as BitTorrent traffic will likely be detectable based on such a statistic, but HTTP-over-Tor as HTTP at least has a chance.
On the side of scanning resistance, we discussed the challenge of implementing scanning resistance with TCP. Here, the problem is that someone sending a SYN packet to a port which is open will receive a SYN-ACK, regardless of what user code does. To resist scanning (with something like BridgeSPA) the pluggable transport would need to be quite tightly integrated with the OS rather than just using the standard socket API. This would create deployment difficulties, especially on Windows, which has locked down the raw sockets API.
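The SYN-ACK behaviour is easy to observe with the standard socket API. In the minimal sketch below (loopback only, with a port chosen by the OS), the listening program never calls `accept()`, yet the probe's connection attempt still succeeds, because the kernel completes the TCP handshake on behalf of the listening socket.

```python
import socket

# A listening socket whose owner never calls accept().
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

# A "scanner" probing the port. connect_ex returns 0 on success,
# or an errno value on failure, instead of raising an exception.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.settimeout(2.0)
result = probe.connect_ex(("127.0.0.1", port))

probe.close()
listener.close()

if __name__ == "__main__":
    print("port looks open to a scanner" if result == 0 else f"connect failed: {result}")
```

User code only gets a say after the handshake has already revealed that the port is open, which is why scanning resistance over TCP needs help from below the socket API.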
Therefore it might be interesting to send data over UDP rather than TCP, as then it is the responsibility of the user code to send the SYN-ACK-equivalent. Tor needs properties similar to TCP from its pluggable transport, and so any UDP pluggable transport would need something which replaces TCP: reliable in-order delivery with congestion management. One option here is libutp, as used by BitTorrent. There is a vast amount of libutp traffic on the Internet, but its timing and upstream/downstream characteristics will be different from how Tor would use it. Alternatively, it might not be worth worrying about this type of scanning resistance, and we could just focus on what it is possible to do with TCP, as done with ScrambleSuit.
Chang, George, do you have anything to add to this summary? Does anyone else on tor-dev have thoughts on these topics?