[anti-censorship-team] Turbo Tunnel: let's build a sequencing/reliability layer into our circumvention protocols

Mon Aug 12 19:58:30 UTC 2019

I've just posted a manifesto on circumvention protocol design at the
net4people BBS:
	https://github.com/net4people/bbs/issues/9

I'll include the text here as well.

Code name: Turbo Tunnel
Designing circumvention protocols for speed, flexibility, and robustness

In working on circumvention protocols, I have repeatedly felt the need
for a piece that is missing from our current designs. This document
summarizes the problems I perceive, and how I propose to solve them.

In short, I think that every circumvention transport should incorporate
some kind of session/reliability protocol—even the ones built on
reliable channels that seemingly don't need it. It solves all kinds of
problems related to performance and robustness. By session/reliability
protocol, I mean something that offers a reliable stream abstraction,
with sequence numbers and retransmissions, like
[QUIC](https://quicwg.org/) or
[SCTP](https://tools.ietf.org/html/rfc4960#section-1.5.2). Instead of a
raw unstructured data stream, the obfuscation layer will carry encoded
datagrams that are provided by the session/reliability layer.

When I say that circumvention transports should incorporate something
like QUIC, for example, I don't mean that QUIC UDP packets are what we
should send on the wire. No—I am not proposing a new *outer* layer, but
an additional *inner* layer. We take the datagrams provided by the
session/reliability layer, and encode them as appropriate for whatever
obfuscation layer we happen to be using. So with meek, for example,
instead of sending an unstructured blob of data in each HTTP
request/response, we would send a handful of QUIC packets, encoded into
the HTTP body. The receiving side would decode the packets and feed them
into a local QUIC engine, which would reassemble them and output the
original stream. A way to think about it is that the
sequencing/reliability layer is the "TCP" to the obfuscation layer's
"IP". The obfuscation layer just needs to deliver chunks of data, on a
best-effort basis, without getting blocked by a censor. The
sequencing/reliability layer builds a reliable data stream atop that
foundation.
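
As a rough sketch of the encoding step, here is one way a handful of packets could be packed into, and recovered from, a single HTTP body. The 2-byte length prefix and the names `encodePackets` and `decodePackets` are assumptions made for illustration; this is not meek's wire format, and the packets are treated as opaque byte slices from whatever session/reliability layer is in use.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxPacketLen is the largest packet a 2-byte length prefix can describe.
const maxPacketLen = 65535

// encodePackets concatenates packets into one body, each preceded by a
// 2-byte big-endian length. The framing is hypothetical.
func encodePackets(packets [][]byte) ([]byte, error) {
	var body bytes.Buffer
	for _, p := range packets {
		if len(p) > maxPacketLen {
			return nil, errors.New("packet too long for 2-byte length prefix")
		}
		var hdr [2]byte
		binary.BigEndian.PutUint16(hdr[:], uint16(len(p)))
		body.Write(hdr[:])
		body.Write(p)
	}
	return body.Bytes(), nil
}

// decodePackets reverses encodePackets. A truncated body is an error.
func decodePackets(body []byte) ([][]byte, error) {
	var packets [][]byte
	r := bytes.NewReader(body)
	for r.Len() > 0 {
		var hdr [2]byte
		if _, err := io.ReadFull(r, hdr[:]); err != nil {
			return nil, err
		}
		p := make([]byte, binary.BigEndian.Uint16(hdr[:]))
		if _, err := io.ReadFull(r, p); err != nil {
			return nil, err
		}
		packets = append(packets, p)
	}
	return packets, nil
}

func main() {
	body, err := encodePackets([][]byte{[]byte("packet one"), []byte("packet two")})
	if err != nil {
		panic(err)
	}
	packets, err := decodePackets(body)
	if err != nil {
		panic(err)
	}
	for _, p := range packets {
		fmt.Printf("%q\n", p)
	}
}
```

The decoded packets would then be handed to the local session/reliability engine; the obfuscation layer itself never needs to interpret them.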

I believe this design can improve existing transports, as well as enable
new transports that are not possible now, such as those built on
unreliable channels. Here is a list of selected problems with existing
or potential transports, and how a sequencing/reliability layer helps
solve them:

Problem: Censors can disrupt obfs4 by terminating long-lived TCP
connections, as Iran did in 2013, killing connections after 60 seconds.
	This problem exists because the obfs4 session is coupled with
	the TCP connection. The obfs4 session begins and ends exactly
	when the TCP connection does. We need an additional layer of
	abstraction, a virtual session that exists independently of any
	particular TCP connection. That way, if a TCP connection is
	terminated, it doesn't destroy all the state of the obfs4
	session—you can open another TCP connection and resume where you
	left off, without needing to re-bootstrap Tor or your VPN or
	whatever was using the channel.
Problem: The performance of meek is limited because it is half-duplex:
it never sends and receives at the same time. This is because, while the
bytes in a single HTTP request arrive in order, the ordering of multiple
simultaneous requests is not guaranteed. Therefore, the client sends a
request, then waits for the server's response before sending another,
resulting in a delay of an RTT between successive sends.
	The session/reliability layer provides sequence numbers and
	reordering. Both sides can send data whenever it is convenient, or
	as needed for traffic shaping, and any unordered data will be
	put back in order at the other end. A client could even split
	its traffic over two or more CDNs, with different latency
	characteristics, and know that the server will buffer and
	reorder the encoded packets to correctly recover the data
	stream.
Problem: A Snowflake client can only use one proxy at a time, and that
proxy may be slow or unreliable. Finding a working proxy is slow because
each non-working one must time out in succession before trying another
one.
	The problem exists because even though each WebRTC DataChannel
	is reliable (DataChannel uses SCTP internally), there's no
	ordering between multiple simultaneous DataChannels on separate
	Snowflake proxies. Furthermore, if and when a proxy goes
	offline, we cannot tell whether the last piece of data we sent
	was received by the bridge or not—the SCTP ACK information is
	not accessible to us higher in the stack—so even if we reconnect
	to the bridge through another proxy, we don't know whether we
	need to retransmit the last piece of data or not. All we can do
	is tear down the entire session and start it up again from
	scratch. As in the obfs4 case, this problem is solved by having
	an independent virtual session that persists across transient
	WebRTC sessions. An added bonus is the opportunity to use more
	than one proxy at once, to increase bandwidth or as a hedge
	against one of them disappearing.
Problem: [DNS over HTTPS](https://groups.google.com/d/msg/traffic-obf/ZQohlnIEWM4/09N7zsxjBgAJ)
is an unreliable channel: it is reliable TCP up to the DoH server, but
after that, recursive resolutions are plain old unreliable UDP. And as
with meek, the ordering of simultaneous DNS-over-HTTPS requests is not
guaranteed.
	Solved by retransmission in the session layer. There's no
	[DNS pluggable transport](https://trac.torproject.org/projects/tor/wiki/doc/DnsPluggableTransport)
	yet, but I think some kind of retransmission layer will be a
	requirement for it. Existing DNS tunnel software uses various
	ad-hoc sequencing/retransmission protocols. I think that a
	proper user-space reliability layer is the "right" way to do it.
Problem: Shadowsocks opens a separate encrypted TCP connection for every
connection made to the proxy. If a web page loads resources from 5 third
parties, then the Shadowsocks client makes 5 parallel connections to the
proxy.
	This problem is really about multiplexing, not
	session/reliability, but several candidate session/reliability
	protocols additionally offer multiplexing, for example streams
	in QUIC, streams in SCTP, or smux for KCP. Tor does not have
	this problem, because Tor already is a multiplexing protocol,
	with multiple virtual circuits and streams in one TCP/TLS
	connection. But every system could benefit from adding
	multiplexing at some level. Shadowsocks, for example, could open
	up one long-lived connection, and each new connection to the
	proxy would only open up a new stream inside the long-lived
	connection. And if the long-lived connection were killed, all
	the stream state would still exist at both endpoints and could
	be resumed on a new connection.
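
The reordering that several of these answers rely on can be sketched in a few lines. This is a deliberately simplified illustration, not how QUIC, KCP, or SCTP do it: it numbers whole chunks rather than bytes, never sends ACKs, and lets its buffer grow without bound. The type `reorderBuffer` and its methods are hypothetical names.

```go
package main

import "fmt"

// reorderBuffer holds chunks that arrived ahead of their turn.
type reorderBuffer struct {
	next    uint64            // sequence number of the next chunk to deliver
	pending map[uint64][]byte // chunks received out of order
}

func newReorderBuffer() *reorderBuffer {
	return &reorderBuffer{pending: make(map[uint64][]byte)}
}

// receive accepts one chunk and returns every chunk that can now be
// delivered in order. Duplicates and already-delivered chunks are dropped.
func (b *reorderBuffer) receive(seq uint64, data []byte) [][]byte {
	if seq < b.next {
		return nil // already delivered
	}
	if _, dup := b.pending[seq]; dup {
		return nil // already buffered
	}
	b.pending[seq] = data
	var out [][]byte
	for {
		d, ok := b.pending[b.next]
		if !ok {
			break
		}
		out = append(out, d)
		delete(b.pending, b.next)
		b.next++
	}
	return out
}

func main() {
	b := newReorderBuffer()
	// Chunk 1 arrives first (say, over a faster CDN); nothing is deliverable.
	fmt.Println(len(b.receive(1, []byte("world"))))
	// Chunk 0 arrives; both chunks are released, in order.
	for _, d := range b.receive(0, []byte("hello ")) {
		fmt.Printf("%s", d)
	}
	fmt.Println()
}
```

It does not matter to the buffer which HTTP request, CDN, or Snowflake proxy delivered a given chunk, which is the property the answers above depend on.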

As an illustration of what I'm proposing, here's the protocol layering
of meek (which
[sends chunks of the Tor TLS stream](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports#meektransportlayer)
inside HTTP bodies), and where the new session/reliability layer would
be inserted. Tor can remain oblivious to what's happening: just as
before it didn't "know" that it was being carried over HTTP, it doesn't
now need to know that it is being carried over QUIC-in-HTTP (for
example).

```
[TLS]
[HTTP]
[session/reliability layer] ⇐ 🆕
[Tor]
[application data]
```

I've done a little survey and identified some suitable candidate
protocols that also seem to have good Go packages:
 * [QUIC](https://quicwg.org/) with [quic-go](https://github.com/lucas-clemente/quic-go)
 * KCP with [kcp-go](https://github.com/xtaci/kcp-go)
 * [SCTP](https://tools.ietf.org/html/rfc4960) with [pion/sctp](https://github.com/pion/sctp)

I plan to evaluate at least these three candidates and develop some
small proofs of concept. The overall goal of my proposal is to liberate
the circumvention context from particular network connections and IP
addresses. 
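
To make the idea of a session that outlives its connections concrete, here is a minimal sketch of the server-side bookkeeping, assuming the client attaches an opaque session ID to everything it sends (much as meek does with X-Session-Id). The names `sessionID`, `sessionTable`, and `handlePacket` are hypothetical, and the `session` struct is a stand-in for real reliability state such as a QUIC or KCP connection object.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionID is an opaque identifier the client picks once and sends with
// every batch of packets, on whatever connection happens to be available.
type sessionID [8]byte

// session stands in for per-client reliability state. Here it only counts
// packets, which is enough to show that state survives reconnection.
type session struct {
	packetsSeen int
}

// sessionTable maps session IDs to state, independent of network connections.
type sessionTable struct {
	mu       sync.Mutex
	sessions map[sessionID]*session
}

func newSessionTable() *sessionTable {
	return &sessionTable{sessions: make(map[sessionID]*session)}
}

// handlePacket routes a packet to its session, creating the session on first
// use, and returns how many packets that session has seen so far.
func (t *sessionTable) handlePacket(id sessionID, packet []byte) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	s, ok := t.sessions[id]
	if !ok {
		s = &session{}
		t.sessions[id] = s
	}
	s.packetsSeen++
	return s.packetsSeen
}

func main() {
	table := newSessionTable()
	id := sessionID{0xaa, 0xbb}
	// A first connection delivers two packets, then is killed.
	table.handlePacket(id, []byte("p0"))
	table.handlePacket(id, []byte("p1"))
	// A new connection presents the same ID and lands in the same session.
	fmt.Println(table.handlePacket(id, []byte("p2"))) // prints 3
}
```

Nothing in the table refers to a socket or an IP address, so a replacement connection, or a second simultaneous one, reaches the same state. A real implementation would also need to expire idle sessions.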

### Related work

The need for a session and sequencing layer has been felt—and dealt
with—repeatedly in many different projects. It has not yet, I think,
been treated systematically or recognized as a common need. Systems
typically implement some form of TCP-like SEQ and ACK numbers. The ones
that don't are usually built on the assumption of one long-lived TCP
connection, and therefore are really using the operating system's
sequencing and reliability functions behind the scenes.

Here are a few examples:
 * Code Talker Tunnel (a.k.a. SkypeMorph) uses
   [SEQ and ACK numbers](https://www.cypherpunks.ca/~iang/pubs/skypemorph-ccs.pdf#page=7)
   and mentions selective ACK as a possible extension. I think it uses
   the UDP 4-tuple to distinguish sessions, but I'm not sure.
 * OSS used [SEQ and ACK numbers](https://www.freehaven.net/anonbib/papers/pets2013/paper_29.pdf#page=7)
   and a random ID to distinguish sessions.
 * I [wasted time](https://www.bamsoftware.com/papers/thesis/#p227) in
   the early development of meek grappling with sequencing, before
   [punting by strictly serializing requests](https://www.bamsoftware.com/papers/fronting/#sec:deploy-tor),
   sacrificing performance for simplicity. meek uses an
   [X-Session-Id](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports#meektransportlayer)
   HTTP header to distinguish sessions.
 * [DNS tunnels](https://trac.torproject.org/projects/tor/wiki/doc/DnsPluggableTransport/Survey)
   all tend to do their own idiosyncratic thing. dnscat2, one of the
   better-thought-out ones, uses
   [explicit SEQ and ACK numbers](https://github.com/iagox86/dnscat2/blob/master/doc/protocol.md#seqack-numbers).

My position is that SEQ/ACK schemes are subtle enough and independent
enough that they should be treated as a separate layer, not as an
underspecified and undertested component of some specific system.

Psiphon can use
[obfuscated QUIC](https://github.com/Psiphon-Labs/psiphon-tunnel-core/tree/52de11dbabae90bc2ed3e7c5b0e2b8514dc7a988/psiphon/common/quic)
as a transport. It's directly using QUIC UDP on the wire, except that
each UDP datagram is
[additionally obfuscated](https://github.com/Psiphon-Labs/psiphon-tunnel-core/blob/52de11dbabae90bc2ed3e7c5b0e2b8514dc7a988/psiphon/common/quic/obfuscator.go#L48)
before being sent. You can view my proposal as an extension of this
design: instead of always sending QUIC packets as single UDP datagrams,
we allow them to be encoded/encapsulated into a variety of carriers.

[MASQUE](https://davidschinazi.github.io/masque-drafts/draft-schinazi-masque.html#RFC8441)
tunnels over HTTPS and can use QUIC, but is not really an example of the
kind of design I'm talking about. It leverages the multiplexing provided
by HTTP/2 (over TLS/TCP) or HTTP/3 (over QUIC/UDP). In HTTP/2 mode it
does not introduce its own session or reliability layer (instead using
that of the underlying TCP connection); and in HTTP/3 mode it directly
exposes the QUIC packets on the network as UDP datagrams, instead of
encapsulating them as an inner layer. That is, it's using QUIC as a
carrier for HTTP, rather than HTTP as a carrier for QUIC. The main
similarity I spot in the MASQUE draft is the envisioned
[connection migration](https://davidschinazi.github.io/masque-drafts/draft-schinazi-masque.html#rfc.section.7.1)
which frees the circumvention session from specific endpoint IP
addresses.

Mike Perry wrote a
[detailed summary](https://lists.torproject.org/pipermail/tor-dev/2018-March/013026.html)
of considerations for migrating Tor to use end-to-end QUIC between the
client and the exit. What Mike describes is similar to what is proposed
here—especially the subtlety regarding protocol layering. The idea is
not to use QUIC hop-by-hop, replacing the TLS/TCP that is used today,
but to *encapsulate* QUIC packets end-to-end, and use some other
*unreliable* protocol to carry them hop-by-hop between relays. Tor would
not be using QUIC as the network transport, but would use features of
the QUIC protocol.

### Anticipated questions

Q: Why don't VPNs like WireGuard have to worry about all this?
	A: Because they are implemented in kernel space, not user space,
	they are, in effect, using the operating system's own sequencing
	and reliability features. WireGuard just sees IP packets; it's
	the kernel's responsibility to notice, for example, when a TCP
	segment needs to be retransmitted, and retransmit it. We should
	do in user space what kernel-space VPNs have been doing all
	along!
Q: You're proposing, in some cases, to run a reliable protocol inside
another reliable protocol (e.g. QUIC-in-obfs4-in-TCP). What about the
reputed inefficiency of TCP-in-TCP?
	A: Short answer: don't worry about it. I think it would be
	premature optimization to consider at this point. The fact that
	the need for a session/reliability layer has been felt for so
	long by so many systems indicates that we should start
	experimenting, at least. There's contradictory information
	online as to whether TCP-in-TCP is as bad as they say, and
	anyway there are all kinds of performance settings we may tweak
	if it turns out to be a problem. But again: let's avoid
	premature optimization and not allow imagined obstacles to
	prevent us from trying.
Q: QUIC is not just a reliability protocol; it also has its own
authentication and encryption based on TLS. Do we need that extra
complexity if the underlying protocol (e.g. Tor) is already encrypted
and authenticated independently?
	A: The transport and TLS parts of QUIC are specified separately
	(https://tools.ietf.org/html/draft-ietf-quic-transport
	and https://tools.ietf.org/html/draft-ietf-quic-tls),
	so in principle they are separable and we could just use the
	transport part without encryption or authentication, as if it
	were SCTP or some other plaintext protocol. In practice, quic-go
	[assumes](https://godoc.org/github.com/lucas-clemente/quic-go#Dial)
	right in the API that you'll be using TLS, so separating them
	may be more trouble than it's worth. Let's start with simple
	layering that is clearly correct, and only later start breaking
	abstraction for better performance if we need to.
Q: What about traffic fingerprinting? If you simply obfuscate and send
each QUIC packet as it is produced, you will leak traffic features
through their size and timing, especially when you
[consider retransmissions and ACKs](https://www-users.cs.umn.edu/~hoppernj/ccs13-cya.pdf#page=5).
	A: Don't do that, then. There's no essential reason why the
	traffic pattern of the obfuscation layer needs to match that of
	the sequencing/reliability layer. Part of the job of the
	obfuscation layer is to erase such traffic features, if doing so
	is a security requirement. That implies, at least, *not* simply
	sending each entire QUIC packet as soon as it is produced, but
	padding, splitting, and delaying as appropriate. An ideal
	traffic scheduler would be *independent* of the underlying
	stream—its implementation would
	[not even *know*](https://lists.torproject.org/pipermail/tor-dev/2017-June/012310.html)
	how much actual data is queued to send at any time. But it's
	likely we don't have to be quite that rigorous in
	implementation, at least at this point.
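
As one possible shape for such a scheduler, the sketch below emits a fixed-size cell every time it is asked, filling it with queued bytes when there are any and with padding otherwise. The cell size, the 2-byte length field, and the names `scheduler`, `nextCell`, and `cellPayload` are all assumptions made for illustration. The caller is assumed to invoke `nextCell` on its own timetable (for example from a `time.Ticker`), regardless of how much data is queued.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// cellSize is the fixed on-the-wire size of every cell (an arbitrary choice).
const cellSize = 512

// scheduler queues outgoing bytes from the sequencing/reliability layer.
type scheduler struct {
	queue bytes.Buffer
}

// enqueue appends encoded packets to the send queue.
func (s *scheduler) enqueue(p []byte) {
	s.queue.Write(p)
}

// nextCell always returns exactly cellSize bytes: a 2-byte count of real
// bytes, then up to cellSize-2 queued bytes, then zero padding.
func (s *scheduler) nextCell() []byte {
	cell := make([]byte, cellSize)
	n, _ := s.queue.Read(cell[2:]) // n is 0 when the queue is empty
	binary.BigEndian.PutUint16(cell[:2], uint16(n))
	return cell
}

// cellPayload strips the length field and padding from a received cell.
func cellPayload(cell []byte) ([]byte, error) {
	if len(cell) != cellSize {
		return nil, errors.New("wrong cell size")
	}
	n := int(binary.BigEndian.Uint16(cell[:2]))
	if n > cellSize-2 {
		return nil, errors.New("bad length field")
	}
	return cell[2 : 2+n], nil
}

func main() {
	var s scheduler
	s.enqueue(make([]byte, 600)) // stand-in for encoded packets
	for i := 0; i < 3; i++ {
		cell := s.nextCell()
		payload, err := cellPayload(cell)
		if err != nil {
			panic(err)
		}
		fmt.Println(len(cell), len(payload))
	}
}
```

Because a packet can be split across cells, the queued bytes need their own packet framing (such as a length prefix per packet) so that the receiver can find packet boundaries after concatenating cell payloads.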