In Rome, I held a session about network protocol upgrades. My intent was to cover the switch to two guards, conflux, datagram transports, and QUIC. We ended up touching only briefly on everything but QUIC, but we went into enough depth on QUIC itself that it was a worthwhile and very productive session.
Our notes are here: https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/Fu...
I wanted to give a bit of background and some historical perspective about datagram transports for Tor, as well as explain those notes in long form, to get everybody on the same page about this idea. With the forthcoming IETF standard ratification of QUIC along with several solid reference implementations (canonical list: https://github.com/quicwg/base-drafts/wiki/Implementations), I believe we are close to the point where we can finally put together a plan (and a funding proposal) to undertake this work.
Consider this mail a pre-proposal to temperature check and solicit early feedback about a Tor-over-QUIC deployment, before we invest the effort to deep dive into the framing, protocol, and implementation details that will be necessary for a full proposal.
# Historical Background
So, first: Tor has been considering UDP transports for a long time. There have been a series of research papers on the topic, dating back to Proposal 100 -- the very first Tor proposal: https://gitweb.torproject.org/torspec.git/tree/proposals/100-tor-spec-udp.tx...
That proposal only examines adding an unreliable DTLS link between relays, to enable transmission of unreliable (UDP) traffic end-to-end. Despite being marked as Dead, it is a good conceptual starting point, because it examines the relay-to-relay link crypto properties we need from a datagram layer and has some good comments at the bottom.
However, adding the ability to simply carry UDP is not sufficient. What we have always really wanted is to transition the network to some form of end-to-end drop-based reliable congestion and flow control. There are numerous benefits from such a transition.
Drop-signaled end-to-end congestion control means that instead of having to maintain arbitrary queues at relays (to make room for Tor's fixed SENDME window sizes for every single Tor circuit -- which as we have seen does not scale and is vulnerabe to DoS and OOM attacks), we can instead use a global fixed-length queue at each relay, and have each relay simply drop packets when this queue is full. This behavior mirrors how the internet works, and will not only put an upper bound on the end-to-end queuing latency of the Tor network, but it will also allow us to better ensure fairness, eliminate OOM conditions, and mitigate DoS and congestion attacks. QUIC's stream management and byte-acked teardown mechanisms will also eliminate a class of sidechannel issues in Tor where an adversary gets to send ignored data on invalid and/or recently closed streams: https://trac.torproject.org/projects/tor/ticket/25573
Furthermore, the switch to a secure datagram link protocol will eliminate RST spamming attacks, where an adversary spoofs RSTs to/from relay IP addresses to tear down our TLS links to gain information from the resulting circuit failures and retries across the network.
Prior to the advent of QUIC, we studied many options for achieving end-to-end congestion control, but have found all of them to be lacking in one way or another. In 2011, Steven Murdoch did a literature review of these approaches: https://research.torproject.org/techreports/datagram-comparison-2011-11-07.p...
Section 4 of that review is the most useful thing to skim -- it explains why we found each of the end-to-end transport modes available at the time to be either insufficient for Tor's use, or encumbered by licensing or engineering incompatibilities.
# QUIC Overview
QUIC, however, is different. It basically serves as a complete drop-in replacement for Tor's n:1 streams-on-circuit connection model. QUIC has a connection layer, which corresponds to Tor's circuit concept, and a framing layer, which supports multiplexing for multiple reliable streams inside a single connection (corresponding to Tor's streams).
Each QUIC connection provides an outer layer of drop-based congestion control. Unlike latency-based congestion control like bittorrent's utp, which was designed to compensate for poor queuing management at edge routers (as well as yield to TCP flows), QUIC's drop-based congestion control is designed to find the common bottleneck anywhere on the connection's path, and use packet drops from that queue to signal congestion and ensure fairness among connections at that bottleneck point. QUIC also provides optimizations to handle higher latency and drop-heavy links (SACK, SACK ranges, early retransmit, reordering tolerance, RTT estimates): https://quicwg.github.io/base-drafts/draft-ietf-quic-recovery.html
Additionally, QUIC provides flow control at the connection level, as well as at the individual stream level. This is analogous to Tor's flow control model, but unlike Tor, QUIC implementations can adjust these window sizes automatically based on RTT measurements and congestion feedback: https://datatracker.ietf.org/doc/html/draft-ietf-quic-transport#section-3.4 https://datatracker.ietf.org/doc/html/draft-ietf-quic-transport#section-11 https://docs.google.com/document/d/1F2YfdDXKpy20WVKJueEf4abn_LVZHhMUMS5gX6Pg...
It is perhaps no small coincidence that QUIC's architecture fits so perfectly into Tor's circuit model. While at an IETF meeting years ago, an attendee quietly whispered in my ear, "You know, we're designing IETF QUIC with Tor-like networks in mind." I always love me a good prospiracy (https://www.urbandictionary.com/define.php?term=prospiracy ;).
# Using QUIC in Tor
While QUIC does indeed solve a great many problems for Tor out of the box, the path to Tor-over-QUIC requires that we make a few design and engineering decisions specific to Tor.
There are two research implementations that study using Tor-over-QUIC: Quux and QuTor. Quux was implemented by wrapping Chromium's C++ QUIC implementation in C bindings: https://www.benthamsgaze.org/2016/09/29/quux-a-quic-un-multiplexing-of-the-t... https://www.benthamsgaze.org/wp-content/uploads/2016/09/393617_Alastair_Clar...
QuTor is unfortunately paywalled at researchgate, so it is hard for me to tell what they did: https://www.researchgate.net/profile/Raik_Aissaoui/publication/292392094_QUT...
Unfortunately, the Quux implementation appears to use QUIC at a suboptimal position -- they replace Tor's TLS connections with QUIC, and use QUIC streams in place of Tor's circuit ID -- but only between relays. This means that it does not benefit from QUIC's end-to-end congestion control for the entire path of the circuit. Such a design will not solve the queuing and associated OOM problems at Tor relays, since relays would be unable to drop cells to signal backpressure to endpoints. Drops will instead block every circuit on a connection between relays, and even then, due to end-to-end reliability, relays will still need to queue without bound, subject to Tor's current (and inadequate) flow control.
The optimal design is to use QUIC end-to-end. (Based on the researchgate abstract, it looks like this might be what QuTor did, but it is hard to tell.) Instead of replacing TLS with QUIC, we should use an unreliable link encryption protocol between relays (such as DTLS), and replace Tor's circuits with QUIC connections. In this design, Tor applications connect to the local SOCKS port using TCP, and the tor client converts it to a QUIC stream on a QUIC connection-circuit that runs all the way to the exit or onion service, which then will convert the stream back to TCP. The exit/service keeps buffering to a minimum by using QUIC pushback on the Tor side, and TCP pushback on the server side.
While IETF QUIC provides its own crypto layer based on TLS 1.3, we won't really need this encryption. Instead, we will layer QUIC inside of Tor's layered onion encryption, with an ntor handshake to each hop in the path.
# QUIC Deployment Considerations
Past the above high-level decision of how to use QUIC, a handful of deployment considerations remain.
## Implementation
As mentioned previously, there are a growing number of QUIC implementations out there, in several languages, with varying degrees of IETF compatibility: https://github.com/quicwg/base-drafts/wiki/Implementations
For our purposes, the most interesting ones are ngtcp2 (native C implementation, BoringSSL, MIT licensed), picoquic (C, MIT license), and mozquic (C++ with C bindings, NSS, MPL licensed). The QUIC in Chromium is not IETF compliant, and is C++ with no C bindings. Sadly Rust implementations of QUIC are not complete. Google does also maintain a Go QUIC, which is complete. This suggests that experimental designs using one of the golang Tor implementations might be worthwhile.
I'm told that the IETF hopes to finalize the QUIC standard within the year.
## Framing
QUIC actually differentiates between connection-level packet sizes and frame sizes. If we want to preserve Tor's wire behavior, we will want to enforce a one-frame-per-packet policy from our QUIC library, and use QUIC's padding frames to make packet sizes uniform.
## Upgrade Path
The upgrade path to QUIC will require some thorough thinking -- if all relays on a path don't yet support datagram links, should we embed QUIC inside an opaque RELAY cell type, or should wait for a flag day to switch over once sufficient relays have upgraded? Embedding QUIC in opaque RELAY cells seems like the natural choice, but will require additional engineering considerations.
Additionally, a small and declining percentage of networks currently block all UDP other than DNS. At last measurement, the rate of QUIC failure was below 5%. However, like Google, we will likely want to retain the ability to fall back to TLS over TCP for these situations.
## Signaling
Because we need to transition to unreliable relay links to get the most out of QUIC, we will need to figure out how to handle circuit setup -- there is currently no mechanism to ensure reliability for onionskins in Tor, since this is provided by TLS today. The simplest solution is to retain TLS between relays to transmit circuit extend cells out-of-band. Padding could be used to obscure timing information of this signaling.
The alternative would be to implement some onionskin-level acking between relays, so that the datagram transport could be used for all traffic. However, since our deployment path will likely involve migration from TCP anyway, and since a small (and shrinking) percentage of networks simply block all non-DNS traffic, we will likely want to retain TCP connections for some time, making out-of-band signaling a natural first choice.
## Native UDP Support
Unfortunately, during the standardization process, the IETF QUIC decided to drop support for unreliable datagram streams inside the QUIC connection, to simplify the layering of their implementation. However, because we will still need to layer QUIC inside of an onion layer, we can use that layer to carry either QUIC or UDP (at the expense of an additional field to differentiate this).
# QUIC Research Questions
A couple areas of open research into how QUIC will behave in a Tor deployment remain. It is my belief that while these areas are worth further study, they do not represent blockers to QUIC, as a naive deployment of QUIC will substantially outperform our current (lack of) congestion and poor flow control.
## End-to-end Congestion Control Study
Unfortunately, it appears that the Quux QUIC paper studied QUIC at the wrong position - between relays, and the QuTor implementation is unclear. This means that it may still be an open question as to if QUIC's congestion control algorithms will behave optimally in a Tor-like network under heavy load.
However, Google has conducted extensive internet-scale research into the congestion control algorithm, which very likely covers enough networks with Tor-like latency and drop characteristics. Because of this, I do not expect us to have to do a lot of tuning of their congestion control here, but it is worth investigating.
## Queue Management
Fairness among flows in drop-based congestion control is heavily influenced by how the queue is managed, and particularly by the decision of which packet to drop when the queue is full. Modern favored algorithms involve probabilistic early dropping from the middle of the queue. The most popular approaches are the adaptive variants of RED and Blue: https://en.wikipedia.org/wiki/Random_early_detection https://en.wikipedia.org/wiki/Blue_(queue_management_algorithm)
The simplest possible queue management design would be to set Tor's queues to 0, and allow the OS kernel or upstream router to apply its drop strategy. While this will be sufficient to ensure backpressure under congestion, it is not optimal for fairness.
To enforce fairness, Tor would want to take information about QUIC connection IDs into account when dropping packets, similar to what is done with circuit ids by EWMA currently. This information will be unavailable to the kernel and upstream routers, due to the DTLS link encryption that will conceal the multiplexing of QUIC connections on connections to relays. Moreover, ensuring that sufficient queuing happens inside the Tor daemon (as opposed to the kernel) will require an aggressive socket-reading strategy. Tor's single-threaded nature will make this difficult -- to ensure that queuing happens inside the Tor daemon, packets must be continuously read from the kernel with minimal processing delay or delay from other events. Even in this case, optimality will still depend upon the bottleneck being Tor, and not in the kernel or in an upstream internet router. Thus, optimal queue management in Tor for QUIC remains an open research and engineering problem.
(However, this is not really a deployment blocker. The deployment of QUIC on the internet has similar shortcomings -- the vast majority of internet routers also do not take QUIC connection ID into account when making UDP drop decisions, and yet the protocol still outperforms TCP in link utilization and drop recovery).
## Anonymity
Since a Tor with bounded-length queues will have more predictable latency characteristics, a faster Tor based on QUIC may be more vulnerable to inferences about location that can be made by connection latency. Particularly, middle nodes may be able to tell how far the client is from the guard by measuring RTT. One solution to this may be adding delay in cases where RTT could be inferred. Another solution is to have two middle nodes for four hop paths, so that it is no longer clear which middle node you are in a QUIC circuit.
There may be other attacks that come out of using end-to-end congestion control, but I am personally having a hard time coming up with cases where such side channels and resource starvation conditions could be any worse than those that already exist, or are unique to the OOM and DoS conditions currently possible due to Tor's lack of congestion control.