Rob Jansen:
Thanks for the detailed write-up, Mike! In theory, moving to QUIC seems great; it seems to solve a lot of problems and has enough advantages that we could just run with it.
I'll reiterate some of my primary concerns that I gave in Rome:
- I think it would be a mistake to push forward with such a significant change to Tor's transport layer without significant testing and experimentation. We have tools that allow us to do full-network-sized experiments (Shadow), and we have interested researchers that want to help (including me).
I am not trying to argue against doing the research. My goal was to make enough of a case for QUIC that we can begin work on an implementation and tune it as we study it, or at least identify the minimum set of experiments that are needed before we could commit to such a path.
I expect us to have to tune things like queue length, queue management, slow start, reordering parameters, drop recovery strategies, and the backoff rates when drops happen. QUIC is also extensible enough that things like explicit congestion notification and link capacity estimates can be used to try to avoid drops (though we would need to do this at the onion layer rather than the QUIC layer, because onion crypto prevents intermediate relays from adding ECN information in-band in our use of QUIC): https://tools.ietf.org/html/draft-johansson-quic-ecn-00
Because of this flexibility, I would be very surprised to discover that QUIC proves impossible to tune in order to outperform our current lack of congestion control.
- However, I am much less optimistic than you that it will just work and instantly improve performance. You mention that Google has done lots of tests, but my guess is they test in environments that look like the Internet - i.e., fast core and slow edges. Do we know how it would perform when the path contains an additional 6 edges sandwiching 3 potentially low-bandwidth Tor relays? Tor is a significantly different environment from the Internet; for example, an end-to-end congestion signal in Tor will take orders of magnitude longer to reach the client than in traditional networks.
In drop-based congestion control, the time the drop signal takes to reach the client is not a function of where the drop happens. It is a function of the total RTT of the path. A drop early on the path takes just as long to discover as one buried in the middle.
As a result, higher RTT does impact drop-based schemes quite heavily (and the higher the drop rates, the worse this gets), but Tor's latency is orders of magnitude greater than the Internet's only because of queuing. If our queues can be bounded, then the latency multiplier should be proportional to the number of Tor hops (and the average physical distance of these paths).
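To make the two claims above concrete, here is a rough sketch, not a measurement. It assumes a loss is only discovered once a later packet has crossed the whole path and its acknowledgement has come back, and that every link crossing waits at most a fixed, bounded time in a queue. The function names and the 40 ms / 10 ms figures are made up for illustration.

```python
def path_rtt_ms(link_delays_ms, max_queue_delay_ms=0.0):
    """Upper bound on the round-trip time of a path.

    Each link is crossed once in each direction, and each crossing is
    assumed to wait at most max_queue_delay_ms in a (bounded) queue.
    """
    one_way = sum(delay + max_queue_delay_ms for delay in link_delays_ms)
    return 2 * one_way


def loss_signal_delay_ms(link_delays_ms, drop_link_index, max_queue_delay_ms=0.0):
    """Rough time until the sender learns that a packet was dropped.

    The receiver notices the gap only when a later packet arrives, and the
    sender learns of it only when the matching acknowledgement returns.
    Both cross the whole path, so the drop position does not affect the result.
    """
    if not 0 <= drop_link_index < len(link_delays_ms):
        raise ValueError("drop_link_index must name a link on the path")
    return path_rtt_ms(link_delays_ms, max_queue_delay_ms)


# Hypothetical numbers: one 40 ms link for a direct path, versus four such
# links for client -> guard -> middle -> exit -> destination.
direct_path = [40.0]
tor_path = [40.0] * 4

for drop_at in range(len(tor_path)):
    print(drop_at, loss_signal_delay_ms(tor_path, drop_at, max_queue_delay_ms=10.0))

print(path_rtt_ms(tor_path, 10.0) / path_rtt_ms(direct_path, 10.0))
```

With equal per-link delays and bounded queues, the multiplier is just the ratio of link counts; unbounded queues are what would break that proportionality.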
- Because of the above, I'm not sure that an end-to-end design is the right way to go. As I mentioned, we have simulators and researchers, so we should be able to learn more and make a more informed decision before committing to a design that will be difficult to change later.
I suppose that before we undertake or commit to a full implementation, a couple of basic experiments could tell us whether Tor's latency and drop characteristics might severely impact vanilla QUIC performance.
1. What drop rates do fully-utilized QUIC networks tend to see? Are QUIC's backoff and recovery properties sufficient such that packet loss will remain reasonable under heavy use? Is this drop rate a function of the number of concurrent QUIC connections or other network properties? (I bet this information is known by groups studying QUIC and similar congestion control schemes, but I am not finding it with casual searching. TCP folklore says "drop rate increases as concurrent connections increases" but I can't find concrete relationships.)
2. Given the above information about what drop rates fully-utilized QUIC networks see, and under what circumstances, we can then conduct an experiment to tell us what fairness and goodput look like under various link latencies with these drop rates.
#2 will inform us about whether QUIC is acceptable as-is, or if we would need to explore ECN or other non-drop based congestion signals.
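As a back-of-the-envelope baseline for experiment 2, one could start from the well-known Mathis et al. steady-state approximation for Reno-style TCP, goodput ~ (MSS / RTT) * C / sqrt(p). This is an assumption swapped in for illustration only: it is a TCP model, not a QUIC one, it ignores timeouts and fairness, and the MSS, RTT, and loss values below are made up. The function names are hypothetical.

```python
import math


def mathis_goodput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate steady-state goodput (bits/s) of one Reno-style flow.

    Uses the Mathis et al. approximation: (MSS / RTT) * C / sqrt(p).
    This is a TCP model used here as a stand-in; QUIC may behave differently.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return 8 * mss_bytes * c / (rtt_s * math.sqrt(loss_rate))


def goodput_table(rtts_s, loss_rates, mss_bytes=1460):
    """Goodput estimate for every (RTT, loss rate) combination."""
    return {
        (rtt, p): mathis_goodput_bps(mss_bytes, rtt, p)
        for rtt in rtts_s
        for p in loss_rates
    }


# Hypothetical sweep: RTTs from a short direct path up to a slow Tor circuit.
table = goodput_table(rtts_s=[0.05, 0.2, 0.8], loss_rates=[0.001, 0.01, 0.05])
for (rtt, p), bps in sorted(table.items()):
    print(f"rtt={rtt * 1000:.0f} ms loss={p:.3f} goodput={bps / 1e6:.2f} Mbit/s")
```

Under this model, goodput falls linearly with RTT and with the square root of the drop rate, which is the interaction the experiment would need to confirm or refute for QUIC itself.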
We will need to be careful while conducting these experiments, though. I found a few research papers on QUIC, but nearly all of them state limitations wrt varying aspects of the protocol being disabled or enabled depending on Chromium/QUIC version (presumably due to whatever experiments Google was conducting at the time).
Additionally, it looks like many of the QUIC implementations do not implement all (or any) of the drop detection and recovery strategies mentioned in the spec, and even the Google implementation goes back and forth on things like FEC. So we will need to be careful to test what we intend to use.
- We should be sure to pay close attention to how this will affect emerging networks and applications, e.g., mobile devices and onion services.
- The DoS attacks will change form, but I don't think they will disappear. I think it would be wise to understand how DoS might change, which is much easier once we have a design to analyze. Your summary helps with that.
I agree that DoS will change form. But the key draw for me is that instead of having DoS attacks that kill the circuit or even bring down the relay due to OOM (which provides a strong, clear signal to the adversary and enables Sniper attacks), DoS attacks will become congestion attacks that slow down service at specific bottlenecks, which I believe conflux can further mitigate by dynamically avoiding them.
I think it would be worth including R&D effort to investigate these issues in any proposal that gets written.
Cheers, Rob
On Mar 23, 2018, at 7:18 PM, Mike Perry mikeperry@torproject.org wrote:
In Rome, I held a session about network protocol upgrades. My intent was to cover the switch to two guards, conflux, datagram transports, and QUIC. We ended up touching only briefly on everything but QUIC, but we went into enough depth on QUIC itself that it was a worthwhile and very productive session.
Our notes are here: https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/Fu...
I wanted to give a bit of background and some historical perspective about datagram transports for Tor, as well as explain those notes in long form, to get everybody on the same page about this idea. With the forthcoming IETF standard ratification of QUIC along with several solid reference implementations (canonical list: https://github.com/quicwg/base-drafts/wiki/Implementations), I believe we are close to the point where we can finally put together a plan (and a funding proposal) to undertake this work.
Consider this mail a pre-proposal to temperature check and solicit early feedback about a Tor-over-QUIC deployment, before we invest the effort to deep dive into the framing, protocol, and implementation details that will be necessary for a full proposal.
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev