commit 06ed620899c8e460369e21e15c5bdf9dc5a03e89 Author: Karsten Loesing karsten.loesing@gmx.net Date: Mon Sep 16 10:11:24 2019 +0200
Add Nick's and Mike's whitepaper draft. --- .../side-channel-analysis/side-channel-analysis.md | 505 +++++++++++++++++++++ 1 file changed, 505 insertions(+)
diff --git a/2018/side-channel-analysis/side-channel-analysis.md b/2018/side-channel-analysis/side-channel-analysis.md new file mode 100644 index 0000000..f9454de --- /dev/null +++ b/2018/side-channel-analysis/side-channel-analysis.md @@ -0,0 +1,505 @@ +# Towards Side Channel Analysis of Datagram Tor vs Current Tor + + Version 0.6, 27 Nov 2018 + + by Nick Mathewson and Mike Perry + +## Disclaimers + +This whitepaper assumes that you know how Tor works. + +There are probably some very good references here that we didn't +remember to cite. + +## Introduction + +Tor's current design requires that its data cells be transmitted +from one end of a circuit to the other using a reliable, in-order +delivery mechanism. To meet this requirement, Tor relays need to +buffer cells--spending resources, hurting performance, and risking +susceptibility to out-of-memory attacks. + +In order to improve Tor's performance and resilience, researchers +have made several proposals for ways to relax the requirement for +reliable in-order delivery. In general, these "datagram-based" +proposals would allow relays to drop or reorder cells as needed, +and move the responsibility for providing a reliable stream protocol +to the endpoints (the client and the exit relays). + +But by increasing flexibility for the relays, and by increasing the +complexity of the endpoints, these datagram proposals also create +some new attack vectors. Before we can deploy any of these designs, +we need to consider whether these attacks weaken Tor's security, or +whether they are irrelevant given other, stronger attacks against +Onion Routing. + +This whitepaper tries to list these attacks, and to provide a +framework for thinking about them as we move forward with our design +analysis. + +We hope that this whitepaper will help researchers and others in the Tor +community to understand these issues, so that we can work together to +find new ideas to analyze and mitigate the attacks described here, and +to help deploy a faster and more reliable network while still +maintaining our current (or better) security guarantees. We hope that +our description of the problem space will inspire, not discourage, +future experiments in this area, and help with a holistic understanding +of the risks, rewards, and future areas of work. + +### A toy system + +We will be analyzing a system that differs from Tor in the following +ways. + + * The link between a client and its guard, and between each pair + of relays uses DTLS over UDP: packets can be dropped or + re-ordered by an attacker on the link, but not modified, read, + or forged. Each DTLS packet contains an integer number of cells. + + * Each circuit between a client and an exit traverse several + relays, as before. The cells on a circuit are no longer + guaranteed to arrive reliably, but can be dropped or re-ordered + on the wire, or by a relay. + + * To provide reliable service end-to-end, the client and the + exit each use a TCP-like protocol to track which application + bytes have been sent and received. Received data is + acknowledged; dropped data is retransmitted. + + * The cryptography to be used for circuit encryption is not + specified here. + + * A reliable signaling mechanism between relays (to create, + destroy, and maintain circuits) is not specified here. + +(It is likely that many readers will be able to design a system that +resists the attacks below better than the design above. But please +remember as you do, that a design which improves a system in one way +may constrain it in others, or may offer insufficient benefits to be +clearly superior to Tor as it is today. Before we can deploy, we +will need not just defenses, but also a systemic way to compare the +effect of these defenses, used together, to the Tor status quo.) + +## Some preexisting attacks to consider + +To put the datagram-based attacks into context, we'll start out by +listing some attacks against the current non-datagram Tor design +(and proposed defenses for those, where they exist). + +We assume, as usual, an adversary who controls some but not all +relays, and some but not all ISPs. + +A note on attack power: the accuracy of many of these attacks, +particularly the passive ones, depends on the type of traffic being +sent, the quantity of similar traffic elsewhere on the Tor network, +the quantity of concurrent activity by the same client, the +adversary's observation position and data retention resolution, the +quantity of padding, and the tendency of the network to preserve or +alter packet timing information in transit. + +In many cases, we don't have good metrics or evaluation methodology +to determine how much harder or easier one attack is than another. + +### End-to-end passive traffic correlation attacks. + +Here's the gold-standard base-line attack: an attacker who can watch +any two points on the same circuit is assumed to be able to realize, +without having observed very much traffic at all, that the two +points are indeed on the same circuit by correlating the timing and +volume of data sent at those two points. + +When one of these points is also linked to the client, and one is +linked to the client's activity, this attack deanonymizes the +client. + +Tor's current design focuses on minimizing this probability, and +also shifting its characteristics, through things like network +diversity and long-term entry points. The attack may also become +harder (and/or slower) when there is a lot of similar concurrent +traffic on the Tor +network, which means that adding users who use Tor for many things +is in itself a form of mitigation. + +Proposed defenses in this area include deliberate obfuscation of +message volume through padding, and of message timing through random +delays, as well as things like traffic splitting and more complex +traffic scheduling for loud flows. While we have completed some work on +link padding, and are progressing on a deployment for circuit +padding, it is not yet clear if we can use these defenses in an +affordable way against a correlation attack, and it is hard to +measure their effectiveness on a realistic Tor-sized network. + +### Data tagging side-channels by relays + +If two relays are on the same circuit, they can surreptitiously +communicate with one another transforming the data in the RELAY +cells, and un-transforming the data before passing it on. Since +Tor's current encryption protocol is malleable, this allows them to +send a large number of bits per cell. + +This attack can also be used when two relays do not know if they are +on the same circuit. One relay modifies a cell, and the other one +looks for such modifications. If the data is processed by an honest +relay, it will destroy the circuit, but the client may or may not +notice that the circuit has destroyed. (And the dishonest relay may +delay informing the client!) + +To defend against this, we plan to replace our encryption with a +non-malleable algorithm. See for example proposals 202, 261, and +295. + +### Destructive side-channels (internal) + +Even if we remove the malleability in Tor's encryption, a smaller +side-channel remains: A dishonest relay can destroy a circuit at any +time, either by corrupting the circuit or simply sending a DESTROY +cell along it. A third party can destroy a large number of circuits +at once by remotely attacking a client or relay -- either disabling +that relay, or making it close circuits because of the OOM +handler. (See the Sniper Attack paper.) + +If a circuit is corrupted (as would happen if a relay attempted data +tagging against one of the non-malleable cryptographic algorithms +mentioned above), other points on the circuit can tell which cell is +the first corrupted cell. If a circuit is destroyed at one point, +other points on the circuit can tell how many cells were sent before +the destruction. + +It is likely that based on data or traffic patterns, most parties on +a circuit will be able to distinguish a prematurely destroyed +circuit from one that was shut down normally. + +In each case, this attack can be used to send (log n) bits of +information per circuit, at the cost of destroying the circuit, +where n is the number of cells that might be sent over the circuit in +total. Some noise will exist, since we expect some circuits to be +prematurely closed on their own. We don't know how much noise. + +We also have various heuristics that can attempt to detect if this +happens too often; however at best they likely reduce the rate that +information that can be sent in this way rather than eliminate it. +We also lack methodology to measure the rate of information in this +case, to help determine if we can successfully reduce it further. + +### Destructive network probes (external) + +Though TLS is resilient against many forms of active attacks, it +can't resist an attacker who focuses against the underlying TCP +layer. Such an attacker can, by forging TCP resets, cause all the +entire TLS connection to be dropped, thereby closing all the circuits +on it. This kind of attack can be observed at other points on the +network in a way similar to the destructive side-channels noted above. + +This class of attack seems to be easier against Tor's current design +than it would be against (some) datagram-based designs, since +datagram-based designs are resilient to more kinds of traffic +interference. + +### Timing-based watermarking attacks + +Hostile relays can also introduce a side channels to a circuit by +introducing patterned delays into the cells. For example, a relay +could buffer a large number of cells, then transmit a "1" bit by +sending a cell in a given time period, or a "0" by not sending cells +in that time period. + +An attacker can also mount this attack without controlling relays: +if the attacker performs a DoS attack against a relay or its +traffic, it can observe changes in the traffic volume elsewhere on +the network. + +[See https://www.freehaven.net/anonbib/cache/ccs07-latency-leak.pdf and +http://cybercentre.cs.ttu.ee/wp/wp-content/uploads/2017/01/crw2017_final.pdf ] + +The bandwidth of this side-channel will be limited, since other +relays on the network will naturally buffer and delay traffic, +obscuring the pattern some. There are also limits to how long +packets can be delayed before the relay is no longer usable. + +[See: + - Rainbow: https://www.freehaven.net/anonbib/cache/ndss09-rainbow.pdf + - Swirl: https://www.freehaven.net/anonbib/cache/ndss11-swirl.pdf + - Backlit (detection): + https://www.freehaven.net/anonbib/cache/acsac11-backlit.pdf ] + +Proposals for resisting this type of watermarking attack are mostly +of the same type that would be needed for resisting end-to-end +correlation. An adversary that can perform active attacks to +introduce their own unique traffic patterns intuitively seems much +stronger than one that must passively use potentially common +patterns. We lack a unified framework to tell us how much stronger +this adversary is than the passive one, especially against various +defenses. + +### Traffic injection attacks + +Related to the active timing attack, in some positions (like exit +and RP) relays can inject cells that are ignored by the other +endpoint. These injected patterns will not impact the user's +experience, but will allow unique traffic patterns to be sent and +detected by the adversary at crucial times. + +[See +https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf] + +These injection attacks arise from former adherence to Postel's +Maxim. Tor has since departed from this maxim, and instead opted for +stricter forward compatibility through feature versioning, but +removing instances in the codebase where injected cells can be +permitted has proven challenging. + +## Attacks unique to datagram designs + +Here are some attacks that are enabled by (or at any rate behave +differently under) datagram-based designs. + +### Traffic-stream tagging (by relays and internet links) + +Because the new system permits a number of transformations on +traffic that were not previously allowed, we need to look at how +those transformations can be used to attack users. + +As a trivial example, any router can relay an arbitrary subset of +the cells that it receives on a circuit, in an arbitrary order, due +to the exact properties the reliable transport aims to provide. The +pattern induced in this way will be detectable by the exit relay +when it attempts to reconstruct the stream. Because we explicitly +allow this kind of transformation, the circuit will not be killed +after a single dropped cell, but rather will continue working +silently. + +Moreover, any ISP can mount the same attack by dropping and/or +re-ordering DTLS calls. + +A remote attacker may also be able to mount this attack by flooding +any router between a client and its guard, thereby causing some of +the DTLS messages to get dropped. + +If we are using TCP between client and exit, the acknowledgments +sent by each endpoint will provide confirmation about which data it +received and which it did not. If instead of TCP, we use some other +protocol where the end-points communicate even more information +about which packets they did and did not receive, this can provide +an even higher-bandwidth side-channel. + +The bandwidth of this side-channel is fairly high, since it allows +the attacker to send over a bit per cell. But it will be somewhat +noisy, since some cells will dropped and reordered naturally. + +Padding, traffic splitting, and concurrent activity will increase +the noise of this attack; we lack metrics to tell us how much, and +we have no framework as of yet to measure the throughput of the +resulting side channel in these conditions. + +### Traffic Fingerprinting of TCP-like systems + +Today, because Tor terminates TCP at the guard node, there is +limited ability for the exit node to fingerprint client TCP +behavior (aside from perhaps measuring some effects on traffic +volume, but those are not likely preserved across the Tor network). + +However, when using a TCP-like system for end-to-end congestion +control, flow control, and reliability, the exit relay will be able +to make inferences about client implementation and conditions based +on its behavior. + +Different implementations of TCP-like systems behave differently. +Either party on a stream can observe the packets as they arrive to +notice cells from an unusual implementation. They can probe the +other side of the stream, nmap-style, to see how it responds to +various inputs. + +If two TCP-like implementations differ in their retransmit or timeout +behavior, an attacker can use this to distinguish them by carefully +chosen patterns of dropped traffic. Such an attacker does not even +need to be a relay, if it can cause DTLS packets between relays to +be dropped or reordered. + +This class of attacks is solvable, especially if the exact same +TCP-like implementation is used by all clients, but it also requires +careful consideration and additional constraints to be placed on the +TCP stack(s) in use that are not usually considered by TCP +implementations -- particularly to ensure that they do not depend on +OS-specific features or try to learn things about their environment +over time, across different connections. + + +### Retransmit-based watermarking + +Even if all TCP-like implementations are identical, they will +retransmit with different timing and volume based on which cells +have been acked or not acked. These differences may be observable +from many points on the circuit, or from outside the network. Such +retransmissions can be induced from outside the network, by hostile +relays, or even by a hostile endpoint that pretends not to have +received some of the packets. + +We again lack metrics to indicate that it is substantially worse +(or not worse) than other similar attacks. Intuitively, the key +difference in degree would come from how much easier it is to +perform this attack than the delay based watermarking attacks on +traditional Tor above. + +### Congestion and flow control interference + +To the extent that the TCP-like stack uses information learned from +one stream to alter its behavior on another stream, an attacker can +exploit this interference between streams make all of the streams +from a given party more linkable. + +All implementations will have some amount of interference, to the +extent that their bandwidth is limited. But some may have more than +necessary. + + +### Non-malleable encryption designs only currently exist for in-order +transports (or the return of data tagging attacks) + +Our proposed defenses against data tagging require us to move to +non-malleable encryption, with each cell's encryption tweaked by a +function of all previous cells, so that if even a single cell is +modified, not only is that cell corrupted, but no subsequent cell +can be decrypted. + +It seems nontrivial to achieve this property with datagram based +designs, since we require that cells on a circuit can be decrypted +even when previous cells have not arrived. We can achieve +data-based non-malleability by using a per-hop MAC for each cell -- +but we would no longer be able to get the property that a since +altered cell would make the whole circuit unrecoverable. This would +enable a one-bit-per-cell side-channel, similar but possibly more +powerful than the packet dropping side-channel above. (Because the +congestion window is essentially a bit vector of received cells, +the adversary in this scenario gets to corrupt cells in carefully +chosen ways instead of merely dropping them.) + +Perhaps other cryptographic schemes could be found to resist +data-tagging in a datagram-based environment or limit its impact, +but we'll need to figure out what the requirements and models are. + +As a proof-by-example of a mitigating system: Proposal 253 describes +a way to send a rolling MAC out of band, to ensure integrity of +packets between those cells. But can we do better? Can middle nodes +enforce integrity in some other way? + +### The risks of success: lower latency strengthens timing attacks? + +There are two factors that make timing-correlation and +timing-watermark attacks more difficult in practice: similarity +between different users' traffic, and distortion in timing patterns +caused by variance in cell latency on the network. To whatever +extent we successfully reduce this distortion by lowering latency, +it seems that we'll make these attacks more powerful. + +In particular, geolocation attacks based on observed circuit setup +times may get worse [See again +https://www.freehaven.net/anonbib/cache/ccs07-latency-leak.pdf]. + +We're already making improvements to Tor that may make these attacks +worse -- Tor latency has dropped and will continue to drop due to +improvements like KIST, more relays, and better load balancing. +Further incremental improvements like explicit congestion control on +the existing Tor network will reduce latency even further. + +It may be that a more performant Tor becomes less safe than a +slower, less usable Tor. On the other hand, a more usable Tor will +likely be used by more people, which we know makes many forms of +traffic analysis harder (slower?) in general. However, we have no way to +measure this tradeoff on many different attack types. + +Delay and latency can also be added back in, and this has been a common +defense against both active adversaries and timing attacks in the +anonymity literature, but such delays have user-facing consequences, +unless they are carefully restricted to the cases where the +adversary can directly measure RTT and can be amortized away by +things like pre-emptive circuit building. In this and other cases, +it is also not clear to what degree adding delay is more useful than +adding more padding. + + +## Towards comparing attacks + +A high-bandwidth attack is worse than a low-bandwidth attack. One +bit is enough to send "is this the targeted user?", but 32 bits is +enough to send a whole IP address. + +The impact of these attacks become worse if they can be repeated +over time. + +An attack that can be performed by an ISP relaying traffic is worse +than one that can be performed by a relay. An attack that can be +performed remotely against either of these is worse still. + +We need some kind of methodology to help us compare the new side +channels that datagram transports may enable to the existing side +channels in Tor, particularly delay-based and congestion-based side +channels. Ideally, these metrics or evaluation methodology would +also allow us to compare these side channels under various forms of +defense, such as padding. + +At the very least, we need some way to compare the side channels in +datagram transports to those that already exist. + +We also likely need a common reference research prototype and/or +platform to experiment with and study, so that attacks and defenses +are reproducibly comparable. Reproducibility in attack and defense +literature is often not reliable, due to differing implementations, +in addition to differing methodology and evaluation frameworks. + + +## Open Questions + +Why permit reordering? There are schemes (like order-preserving +encryption) that we could deploy on middle nodes to prevent +reordering, without allowing earlier nodes to differentiate +padding from non-padding. Do we derive any benefit by allowing a +relay to send cells on a single circuit in a different order than +the order in which it receives those cells on that circuit? This +may be an answered question in congestion control research, but we +lack the domain expertise to know what this tradeoff is. + +Related: what cryptography to use? Our current stateful encryption +schemes benefit from having access to "all previous cells" when +encrypting or decrypting each following cell. If we allow a cell to +be {de,en}crypted before previous cells are received, we'll need a +new model for onion-routing cryptography -- possibly one with +significantly bigger headers. + +## Future work + +We hope to investigate these issues with researchers and others in the +Tor community as we work towards solutions to help scale and strengthen +the Tor network. Understanding the risks and rewards that datagram-based +transports introduce to Tor is important to help us select designs that +both help improve performance but also guarantee safety for Tor +users. We hope that by cataloging these risks, future conversations +about improved network designs can bring answers and broader +improvements. We look forward to working with others interested in +helping solve these problems to design a better Tor. + +## Acknowledgments + +Our thanks to Chelsea Komlo for many helpful suggestions and +comments on earlier drafts of this whitepaper, and for writing the +request for future work. + +## Further reading + +Steven Murdoch, "Comparison of Tor Datagram Designs", 2011. +https://murdoch.is/papers/tor11datagramcomparison.pdf + +Mashael AlSabah and Ian Goldberg. "PCTCP: per-circuit +TCP-over-IPsec transport for anonymous communication overlay +networks", 2013. +http://cacr.uwaterloo.ca/techreports/2013/cacr2013-09.pdf + +Michael F. Nowlan, David Wolinsky, and Bryan Ford. +"Reducing Latency in Tor Circuits with Unordered Delivery", +2013. +https://www.usenix.org/system/files/conference/foci13/foci13-nowlan.pdf + +Rob Jansen, Florian Tschorsch, Aaron Johnson, and Björn Scheuermann +The Sniper attack: Anonymously Deanonymizing and Disabling the Tor +Network", 2013 +https://www.nrl.navy.mil/itd/chacs/sites/edit-www.nrl.navy.mil.itd.chacs/fil... +