[tor-commits] [tech-reports/master] Add Nick's and Mike's whitepaper draft.

Mon Sep 23 07:54:48 UTC 2019

commit 06ed620899c8e460369e21e15c5bdf9dc5a03e89
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date:   Mon Sep 16 10:11:24 2019 +0200

    Add Nick's and Mike's whitepaper draft.
---
 .../side-channel-analysis/side-channel-analysis.md | 505 +++++++++++++++++++++
 1 file changed, 505 insertions(+)

diff --git a/2018/side-channel-analysis/side-channel-analysis.md b/2018/side-channel-analysis/side-channel-analysis.md
new file mode 100644
index 0000000..f9454de
--- /dev/null
+++ b/2018/side-channel-analysis/side-channel-analysis.md
@@ -0,0 +1,505 @@
+# Towards Side Channel Analysis of Datagram Tor vs Current Tor
+
+                     Version 0.6, 27 Nov 2018
+
+                 by Nick Mathewson and Mike Perry
+
+## Disclaimers
+
+This whitepaper assumes that you know how Tor works.
+
+There are probably some very good references here that we didn't
+remember to cite.
+
+## Introduction
+
+Tor's current design requires that its data cells be transmitted
+from one end of a circuit to the other using a reliable, in-order
+delivery mechanism.  To meet this requirement, Tor relays need to
+buffer cells--spending resources, hurting performance, and risking
+susceptibility to out-of-memory attacks.
+
+In order to improve Tor's performance and resilience, researchers
+have made several proposals for ways to relax the requirement for
+reliable in-order delivery.  In general, these "datagram-based"
+proposals would allow relays to drop or reorder cells as needed,
+and move the responsibility for providing a reliable stream protocol
+to the endpoints (the client and the exit relays).
+
+But by increasing flexibility for the relays, and by increasing the
+complexity of the endpoints, these datagram proposals also create
+some new attack vectors.  Before we can deploy any of these designs,
+we need to consider whether these attacks weaken Tor's security, or
+whether they are irrelevant given other, stronger attacks against
+Onion Routing.
+
+This whitepaper tries to list these attacks, and to provide a
+framework for thinking about them as we move forward with our design
+analysis.
+
+We hope that this whitepaper will help researchers and others in the Tor
+community to understand these issues, so that we can work together to
+find new ideas to analyze and mitigate the attacks described here, and
+to help deploy a faster and more reliable network while still
+maintaining our current (or better) security guarantees.  We hope that
+our description of the problem space will inspire, not discourage,
+future experiments in this area, and help with a holistic understanding
+of the risks, rewards, and future areas of work.
+
+### A toy system
+
+We will be analyzing a system that differs from Tor in the following
+ways.
+
+  * The link between a client and its guard, and between each pair
+    of relays uses DTLS over UDP: packets can be dropped or
+    re-ordered by an attacker on the link, but not modified, read,
+    or forged. Each DTLS packet contains an integer number of cells.
+
+  * Each circuit between a client and an exit traverse several
+    relays, as before. The cells on a circuit are no longer
+    guaranteed to arrive reliably, but can be dropped or re-ordered
+    on the wire, or by a relay.
+
+  * To provide reliable service end-to-end, the client and the
+    exit each use a TCP-like protocol to track which application
+    bytes have been sent and received.  Received data is
+    acknowledged;  dropped data is retransmitted.
+
+  * The cryptography to be used for circuit encryption is not
+    specified here.
+
+  * A reliable signaling mechanism between relays (to create,
+    destroy, and maintain circuits) is not specified here.
+
+(It is likely that many readers will be able to design a system that
+resists the attacks below better than the design above.  But please
+remember as you do, that a design which improves a system in one way
+may constrain it in others, or may offer insufficient benefits to be
+clearly superior to Tor as it is today.  Before we can deploy, we
+will need not just defenses, but also a systemic way to compare the
+effect of these defenses, used together, to the Tor status quo.)
+
+## Some preexisting attacks to consider
+
+To put the datagram-based attacks into context, we'll start out by
+listing some attacks against the current non-datagram Tor design
+(and proposed defenses for those, where they exist).
+
+We assume, as usual, an adversary who controls some but not all
+relays, and some but not all ISPs.
+
+A note on attack power: the accuracy of many of these attacks,
+particularly the passive ones, depends on the type of traffic being
+sent, the quantity of similar traffic elsewhere on the Tor network,
+the quantity of concurrent activity by the same client, the
+adversary's observation position and data retention resolution, the
+quantity of padding, and the tendency of the network to preserve or
+alter packet timing information in transit.
+
+In many cases, we don't have good metrics or evaluation methodology
+to determine how much harder or easier one attack is than another.
+
+### End-to-end passive traffic correlation attacks.
+
+Here's the gold-standard base-line attack: an attacker who can watch
+any two points on the same circuit is assumed to be able to realize,
+without having observed very much traffic at all, that the two
+points are indeed on the same circuit by correlating the timing and
+volume of data sent at those two points.
+
+When one of these points is also linked to the client, and one is
+linked to the client's activity, this attack deanonymizes the
+client.
+
+Tor's current design focuses on minimizing this probability, and
+also shifting its characteristics, through things like network
+diversity and long-term entry points. The attack may also become
+harder (and/or slower) when there is a lot of similar concurrent
+traffic on the Tor
+network, which means that adding users who use Tor for many things
+is in itself a form of mitigation.
+
+Proposed defenses in this area include deliberate obfuscation of
+message volume through padding, and of message timing through random
+delays, as well as things like traffic splitting and more complex
+traffic scheduling for loud flows. While we have completed some work on
+link padding, and are progressing on a deployment for circuit
+padding, it is not yet clear if we can use these defenses in an
+affordable way against a correlation attack, and it is hard to
+measure their effectiveness on a realistic Tor-sized network.
+
+### Data tagging side-channels by relays
+
+If two relays are on the same circuit, they can surreptitiously
+communicate with one another transforming the data in the RELAY
+cells, and un-transforming the data before passing it on.  Since
+Tor's current encryption protocol is malleable, this allows them to
+send a large number of bits per cell.
+
+This attack can also be used when two relays do not know if they are
+on the same circuit.  One relay modifies a cell, and the other one
+looks for such modifications.  If the data is processed by an honest
+relay, it will destroy the circuit, but the client may or may not
+notice that the circuit has destroyed.  (And the dishonest relay may
+delay informing the client!)
+
+To defend against this, we plan to replace our encryption with a
+non-malleable algorithm.  See for example proposals 202, 261, and
+295.
+
+### Destructive side-channels (internal)
+
+Even if we remove the malleability in Tor's encryption, a smaller
+side-channel remains: A dishonest relay can destroy a circuit at any
+time, either by corrupting the circuit or simply sending a DESTROY
+cell along it.  A third party can destroy a large number of circuits
+at once by remotely attacking a client or relay -- either disabling
+that relay, or making it close circuits because of the OOM
+handler. (See the Sniper Attack paper.)
+
+If a circuit is corrupted (as would happen if a relay attempted data
+tagging against one of the non-malleable cryptographic algorithms
+mentioned above), other points on the circuit can tell which cell is
+the first corrupted cell.  If a circuit is destroyed at one point,
+other points on the circuit can tell how many cells were sent before
+the destruction.
+
+It is likely that based on data or traffic patterns, most parties on
+a circuit will be able to distinguish a prematurely destroyed
+circuit from one that was shut down normally.
+
+In each case, this attack can be used to send (log n) bits of
+information per circuit, at the cost of destroying the circuit,
+where n is the number of cells that might be sent over the circuit in
+total.  Some noise will exist, since we expect some circuits to be
+prematurely closed on their own.  We don't know how much noise.
+
+We also have various heuristics that can attempt to detect if this
+happens too often; however at best they likely reduce the rate that
+information that can be sent in this way rather than eliminate it.
+We also lack methodology to measure the rate of information in this
+case, to help determine if we can successfully reduce it further.
+
+### Destructive network probes (external)
+
+Though TLS is resilient against many forms of active attacks, it
+can't resist an attacker who focuses against the underlying TCP
+layer.  Such an attacker can, by forging TCP resets, cause all the
+entire TLS connection to be dropped, thereby closing all the circuits
+on it.  This kind of attack can be observed at other points on the
+network in a way similar to the destructive side-channels noted above.
+
+This class of attack seems to be easier against Tor's current design
+than it would be against (some) datagram-based designs, since
+datagram-based designs are resilient to more kinds of traffic
+interference.
+
+### Timing-based watermarking attacks
+
+Hostile relays can also introduce a side channels to a circuit by
+introducing patterned delays into the cells.  For example, a relay
+could buffer a large number of cells, then transmit a "1" bit by
+sending a cell in a given time period, or a "0" by not sending cells
+in that time period.
+
+An attacker can also mount this attack without controlling relays:
+if the attacker performs a DoS attack against a relay or its
+traffic, it can observe changes in the traffic volume elsewhere on
+the network.
+
+[See https://www.freehaven.net/anonbib/cache/ccs07-latency-leak.pdf and
+http://cybercentre.cs.ttu.ee/wp/wp-content/uploads/2017/01/crw2017_final.pdf ]
+
+The bandwidth of this side-channel will be limited, since other
+relays on the network will naturally buffer and delay traffic,
+obscuring the pattern some.  There are also limits to how long
+packets can be delayed before the relay is no longer usable.
+
+[See:
+   - Rainbow: https://www.freehaven.net/anonbib/cache/ndss09-rainbow.pdf
+   - Swirl: https://www.freehaven.net/anonbib/cache/ndss11-swirl.pdf
+   - Backlit (detection):
+     https://www.freehaven.net/anonbib/cache/acsac11-backlit.pdf ]
+
+Proposals for resisting this type of watermarking attack are mostly
+of the same type that would be needed for resisting end-to-end
+correlation. An adversary that can perform active attacks to
+introduce their own unique traffic patterns intuitively seems much
+stronger than one that must passively use potentially common
+patterns. We lack a unified framework to tell us how much stronger
+this adversary is than the passive one, especially against various
+defenses.
+
+### Traffic injection attacks
+
+Related to the active timing attack, in some positions (like exit
+and RP) relays can inject cells that are ignored by the other
+endpoint. These injected patterns will not impact the user's
+experience, but will allow unique traffic patterns to be sent and
+detected by the adversary at crucial times.
+
+[See
+https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf]
+
+These injection attacks arise from former adherence to Postel's
+Maxim. Tor has since departed from this maxim, and instead opted for
+stricter forward compatibility through feature versioning, but
+removing instances in the codebase where injected cells can be
+permitted has proven challenging.
+
+## Attacks unique to datagram designs
+
+Here are some attacks that are enabled by (or at any rate behave
+differently under) datagram-based designs.
+
+### Traffic-stream tagging (by relays and internet links)
+
+Because the new system permits a number of transformations on
+traffic that were not previously allowed, we need to look at how
+those transformations can be used to attack users.
+
+As a trivial example, any router can relay an arbitrary subset of
+the cells that it receives on a circuit, in an arbitrary order, due
+to the exact properties the reliable transport aims to provide. The
+pattern induced in this way will be detectable by the exit relay
+when it attempts to reconstruct the stream.  Because we explicitly
+allow this kind of transformation, the circuit will not be killed
+after a single dropped cell, but rather will continue working
+silently.
+
+Moreover, any ISP can mount the same attack by dropping and/or
+re-ordering DTLS calls.
+
+A remote attacker may also be able to mount this attack by flooding
+any router between a client and its guard, thereby causing some of
+the DTLS messages to get dropped.
+
+If we are using TCP between client and exit, the acknowledgments
+sent by each endpoint will provide confirmation about which data it
+received and which it did not. If instead of TCP, we use some other
+protocol where the end-points communicate even more information
+about which packets they did and did not receive, this can provide
+an even higher-bandwidth side-channel.
+
+The bandwidth of this side-channel is fairly high, since it allows
+the attacker to send over a bit per cell. But it will be somewhat
+noisy, since some cells will dropped and reordered naturally.
+
+Padding, traffic splitting, and concurrent activity will increase
+the noise of this attack; we lack metrics to tell us how much, and
+we have no framework as of yet to measure the throughput of the
+resulting side channel in these conditions.
+
+### Traffic Fingerprinting of TCP-like systems
+
+Today, because Tor terminates TCP at the guard node, there is
+limited ability for the exit node to fingerprint client TCP
+behavior (aside from perhaps measuring some effects on traffic
+volume, but those are not likely preserved across the Tor network).
+
+However, when using a TCP-like system for end-to-end congestion
+control, flow control, and reliability, the exit relay will be able
+to make inferences about client implementation and conditions based
+on its behavior.
+
+Different implementations of TCP-like systems behave differently.
+Either party on a stream can observe the packets as they arrive to
+notice cells from an unusual implementation.  They can probe the
+other side of the stream, nmap-style, to see how it responds to
+various inputs.
+
+If two TCP-like implementations differ in their retransmit or timeout
+behavior, an attacker can use this to distinguish them by carefully
+chosen patterns of dropped traffic.  Such an attacker does not even
+need to be a relay, if it can cause DTLS packets between relays to
+be dropped or reordered.
+
+This class of attacks is solvable, especially if the exact same
+TCP-like implementation is used by all clients, but it also requires
+careful consideration and additional constraints to be placed on the
+TCP stack(s) in use that are not usually considered by TCP
+implementations -- particularly to ensure that they do not depend on
+OS-specific features or try to learn things about their environment
+over time, across different connections.
+
+
+### Retransmit-based watermarking
+
+Even if all TCP-like implementations are identical, they will
+retransmit with different timing and volume based on which cells
+have been acked or not acked.  These differences may be observable
+from many points on the circuit, or from outside the network.  Such
+retransmissions can be induced from outside the network, by hostile
+relays, or even by a hostile endpoint that pretends not to have
+received some of the packets.
+
+We again lack metrics to indicate that it is substantially worse
+(or not worse) than other similar attacks. Intuitively, the key
+difference in degree would come from how much easier it is to
+perform this attack than the delay based watermarking attacks on
+traditional Tor above.
+
+### Congestion and flow control interference
+
+To the extent that the TCP-like stack uses information learned from
+one stream to alter its behavior on another stream, an attacker can
+exploit this interference between streams make all of the streams
+from a given party more linkable.
+
+All implementations will have some amount of interference, to the
+extent that their bandwidth is limited.  But some may have more than
+necessary.
+
+
+### Non-malleable encryption designs only currently exist for in-order
+transports (or the return of data tagging attacks)
+
+Our proposed defenses against data tagging require us to move to
+non-malleable encryption, with each cell's encryption tweaked by a
+function of all previous cells, so that if even a single cell is
+modified, not only is that cell corrupted, but no subsequent cell
+can be decrypted.
+
+It seems nontrivial to achieve this property with datagram based
+designs, since we require that cells on a circuit can be decrypted
+even when previous cells have not arrived.  We can achieve
+data-based non-malleability by using a per-hop MAC for each cell --
+but we would no longer be able to get the property that a since
+altered cell would make the whole circuit unrecoverable.  This would
+enable a one-bit-per-cell side-channel, similar but possibly more
+powerful than the packet dropping side-channel above. (Because the
+congestion window is essentially a bit vector of received cells,
+the adversary in this scenario gets to corrupt cells in carefully
+chosen ways instead of merely dropping them.)
+
+Perhaps other cryptographic schemes could be found to resist
+data-tagging in a datagram-based environment or limit its impact,
+but we'll need to figure out what the requirements and models are.
+
+As a proof-by-example of a mitigating system: Proposal 253 describes
+a way to send a rolling MAC out of band, to ensure integrity of
+packets between those cells. But can we do better? Can middle nodes
+enforce integrity in some other way?
+
+### The risks of success: lower latency strengthens timing attacks?
+
+There are two factors that make timing-correlation and
+timing-watermark attacks more difficult in practice: similarity
+between different users' traffic, and distortion in timing patterns
+caused by variance in cell latency on the network.  To whatever
+extent we successfully reduce this distortion by lowering latency,
+it seems that we'll make these attacks more powerful.
+
+In particular, geolocation attacks based on observed circuit setup
+times may get worse [See again
+https://www.freehaven.net/anonbib/cache/ccs07-latency-leak.pdf].
+
+We're already making improvements to Tor that may make these attacks
+worse -- Tor latency has dropped and will continue to drop due to
+improvements like KIST, more relays, and better load balancing.
+Further incremental improvements like explicit congestion control on
+the existing Tor network will reduce latency even further.
+
+It may be that a more performant Tor becomes less safe than a
+slower, less usable Tor. On the other hand, a more usable Tor will
+likely be used by more people, which we know makes many forms of
+traffic analysis harder (slower?) in general. However, we have no way to
+measure this tradeoff on many different attack types.
+
+Delay and latency can also be added back in, and this has been a common
+defense against both active adversaries and timing attacks in the
+anonymity literature, but such delays have user-facing consequences,
+unless they are carefully restricted to the cases where the
+adversary can directly measure RTT and can be amortized away by
+things like pre-emptive circuit building. In this and other cases,
+it is also not clear to what degree adding delay is more useful than
+adding more padding.
+
+
+## Towards comparing attacks
+
+A high-bandwidth attack is worse than a low-bandwidth attack.  One
+bit is enough to send "is this the targeted user?", but 32 bits is
+enough to send a whole IP address.
+
+The impact of these attacks become worse if they can be repeated
+over time.
+
+An attack that can be performed by an ISP relaying traffic is worse
+than one that can be performed by a relay. An attack that can be
+performed remotely against either of these is worse still.
+
+We need some kind of methodology to help us compare the new side
+channels that datagram transports may enable to the existing side
+channels in Tor, particularly delay-based and congestion-based side
+channels.  Ideally, these metrics or evaluation methodology would
+also allow us to compare these side channels under various forms of
+defense, such as padding.
+
+At the very least, we need some way to compare the side channels in
+datagram transports to those that already exist.
+
+We also likely need a common reference research prototype and/or
+platform to experiment with and study, so that attacks and defenses
+are reproducibly comparable. Reproducibility in attack and defense
+literature is often not reliable, due to differing implementations,
+in addition to differing methodology and evaluation frameworks.
+
+
+## Open Questions
+
+Why permit reordering? There are schemes (like order-preserving
+encryption) that we could deploy on middle nodes to prevent
+reordering, without allowing earlier nodes to differentiate
+padding from non-padding. Do we derive any benefit by allowing a
+relay to send cells on a single circuit in a different order than
+the order in which it receives those cells on that circuit? This
+may be an answered question in congestion control research, but we
+lack the domain expertise to know what this tradeoff is.
+
+Related: what cryptography to use?  Our current stateful encryption
+schemes benefit from having access to "all previous cells" when
+encrypting or decrypting each following cell.  If we allow a cell to
+be {de,en}crypted before previous cells are received, we'll need a
+new model for onion-routing cryptography -- possibly one with
+significantly bigger headers.
+
+## Future work
+
+We hope to investigate these issues with researchers and others in the
+Tor community as we work towards solutions to help scale and strengthen
+the Tor network. Understanding the risks and rewards that datagram-based
+transports introduce to Tor is important to help us select designs that
+both help improve performance but also guarantee safety for Tor
+users. We hope that by cataloging these risks, future conversations
+about improved network designs can bring answers and broader
+improvements. We look forward to working with others interested in
+helping solve these problems to design a better Tor.
+
+## Acknowledgments
+
+Our thanks to Chelsea Komlo for many helpful suggestions and
+comments on earlier drafts of this whitepaper, and for writing the
+request for future work.
+
+## Further reading
+
+Steven Murdoch, "Comparison of Tor Datagram Designs", 2011.
+https://murdoch.is/papers/tor11datagramcomparison.pdf
+
+Mashael AlSabah and Ian Goldberg. "PCTCP: per-circuit
+TCP-over-IPsec transport for anonymous communication overlay
+networks", 2013.
+http://cacr.uwaterloo.ca/techreports/2013/cacr2013-09.pdf
+
+Michael F. Nowlan, David Wolinsky, and Bryan Ford.
+"Reducing Latency in Tor Circuits with Unordered Delivery",
+2013.
+https://www.usenix.org/system/files/conference/foci13/foci13-nowlan.pdf
+
+Rob Jansen, Florian Tschorsch, Aaron Johnson, and Björn Scheuermann
+The Sniper attack: Anonymously Deanonymizing and Disabling the Tor
+Network", 2013
+https://www.nrl.navy.mil/itd/chacs/sites/edit-www.nrl.navy.mil.itd.chacs/files/pdfs/13-1231-3743.pdf
+