[tor-commits] [torspec/master] add 302-padding-machines-for-onion-clients.txt

16 May 2019

commit 9503a261020ad906bb07595d8fba25b4542bba8a
Author: Nick Mathewson <nickm@torproject.org>
Date:   Thu May 16 09:45:15 2019 -0400

    add 302-padding-machines-for-onion-clients.txt
---
 proposals/000-index.txt                            |   2 +
 .../302-padding-machines-for-onion-clients.txt     | 299 +++++++++++++++++++++
 2 files changed, 301 insertions(+)

diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index a504c27..5982230 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -222,6 +222,7 @@ Proposals by number:
 299  Preferring IPv4 or IPv6 based on IP Version Failure Count [OPEN]
 300  Walking Onions: Scaling and Saving Bandwidth [DRAFT]
 301  Don't include package fingerprints in consensus documents [ACCEPTED]
+302  Hiding onion service clients using padding [ACCEPTED]
 
 
 Proposals by status:
@@ -262,6 +263,7 @@ Proposals by status:
    288  Privacy-Preserving Statistics with Privcount in Tor (Shamir version)
    292  Mesh-based vanguards
    301  Don't include package fingerprints in consensus documents
+   302  Hiding onion service clients using padding
  META:
    000  Index of Tor Proposals
    001  The Tor Proposal Process
diff --git a/proposals/302-padding-machines-for-onion-clients.txt b/proposals/302-padding-machines-for-onion-clients.txt
new file mode 100644
index 0000000..d7f583e
--- /dev/null
+++ b/proposals/302-padding-machines-for-onion-clients.txt
@@ -0,0 +1,299 @@
+Filename: 302-padding-machines-for-onion-clients.txt
+Title: Hiding onion service clients using padding
+Author: George Kadianakis, Mike Perry
+Created: Thursday 16 May 2019
+Status: Accepted
+Ticket: #28634
+
+0. Overview
+
+   Tor clients use "circuits" to do anonymous communications. There are various
+   types of circuits. Some of them are for navigating the normal Internet,
+   others are for fetching Tor directory information, others are for connecting
+   to onion services, while others are simply for measurements and testing.
+
+   It's currently possible for MITM-type adversaries (like Tor-network-level
+   and local-area-network adversaries) to distinguish Tor circuit types from
+   each other using a wide array of metadata and distinguishers.
+
+   In this proposal, we study various techniques that can be used to
+   distinguish client-side onion service circuits and provide WTF-PAD circuit
+   padding machines (using prop#254) to hide them against certain adversaries.
+
+1. Motivation
+
+   We are writing this proposal for various reasons:
+
+   1) We believe that in an ideal setting MITM adversaries should not be able
+      to distinguish circuit types by inspecting traffic. Tor traffic should
+      look amorphous to an outside observer to maximize uncertainty and
+      anonymity properties.
+
+      Client-side onion service circuits are an easy target for this proposal,
+      because we believe we can improve their privacy with low bandwidth
+      overhead.
+
+   2) We want to start experimenting with the WTF-PAD subsystem of Tor, and
+      this use-case provides us with a good testbed.
+
+   3) We hope that by actually starting to use the WTF-PAD subsystem of Tor, we
+      will encourage more researchers to start experimenting with it.
+
+2. Scope of the proposal [SCOPE]
+
+   Given the above, this proposal sets out to use the WTF-PAD system to hide
+   client-side onion service circuits against the classifiers of the paper by
+   Kwon et al. [0].
+
+   By client-side onion service circuits we refer to these two types of circuits:
+      - Client-side introduction circuits: Circuit from client to the introduction point
+      - Client-side rendezvous circuits: Circuit from client to the rendezvous point
+
+   Service-side onion service circuits are not in scope for this proposal, and
+   this is because hiding those would require more bandwidth and also more
+   advanced WTF-PAD features.
+
+   Furthermore, this proposal only aims to cloak the naive distinguishing
+   features mentioned in the [KNOWN_DISTINGUISHERS] section, and can by no
+   means guarantee that client-side onion service circuits are totally
+   indistinguishable by other means.
+
+   The machines specified in this proposal are meant to be lightweight and
+   created for a specific purpose. This means that they can be easily extended
+   with additional states to do more advanced hiding.
+
+3. Known distinguishers against onion service circuits [KNOWN_DISTINGUISHERS]
+
+   Over the past years it's been assumed that motivated adversaries can
+   distinguish onion-service traffic from normal Tor traffic given its
+   special characteristics.
+
+   As far as we know, there has been relatively little research-level work done
+   in this direction. The main article published in this area is the USENIX
+   paper "Circuit Fingerprinting Attacks: Passive Deanonymization of Tor Hidden
+   Services" by Kwon et al. [0]
+
+   The above paper deals with onion service circuits in sections 3.2 and 5.1.
+   It uses the following three "naive" circuit features to distinguish circuits:
+      1) Circuit construction sequence
+      2) Number of incoming and outgoing cells
+      3) Duration of Activity ("DoA")
+
+   All onion service circuits have particularly loud signatures for the above
+   characteristics, but WTF-PAD (prop#254) gives us tools to effectively
+   silence those signatures to the point where the paper's classifiers won't
+   work.
+
+4. Hiding circuit features using WTF-PAD
+
+   According to section [KNOWN_DISTINGUISHERS] there are three circuit features
+   we are attempting to hide. Here is how we plan to do this using the WTF-PAD
+   system:
+
+   1) Circuit construction sequence
+
+      The USENIX paper uses the directions of the first 10 cells sent in a
+      circuit to fingerprint it. Client-side onion service circuits have
+      unique circuit construction sequences and hence they can be fingerprinted
+      using just the first 10 cells.
+
+      We use WTF-PAD to destroy this feature of onion service circuits by
+      carefully sending padding cells (relay DROP cells) during circuit
+      construction and making them look exactly like most general Tor circuits
+      up until the end of the circuit construction sequence.
+
+   2) Number of incoming and outgoing cells
+
+      The USENIX paper uses the number of incoming and outgoing cells to
+      distinguish circuit types. For example, client-side introduction circuits
+      have the same number of incoming and outgoing cells, whereas client-side
+      rendezvous circuits have more incoming than outgoing cells.
+
+      We use WTF-PAD to destroy this feature by changing the number of cells
+      sent in introduction circuits. We leave rendezvous circuits as is, since
+      the actual rendezvous traffic flow usually closely resembles that of
+      normal Tor circuits.
+
+   3) Duration of Activity ("DoA")
+
+      The USENIX paper uses the period of time during which circuits send and
+      receive cells to distinguish circuit types. For example, client-side
+      introduction circuits are really short lived, whereas service-side
+      introduction circuits are very long lived. On the other hand, rendezvous
+      circuits have the same median lifetime as general Tor circuits, which is
+      10 minutes.
+
+      We use WTF-PAD to destroy this feature of client-side introduction
+      circuits by setting a special WTF-PAD option, which keeps the circuits
+      open for 10 minutes completely mimicking the DoA of general Tor circuits.
+
+4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
+
+   In this section we give an overview of what circuit construction looks like
+   to a network or guard-level adversary. We use this knowledge to design
+   padding machines that can make intro and rend circuits look like these
+   general circuits.
+
+   In particular, most general Tor circuits used to surf the web or download
+   directory information start with the following 6-cell relay cell sequence
+   (cells surrounded in [brackets] are outgoing, the others are incoming):
+
+     [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
+
+   When this is done, the client has established a 3-hop circuit and also
+   opened a stream to the other end. Usually after this comes a series of DATA
+   cells that either fetch pages, establish an SSL connection or fetch
+   directory information:
+
+     [DATA] -> [DATA] -> DATA -> DATA
+
+   The above stream of 10 relay cells defines the vast majority of general
+   circuits that came out of Tor Browser during our testing, and it's what we
+   are going to use to make introduction and rendezvous circuits blend in.
+
+   Please note that in this section we only investigate relay cells and not
+   connection-level cells like CREATE/CREATED or AUTHENTICATE/etc. that are
+   used during the link-layer handshake. The rationale is that connection-level
+   cells depend on the type of guard used and are not an effective fingerprint
+   for a network/guard-level adversary.
+
+5. WTF-PAD machines
+
+   For the purposes of this proposal we will make use of four WTF-PAD machines
+   as follows:
+
+      - Client-side introduction circuit hiding machine (origin-side)
+      - Client-side introduction circuit hiding machine (relay-side)
+
+      - Client-side rendezvous circuit hiding machine (origin-side)
+      - Client-side rendezvous circuit hiding machine (relay-side)
+
+   In the following sections we will analyze these machines.
+
+5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
+
+   These two machines are meant to hide client-side introduction circuits. The
+   origin-side machine sits on the client and sends padding towards the
+   introduction circuit, whereas the relay-side machine sits on the middle-hop
+   (second hop of the circuit) and sends padding towards the client. The
+   padding from the origin-side machine terminates at the middle-hop and does
+   not get forwarded to the actual introduction point.
+
+   Both of these machines only get activated for introduction circuits, and
+   only after an INTRODUCE1 cell has been sent out.
+
+   This means that before the machine gets activated our cell flow looks like this:
+
+    [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [INTRODUCE1]
+
+   Comparing the above with section [CIRCCONSTRUCTION], we see that the above
+   cell sequence matches the one from general circuits up to the first 7 cells.
+
+   However, in normal introduction circuits this is followed by an
+   INTRODUCE_ACK and then the circuit gets torn down, which does not match
+   the sequence from [CIRCCONSTRUCTION].
+
+   Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
+   send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
+   cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION]
+   sequence up to the first 10 cells.
+
+   After that, we continue sending padding from the relay-side machine so as to
+   fake a directory download, or an SSL connection setup. We also want to
+   continue sending padding so that the connection stays up longer to destroy
+   the "Duration of Activity" fingerprint.
+
+   To calculate the padding overhead, we see that the origin-side machine just
+   sends a single [PADDING_NEGOTIATE] cell, whereas the relay-side machine
+   sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means
+   that the average overhead of these machines is about 11 padding cells.
+
+   In terms of WTF-PAD terminology, these machines have three states (START,
+   OBF, END). They move from the START to the OBF state when the first
+   non-padding cell is received on the circuit, and they stay in the OBF
+   state until all the padding gets depleted. The OBF state is controlled by
+   a histogram which specifies the parameters described in the paragraphs
+   above. After all the padding is sent, they move to the END state.
+
+   We also set a special WTF-PAD flag which keeps the circuit open even after
+   the introduction is performed. In particular, with this feature the circuit
+   will stay alive for the same duration as normal web circuits before they
+   expire (usually 10 minutes).
+
+5.2. Client-side rendezvous circuit hiding machines
+
+   The rendezvous circuit machines apply to client-side rendezvous circuits and
+   only after the rendezvous point has been established (REND_ESTABLISHED has
+   been received). Up to that point, the following cell sequence has been
+   observed on the circuit:
+
+    [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [ESTABLISH_REND] -> REND_ESTABLISHED
+
+   which matches the general circuit construction sequence [CIRCCONSTRUCTION]
+   up to the first 6 cells. However after that, normal rendezvous circuits
+   receive a RENDEZVOUS2 cell followed by a [BEGIN] and a CONNECTED, which does
+   not fit the circuit construction sequence we are trying to imitate.
+
+   Hence our machine gets activated right after REND_ESTABLISHED is received,
+   and continues by sending a [PADDING_NEGOTIATE] and a [DROP] cell, before
+   receiving a PADDING_NEGOTIATED and a DROP cell, effectively blending into
+   the general circuit construction sequence on the first 10 cells.
+
+   After that our machine gets deactivated, and we let the actual rendezvous
+   circuit shape the traffic flow. Since rendezvous circuits usually imitate
+   general circuits (their purpose is to surf the web), we can expect that they
+   will look alike.
+
+   In terms of overhead, this machine is quite light. Both sides send 2 padding
+   cells, for a total of 4 padding cells.
+
+6. Overhead analysis
+
+   Given the parameters above, intro circuit machines have an overhead of 11
+   padding cells, and rendezvous circuit machines have an overhead of 4
+   padding cells. This means that for every intro and rendezvous circuit
+   pair there will be an overhead of 15 padding cells on average, which is
+   about 7.5KB.
+
+   In the PrivCount paper [1] we learn that the Tor network sees about 12
+   million successful descriptor fetches per day. We can use this figure to
+   assume that the Tor network also sees about 12 million intro and rendezvous
+   circuits per day. Given the 7.5KB overhead of each of these circuits, we get
+   that our padding machines incur an additional 94GB overhead per day on the
+   network, which is about 3.9GB per hour.
+
+   XXX Isn't this kinda intense????? Using the graphs from metrics we see that
+       the Tor network has total capacity of 300 Gbit/s which is about 135000GB per
+       hour, so 3.9GB per hour is not that much, but still...
+
+7. Discussion
+
+7.1. Alternative approaches
+
+   These machines try to hide onion service client-side circuits by obfuscating
+   their looks. This is a reasonable approach, but if the resulting circuits
+   look unlike any other Tor circuits, they would still be fingerprintable just
+   by that fact.
+
+   Another approach we could take is to make normal client circuits look like
+   onion service circuits, or just make normal clients establish fake onion
+   service circuits periodically. The hope here is that the adversary won't be
+   able to distinguish fake onion service circuits from real ones. This
+   approach has not been taken yet, mainly because it requires additional
+   WTF-PAD features and poses greater overhead risks.
+
+7.2. Future work
+
+   As discussed in [SCOPE], this proposal only aims to hide some very specific
+   features of client-side onion service circuits. There is lots of work to be
+   done here to see what other features can be used to distinguish such
+   circuits, and also what other classifiers can be built using deep learning
+   and whatnot.
+
+---
+
+   [0]: https://www.usenix.org/node/190967
+        https://blog.torproject.org/technical-summary-usenix-fingerprinting-paper
+
+   [1]: "Understanding Tor Usage with Privacy-Preserving Measurement"
+        by Akshaya Mani, T Wilson-Brown, Rob Jansen, Aaron Johnson, and Micah Sherr
+        In Proceedings of the Internet Measurement Conference 2018 (IMC 2018).

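Note on the cell sequences in sections 4.1, 5.1 and 5.2 of the proposal above:
the claims about how many leading cells match a general circuit can be checked
mechanically by comparing cell directions only. The following is a minimal
sketch, not part of the commit. The list names and the helper
`matching_prefix` are hypothetical, and the order of the two incoming cells on
the padded intro circuit is taken from the wording of section 5.1.

```python
# Each cell is (name, direction): OUT = sent by the client, IN = received.
OUT, IN = "out", "in"

# Relay cells seen while extending to the second and third hop.
BUILD_TWO_HOPS = [("EXTEND2", OUT), ("EXTENDED2", IN)] * 2

# Section 4.1: the 10-cell sequence of a general circuit.
GENERAL = BUILD_TWO_HOPS + [
    ("BEGIN", OUT), ("CONNECTED", IN),
    ("DATA", OUT), ("DATA", OUT), ("DATA", IN), ("DATA", IN),
]

# Section 5.1: client-side introduction circuit, without and with padding.
INTRO_BASE = BUILD_TWO_HOPS + [
    ("EXTEND2", OUT), ("EXTENDED2", IN), ("INTRODUCE1", OUT),
]
INTRO_UNPADDED = INTRO_BASE + [("INTRODUCE_ACK", IN)]
INTRO_PADDED = INTRO_BASE + [
    ("PADDING_NEGOTIATE", OUT), ("PADDING_NEGOTIATED", IN),
    ("INTRODUCE_ACK", IN),
]

# Section 5.2: client-side rendezvous circuit, without and with padding.
REND_BASE = BUILD_TWO_HOPS + [("ESTABLISH_REND", OUT), ("REND_ESTABLISHED", IN)]
REND_UNPADDED = REND_BASE + [
    ("RENDEZVOUS2", IN), ("BEGIN", OUT), ("CONNECTED", IN),
]
REND_PADDED = REND_BASE + [
    ("PADDING_NEGOTIATE", OUT), ("DROP", OUT),
    ("PADDING_NEGOTIATED", IN), ("DROP", IN),
]


def matching_prefix(circuit, reference=GENERAL):
    """Count how many leading cells share their direction with `reference`."""
    count = 0
    for (_, got), (_, want) in zip(circuit, reference):
        if got != want:
            break
        count += 1
    return count


if __name__ == "__main__":
    for label, cells in [
        ("intro, unpadded", INTRO_UNPADDED),
        ("intro, padded", INTRO_PADDED),
        ("rend, unpadded", REND_UNPADDED),
        ("rend, padded", REND_PADDED),
    ]:
        print(label, matching_prefix(cells))
```

The sketch only models cell direction, which is the feature the first-10-cells
classifier relies on according to the proposal. It says nothing about timing.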
    
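Note on the START/OBF/END description in section 5.1 of the proposal above: the
state transitions can be illustrated with a toy model. This is a sketch under
stated assumptions and is not the tor circuit padding implementation. The class
`ToyRelaySideMachine` and its method names are hypothetical, the padding budget
is drawn uniformly from 7 to 10 DROP cells, and the histogram-driven timing of
the real machine is left out entirely.

```python
import random
from enum import Enum


class State(Enum):
    START = "START"
    OBF = "OBF"
    END = "END"


class ToyRelaySideMachine:
    """Toy model of the relay-side intro machine's three states."""

    def __init__(self, rng):
        self.state = State.START
        # Assumption: the number of DROP cells is uniform in [7, 10].
        self.budget = rng.randint(7, 10)
        self.sent = 0

    def on_nonpadding_cell_received(self):
        # START -> OBF when the first non-padding cell is received.
        if self.state is State.START:
            self.state = State.OBF

    def tick(self):
        """Send one DROP cell if in OBF. Returns True if a cell was sent."""
        if self.state is not State.OBF:
            return False
        self.sent += 1
        # OBF -> END once all the padding has been sent.
        if self.sent == self.budget:
            self.state = State.END
        return True


if __name__ == "__main__":
    machine = ToyRelaySideMachine(random.Random())
    machine.on_nonpadding_cell_received()
    while machine.tick():
        pass
    print(machine.state.value, machine.sent)
```

Passing the random source in explicitly keeps the sketch deterministic when a
seeded `random.Random` is supplied.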

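Note on the overhead analysis in section 6 of the proposal above: the figures
can be reproduced approximately with a few lines of arithmetic. This sketch
assumes 512 bytes per cell, which is what the proposal's 7.5KB figure for 15
cells implies, and decimal gigabytes. All variable names are hypothetical.
Under these assumptions the daily total comes out slightly below the 94GB
quoted in the proposal, so treat it as an order-of-magnitude check only.

```python
CELL_BYTES = 512                 # assumption implied by "15 cells = 7.5KB"
INTRO_PADDING_CELLS = 11         # section 5.1 (1 + 1 + 7..10, rounded up)
REND_PADDING_CELLS = 4           # section 5.2 (2 cells from each side)
CIRCUIT_PAIRS_PER_DAY = 12_000_000   # descriptor fetches per day, from [1]
NETWORK_CAPACITY_BITS_PER_SEC = 300 * 10**9  # 300 Gbit/s, from the XXX note

# Padding per intro + rendezvous circuit pair.
cells_per_pair = INTRO_PADDING_CELLS + REND_PADDING_CELLS
bytes_per_pair = cells_per_pair * CELL_BYTES

# Network-wide padding volume.
daily_gb = CIRCUIT_PAIRS_PER_DAY * bytes_per_pair / 1e9
hourly_gb = daily_gb / 24

# Total network capacity expressed in GB per hour.
capacity_gb_per_hour = NETWORK_CAPACITY_BITS_PER_SEC / 8 * 3600 / 1e9

# Share of capacity consumed by the padding machines.
fraction_of_capacity = hourly_gb / capacity_gb_per_hour

if __name__ == "__main__":
    print(f"per pair:   {cells_per_pair} cells, {bytes_per_pair / 1024} KB")
    print(f"per day:    {daily_gb:.2f} GB")
    print(f"per hour:   {hourly_gb:.2f} GB")
    print(f"capacity:   {capacity_gb_per_hour:.0f} GB/hour")
    print(f"fraction:   {fraction_of_capacity:.2e}")
```

The last figure puts the hourly padding volume next to the capacity estimate
from the XXX note, which is the comparison that note is asking about.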