[tor-commits] [tor-browser-spec/master] Cleanups and fixes.

mikeperry at torproject.org mikeperry at torproject.org
Mon May 4 18:32:55 UTC 2015


commit 16ef0c1333762109fd5778dd1af4d537080c3e83
Author: Mike Perry <mikeperry-git at torproject.org>
Date:   Thu Apr 30 22:49:05 2015 -0700

    Cleanups and fixes.
---
 position-papers/HTTP3/HTTP3.tex |  240 ++++++++++++++++++++-------------------
 1 file changed, 125 insertions(+), 115 deletions(-)

diff --git a/position-papers/HTTP3/HTTP3.tex b/position-papers/HTTP3/HTTP3.tex
index ff08c33..b4d7993 100644
--- a/position-papers/HTTP3/HTTP3.tex
+++ b/position-papers/HTTP3/HTTP3.tex
@@ -19,6 +19,7 @@
 
 \title{The Future of HTTP and Anonymity on the Internet}
 
+% XXX: This is broken:
 \author{Georg Koppen \\ The Tor Project, Inc \\ georg at torproject.org}
 \author{Mike Perry \\ The Tor Project, Inc \\ mikeperry at torproject.org}
 
@@ -36,21 +37,25 @@ with the Tor Network, avoid introducing new third party tracking and
 linkability vectors, and minimize client fingerprintability. We also have a
 strong interest in the development of enhancements and/or extensions that
 protect the confidentiality and integrity of HTTP traffic, as well as provide
-resistance to traffic fingerprinting and general traffic analysis. We are
-presently actively researching these areas.
+resistance to traffic fingerprinting and general traffic analysis. In fact, we
+are presently researching these areas.
 
 \end{abstract}
 
 \section{Introduction}
 
-The Tor Project, Inc is a US non-profit dedicated to providing technology and
-education to support online privacy, anonymity, and censorship circumvention.
-Our primary products are the Tor network software, and the Tor Browser, which
-is based on Firefox. 
+The Tor Project is a United States 501(c)(3) non-profit dedicated to providing
+technology, research, and education to support online privacy, anonymity, and
+censorship circumvention. Our primary software products are the Tor network
+software, and the Tor Browser, which is based on Firefox. The Tor Project is
+actively collaborating with Mozilla to ensure that its modifications to
+Firefox are merged with the official Firefox distribution, with the long-term
+goal of providing an optional Tor-enabled mode of operation for native Firefox
+users.
 
 In this position paper, we describe the concerns and interests of the Tor
 Project with respect to future HTTP standardization. These concerns and
-interests span five areas: identifier linkability, connection usage linkability,
+interests span six areas: identifier linkability, connection usage linkability,
 fingerprinting linkability, traffic confidentiality and integrity, traffic
 fingerprinting and traffic analysis, and Tor network compatibility.
 
@@ -61,13 +66,13 @@ performing a more in-depth review of HTTP/2 for client fingerprinting and
 other tracking issues in the coming months as we modify the Firefox
 implementation for deployment in Tor Browser.
 
-\section{Identifier Linkability}
+\section{Identifier Linkability Concerns}
 
 Identifier linkability is the ability to use any form of browser state, cache,
 data storage, or identifier to track (or link) a user between two otherwise
-independent actions. For the purposes of this position paper, we concern
-ourselves with any browser state that persists beyond the duration or scope of
-a single connection.
+independent actions. For the purpose of this position paper, we are
+specifically concerned with any stateful information residing in the browser's
+HTTP layer that persists beyond the duration or scope of a single connection.
 
 For background, the Tor Project has designed Tor Browser with two main
 properties for limiting identifier-based tracking: First Party Isolation, and
@@ -75,20 +80,19 @@ Long Term Unlinkability.
 % FIXME: cite design doc
 
 First Party Isolation is the property that a user's actions at one
-top-level URL bar domain cannot be correlated or linked to their actions on a
+top-level URL bar domain must not be correlated or linked to their actions on a
 different top-level URL bar domain. We maintain this property through a number
 of patches and modifications to various aspects of browser functionality and
 state keeping.
 % FIXME: Cite design doc
 
-Long Term Unlinkability is the property that the user may completely clear all
-website-visible data and other identifiers associated, such that their future
-activity cannot be linked or correlated to any activity prior to this action.
-Tor Browser provides Long Term Unlinkability by allowing the user to clear all
-browser tracking data in a single click (called "New Identity"). Our long-term
-goal is to allow users to define their relationship with individual first
-parties, and alter that relationship by resetting or blocking the associated
-tracking data on a per-site basis.
+Long Term Unlinkability is the property that a user's future activity must not
+be linked or correlated to any prior activity after that user's explicit request
+to sever this link. Tor Browser provides Long Term Unlinkability by allowing
+the user to clear all browser tracking data in a single click (called "New
+Identity"). Our long-term goal is to allow users to define their relationship
+with individual first parties, and alter that relationship by resetting or
+blocking the associated tracking data on a per-site basis.
 % FIXME: Cite design doc
 
 \subsection{Identifier Linkability in HTTP/2}
@@ -96,17 +100,16 @@ tracking data on a per-site basis.
 The Tor Project is still in the process of evaluating the stateful nature of
 HTTP/2 connections and their associated streams and settings. It is likely
 that we will be able to isolate the usage of HTTP/2 connection state in a
-similar way to how we currently isolate HTTP connection state, as well as
-close these connections and clear that state when the user chooses to use a
-New Identity. However, it is not clear yet at this point how complicated this
-isolation will be.
+manner similar to the way we currently isolate HTTP/1.1 connection state.
+However, it is not clear yet at this point how complicated this isolation will
+be.
 
 \subsection{Avoiding Future Identifier Linkability}
 
 We feel that it is very important that mechanisms for identifier usage,
 storage, and connection-related state keeping be cleanly abstracted and
 narrowly scoped within the HTTP protocol. However, we also recognize that to a
-large degree identifier usage and the resulting linkability is primarily an
+large degree identifier storage and the resulting linkability is primarily an
 implementation detail, and not specific to the protocol itself.
 
 Identifier linkability will become a problem if instances arise where the
@@ -114,9 +117,10 @@ server is allowed to specify a setting or configuration property for a client
 that must persist beyond the duration of the session. In these cases, care
 must be taken to ensure that this data is cleared or isolated upon entry to
 private browsing mode, or when the user attempts to clear their private data.
+In the case of Tor Browser, we will most likely clear this state immediately
+upon connection close.
 
-
-\section{Connection Usage Linkability}
+\section{Connection Usage Linkability Concerns}
 
 Connection usage linkability arises from the use of the same underlying
 transport stream for requests that would otherwise be independent due to the
@@ -138,18 +142,22 @@ SHOULD NOT open more than one connection to a given host and port).
 \subsection{Avoiding Future Connection Usage Linkability}
 
 In the future, connection usage linkability may become a problem if the notion
-of a connection becomes further abstracted from the transport, and instead is
-enforced through a collection of identifiers or stateful behavior in the
-browser. This may tend to further encourage implementations that make it
-difficult to decouple the notion of a connection from the notion of a
-destination address.
+of a connection becomes disassociated from the application layer, and instead
+is enforced through a collection of identifiers or stateful behavior in the
+browser. This may tend to encourage implementations that make it difficult to
+decouple the notion of a connection from the notion of a destination address.
+
+Connection (and even identifier) linkability could similarly arise if
+implementations were required to remember which endpoints supported which HTTP
+versions, to avoid wasting round trips determining this information in-band.
 
-Even this is technically an implementation issue, but consideration should be
-taken to ensure that the specification does not encourage implementations to
-bake in deep assumptions about providing only a single connection instance per
-site, as was done for HTTP/2.
+Even these concerns are technically implementation issues, but consideration
+should be taken to ensure that the specification does not encourage
+implementations to bake in deep assumptions about providing only a single
+connection instance per site, as well as the need to remember a site's
+capabilities for long periods of time.
 
-\section{Fingerprinting Linkability}
+\section{Fingerprinting Linkability Concerns}
 
 User agent fingerprinting arises from four sources: end-user configuration
 details, device and hardware characteristics, operating system vendor and
@@ -184,7 +192,7 @@ delegated to the client, and depend on things like available system memory,
 available CPU cores, or other system details. Care should be taken to avoid
 these situations, but we also expect them to be unlikely.
 
-\section{Traffic Confidentiality and Integrity}
+\section{Traffic Confidentiality and Integrity Concerns}
 
 The Tor Project is very interested in any efforts to improve the
 confidentiality and integrity of the session layer of HTTP/3. 
@@ -196,18 +204,18 @@ authenticated encryption, as opposed to deploying opportunistic mechanisms.
 % FIXME: Cite Let's Encrypt
 
 We are also interested in efforts to encrypt the ClientHello and ServerHello
-messages using an initial forward-secure handshake, as described in the
-Encrypted TLS Handshake proposal. If SNI, ALPN, and the ServerHello can be
-encrypted using an ephemeral exchange that is authenticated later in the
-handshake, the adversary loses a great deal of information about the user's
-intended destination site. When large scale CDNs and multi-site load balancers
-are involved, the ultimate destination would be impossible to determine with
-this type of handshake in place. This will aid in defenses against traffic
+messages using an initial ephemeral handshake, as described in the Encrypted
+TLS Handshake proposal. If SNI, ALPN, and the ServerHello can be encrypted
+using an ephemeral key exchange that is authenticated later in the handshake,
+the adversary loses a great deal of information about the user's intended
+destination site. When large scale CDNs and multi-site load balancers are
+involved, the ultimate destination would be impossible to determine with this
+type of handshake in place. This will aid in defenses against traffic
 fingerprinting and traffic analysis, which we describe detail in the next
 section.
 % FIXME: Cite https://tools.ietf.org/html/draft-ray-tls-encrypted-handshake-00
 
-\section{Traffic Fingerprinting and Traffic Analysis}
+\section{Traffic Fingerprinting and Traffic Analysis Concerns}
 
 Website Traffic Fingerprinting is the process of using machine learning to
 classify web page visits based on their encrypted traffic patterns. It is most
@@ -217,33 +225,36 @@ visited.
 
 In the case of Tor, this attack is most commonly considered with respect to
 the client's connection to their Guard node (the entry into the Tor network).
-There, Tor's fixed 512 byte packet size, link encryption, and multiplexing all
-go a long way to impede this attack for minimal overhead. The fixed 512 byte
-packet size helps to obscure some amount of request and response length
-information. Tor's link encryption also conceals the destination website,
-which reduces classifier accuracy and capabilities, due largely to the Base
-Rate Fallacy.
-
-There was some initial controversy in the literature as to the exact degree to
-which the Base Rate Fallacy and other properties applied to website traffic
-fingerprinting of Tor traffic, but after publicly requesting that these
-effects be studied in closer detail, recent results have confirmed and
-quantified the benefits conferred by Tor's unique link encryption.
+There, Tor's fixed 512 byte packet size, link encryption, and stream
+multiplexing go a long way to impede this attack for minimal overhead. The
+fixed 512 byte packet size helps to obscure some amount of request and
+response length information. Tor's link encryption also conceals the
+destination website from the Guard node observer, which reduces classifier
+accuracy and capabilities by increasing the size of the classification domain.
+
+There was some initial controversy in the research literature as to the exact
+degree to which the classification domain size, the base rate fallacy, and
+other machine learning issues applied to website traffic fingerprinting of Tor
+traffic, but after publicly requesting that these effects be studied in closer
+detail, recent results have confirmed and quantified the benefits conferred by
+Tor's unique link encryption.
+
+Tor's link properties are by no means a complete defense, but they show that
+there is room to develop defenses that specifically aim to increase the size
+of the classification domain and associated base rate. In fact, it is our
+belief that minimal padding and clever use of request and response framing
+will increase the false positive rate enough to frustrate these attacks. For
+this reason, we have been encouraging continued study of low-overhead defenses
+against traffic fingerprinting. 
 % FIXME: Cite blog post and Mjaurez's paper
-
-For this reason, we have been encouraging continued study of low-overhead
-defenses against traffic fingerprinting. We are optimistic that clever use of
-request bundling and response chunking can be combined with minimal amounts of
-padding to significantly reduce the accuracy of this attack, even when the
-attack is combined with prior information that reduces the size of the
-classification domain.
 % FIXME: Cite design doc's website traffic fingerprinting section
 
 With the aid of an encrypted TLS handshake, we are hopeful that these defenses
-will also be applicable to non-Tor TLS sessions as well. The application of
-these defenses to HTTP's TLS layer will serve to increase the difficulty
-of end-to-end correlation and general traffic analysis of Tor Exit node
-traffic.
+will also be applicable to non-Tor TLS sessions as well. In addition to
+protecting the communications of non-Tor users from traffic fingerprinting,
+the application of these defenses to the HTTP TLS layer will serve to increase
+the difficulty of end-to-end correlation and general traffic analysis of Tor
+exit node traffic as well.
 
 \subsection{Traffic Analysis Issues with HTTP/2}
 
@@ -251,31 +262,30 @@ In our preliminary investigation of HTTP/2, we discovered that certain aspects
 of the protocol may aid certain types of traffic analysis attacks.
 
 In particular, the PING and SETTINGS frames are acknowledged immediately by
-the client which might give servers the option to collect information about a
-client via timing side-channels. They also allow the server to introduce an
-active traffic pattern that can be used for end-to-end correlation or
-confirmation, independent of client behavior.
-
-It is true that there are other means an attacker could use for the same
-purpose (such as redirects or Javascript), but these mechanisms can either be
-disabled by the user, reflected in UI activity, or otherwise mitigated by Tor
-Browser
-
-In Tor Browser, we will likely close the connection after receiving some rate
-of unsolicited PING or SETTINGS updates, and introduce delay or jitter before
-responding to these requests before that point. However, lack of explicit
-guidance in the specification about this issue raises concerns about what
-frequencies of these frames are likely to constitute attacks, or instead
-represent normal server behavior in the wild due to overly-aggressive HTTP/2
-implementations.
+the client, which might give servers the ability to collect information about a
+client's location and/or routing via timing side-channels. They also allow the
+server to introduce an active traffic pattern that can be used for end-to-end
+correlation or confirmation.
+
+In Tor Browser, we will likely introduce delay or jitter before responding to
+these requests, and close the connection after receiving some rate of
+unsolicited PING or SETTINGS updates. However, lack of explicit guidance in
+the specification about this issue raises concerns about what frequencies of
+these frames are likely to represent normal server behavior in the wild due to
+overly-aggressive HTTP/2 implementations, as opposed to actual attacks.
+
+It is true that there are other mechanisms that an attacker could use for the
+same purpose (such as redirects or Javascript), but these mechanisms can
+either be disabled by the user, discovered by UI indicators, or otherwise
+mitigated by Tor Browser.
 
 \subsection{Future Traffic Analysis Resistance Enhancements for HTTP/3}
 
 In terms of assisting traffic analysis defenses, we would like to see
 capabilities for larger amounts of per-frame padding, and more fine-grained
 client-side control over frame sizes. Unfortunately the 256 bytes of padding
-provided by HTTP/2 is likely to be inconsequential when combined with a 16K
-frame size.
+provided by HTTP/2 is likely to be inconsequential when combined with the
+minimum frame size the client can request (16 kilobytes).
 
 In combination with researchers at the University of Leuven, the Tor Project
 has also developed a protocol and prototype implementation for communicating
@@ -290,38 +300,38 @@ basis of new HTTP/3 frame commands for communicating more sophisticated (yet
 still traffic-bounded) padding schedules to HTTP/3 servers.
 
 
-\section{Tor Network Compatibility}
+\section{Tor Network Compatibility Concerns}
 
 Our final area of concern is continued compatibility of the Tor network with
-future versions of the HTTP protocol.
-
-It is our understanding that there is a desire for future versions of HTTP to
-move to a UDP transport layer so that reliability, congestion control, and
-client mobility will be more directly under control of the application layer.
-
-At present, the Tor Network is only capable of carrying TCP traffic. While we
-would like to support UDP traffic and indeed eventually transition the entire
-Tor network to our own datagram protocol with custom congestion and flow
-control, additional research is still needed to examine the anonymity
-implications associated with this transition. Our present estimate is that a
-full network transition to UDP is at least five years away.
-
+future versions of the HTTP protocol. It is our understanding that there is a
+desire for future versions of HTTP to move to a UDP transport layer so that
+reliability, congestion control, and client mobility will be more directly
+under control of the client user agent.
+
+At present, the Tor Network is only capable of carrying TCP traffic. While it
+will be possible to support the transit of UDP datagrams using our existing
+TCP overlay network without significant anonymity risks within a year's time
+or sooner, it is unlikely that this level of support will be sufficient to
+warrant the use of a finely-tuned UDP version of HTTP rather than a TCP
+variant.
+
+Long term, our goal is to transition the entire Tor network to our own
+datagram protocol with custom congestion and flow control to better support
+both native datagram transport and end-to-end flow control. However,
+additional research is still needed to examine the anonymity implications
+associated with this transition. Our present estimate is that a full network
+transition to UDP is at least five years away.
 % FIXME: Site Murdoch's UDP study
 
-While it will be technically possible to support the transit of UDP inside our
-existing TCP overlay network without significant anonymity risks within a
-year's time or sooner, it is unlikely that this level of support will be
-sufficient to warrant the use of a finely-tuned UDP version of HTTP rather
-than a TCP variant.
-
-We are also concerned that even with a full network transition to a datagram
+We are also concerned that even after a full network transition to a datagram
 transport, it is likely that the congestion, flow, and reliability control of
-a UDP version of HTTP/3 may still end up performing poorly over higher-latency
-overlay networks such as ours. We are especially interested in ensuring that
-overlay networks are taken in to account in the design of any UDP-based future
-versions of HTTP, and would also prefer to retain the ability to use future
-HTTP versions over TCP, should the UDP implementations prove sub-optimal for
-our use case.
+a UDP version of HTTP may still end up performing poorly over higher-latency
+overlay networks such as ours.
+
+For these reasons, we are especially interested in ensuring that overlay
+networks are taken into account in the design of any UDP-based future versions
+of HTTP, and also prefer to retain the ability to use future HTTP versions
+over TCP, should the UDP implementations prove sub-optimal for our use case.
 
 
 





More information about the tor-commits mailing list