[tor-commits] [tor-browser-spec/master] Update with more review comments from GK.

Mon Apr 28 15:18:48 UTC 2014

commit 2b512898eb5f264f597c0f48f6de3e83db0ae6f1
Author: Mike Perry <mikeperry-git at fscked.org>
Date:   Fri Mar 8 16:25:04 2013 -0800

    Update with more review comments from GK.
---
 docs/design/design.xml |  124 ++++++++++++++++++++++++++----------------------
 1 file changed, 67 insertions(+), 57 deletions(-)

diff --git a/docs/design/design.xml b/docs/design/design.xml
index f24fb40..1bb33aa 100644
--- a/docs/design/design.xml
+++ b/docs/design/design.xml
@@ -39,8 +39,8 @@ This document describes the <link linkend="adversary">adversary model</link>,
 <link linkend="DesignRequirements">design requirements</link>, and <link
 linkend="Implementation">implementation</link> <!-- <link
 linkend="Packaging">packaging</link> and <link linkend="Testing">testing
-procedures</link> --> of the Tor Browser. It is current as of Tor Browser 2.3.25-4
-and Torbutton 1.5.0.
+procedures</link> --> of the Tor Browser. It is current as of Tor Browser
+2.3.25-5 and Torbutton 1.5.1.
 
   </para>
   <para>
@@ -486,16 +486,6 @@ query a user's history to see if they have issued certain censored search
 queries, or visited censored sites.
      </para>
      </listitem>
-     <listitem><command>Location information</command>
-     <para>
-
-Location information such as timezone and locality can be useful for the
-adversary to determine if a user is in fact originating from one of the
-regions they are attempting to control, or to zero-in on the geographical
-location of a particular dissident or whistleblower.
-
-     </para>
-     </listitem>
      <listitem><command>Correlate activity across multiple sites</command>
      <para>
 
@@ -510,11 +500,12 @@ attempt to perform this correlation without the user's explicit consent.
      <para>
 
 Fingerprinting (more generally: "anonymity set reduction") is used to attempt
-to zero in on a particular individual without the use of tracking identifiers.
-If the dissident or whistleblower is using a rare build of Firefox for an
-obscure operating system, this can be very useful information for tracking
-them down, or at least <link linkend="fingerprinting">tracking their
-activities</link>.
+to gather identifying information on a particular individual without the use
+of tracking identifiers. If the dissident or whistleblower's timezone is
+available, and they are using a rare build of Firefox for an obscure operating
+system, and they have a specific display resolution only used on one type of
+laptop, this can be very useful information for tracking them down, or at
+least <link linkend="fingerprinting">tracking their activities</link>.
 
      </para>
      </listitem>
@@ -738,19 +729,23 @@ was formerly available only to Javascript.
      <para>
 
 Website traffic fingerprinting is an attempt by the adversary to recognize the
-encrypted traffic patterns of specific websites. The most comprehensive
-study of the statistical properties of this attack against Tor was done by
-<ulink
+encrypted traffic patterns of specific websites. In the case of Tor, this
+attack would take place between the user and the Guard node, or at the Guard
+node itself.
+     </para> 
+
+	 <para> The most comprehensive study of the statistical properties of this
+attack against Tor was done by <ulink
 url="http://lorre.uni.lu/~andriy/papers/acmccs-wpes11-fingerprinting.pdf">Panchenko
 et al</ulink>. Unfortunately, the publication bias in academia has encouraged
 the production of a number of follow-on attack papers claiming "improved"
-success rates, which are enabled primarily by taking a number of shortcuts
-(such as classifying only very small numbers of websites, neglecting to
-publish ROC curves or at least false positive rates, and/or omitting the
-effects of dataset size on their results). Despite these subsequent
-"improvements" (which in some cases amusingly claim to completely invalidate
-any attempt at defense), we are skeptical of the efficacy of this attack in a
-real world scenario, <emphasis>especially</emphasis> in the face of any
+success rates, in some cases even claiming to completely invalidate any
+attempt at defense. These "improvements" are actually enabled primarily by
+taking a number of shortcuts (such as classifying only very small numbers of
+web pages, neglecting to publish ROC curves or at least false positive rates,
+and/or omitting the effects of dataset size on their results). Despite these
+subsequent "improvements", we are skeptical of the efficacy of this attack in
+a real world scenario, <emphasis>especially</emphasis> in the face of any
 defenses.
 
      </para>
@@ -767,7 +762,7 @@ in your hypothesis space</ulink>. In fact, even for unbiased hypothesis
 spaces, the number of training examples required to achieve a reasonable error
 bound is <ulink
 url="https://en.wikipedia.org/wiki/Probably_approximately_correct_learning#Equivalence">a
-function of the number of categories</ulink> you need to classify.
+function of the complexity of the categories</ulink> you need to classify.
 
      </para>
       <para>
@@ -776,22 +771,27 @@ function of the number of categories</ulink> you need to classify.
 In the case of this attack, the key factors that increase the classification
 complexity (and thus hinder a real world adversary who attempts this attack)
 are large numbers of dynamically generated pages, partially cached content,
-and non-web activity in the "Open World" scenario of the entire Tor network.
-This large level of classification complexity is further confounded by a noisy
-and low resolution featureset, one which is also realtively easy for the
-defender to manipulate at low cost.
+and also the non-web activity of the entire Tor network. This yields an effective
+number of "web pages" many orders of magnitude larger than even <ulink
+url="http://lorre.uni.lu/~andriy/papers/acmccs-wpes11-fingerprinting.pdf">Panchenko's
+"Open World" scenario</ulink>, which suffered continuous near-constant decline
+in the true positive rate as the "Open World" size grew (see figure 4). This
+large level of classification complexity is further confounded by a noisy and
+low resolution featureset - one which is also relatively easy for the defender
+to manipulate at low cost.
 
      </para>
      <para>
 
 In fact, the ocean of Tor Internet activity (at least, when compared to a lab
-setting) makes it a certainty that an adversary attempting to classify a large
-number of sites with poor feature resolution will ultimately be overwhelmed by
-false positives. This problem is known in the IDS literature as the <ulink
+setting) makes it a certainty that an adversary attempting to examine large
+amounts of Tor traffic will ultimately be overwhelmed by false positives (even
+after making heavy tradeoffs on the ROC curve to minimize false positives to
+below 0.01%). This problem is known in the IDS literature as the <ulink
 url="http://www.raid-symposium.org/raid99/PAPERS/Axelsson.pdf">Base Rate
 Fallacy</ulink>, and it is the primary reason that anomaly and activity
 classification-based IDS and antivirus systems have failed to materialize in
-the marketplace.
+the marketplace (despite early success in academic literature).
 
      </para>
      <para>
@@ -816,7 +816,9 @@ outside of the browser's ability to defend against, but it is worth mentioning
 for completeness. In fact, <ulink
 url="http://tails.boum.org/contribute/design/">The Tails system</ulink> can
 provide some defense against this adversary, and it does include the Tor
-Browser.
+Browser. We do however aim to defend against an adversary that has passive
+forensic access to the disk after browsing activity takes place, as part of our
+<link linkend="disk-avoidance">Disk Avoidance</link> defenses.
 
      </para>
      </listitem>
@@ -1151,7 +1153,8 @@ with OCSP relying the cacheKey property for reuse of POST requests</ulink>, we
 had to <ulink
 url="https://gitweb.torproject.org/torbrowser.git/blob/maint-2.4:/src/current-patches/firefox/0004-Add-a-string-based-cacheKey.patch">patch
 Firefox to provide a cacheDomain cache attribute</ulink>. We use the fully
-qualified url bar domain as input to this field.
+qualified url bar domain as input to this field, to avoid the complexities
+of heuristically determining the second-level DNS name.
 
      </para>
      <para>
@@ -1162,7 +1165,7 @@ isolation scheme than the Stanford implementation. First, we decoupled the
 cache isolation from the third party cookie attribute. Second, we use several
 mechanisms to attempt to determine the actual location attribute of the
 top-level window (to obtain the url bar FQDN) used to load the page, as
-opposed to relying solely on the referer property.
+opposed to relying solely on the Referer property.
 
      </para>
      <para>
@@ -1305,7 +1308,7 @@ storage</ulink>.
 
 In order to eliminate non-consensual linkability but still allow for sites
 that utilize this property to function, we reset the window.name property of
-tabs in Torbutton every time we encounter a blank referer. This behavior
+tabs in Torbutton every time we encounter a blank Referer. This behavior
 allows window.name to persist for the duration of a click-driven navigation
 session, but as soon as the user enters a new URL or navigates between
 https/http schemes, the property is cleared.
@@ -1354,7 +1357,7 @@ Identity</command> invocations.
     <listitem>Exit node usage
      <para><command>Design Goal:</command>
 
-Every distinct navigation session (as defined by a non-blank referer header)
+Every distinct navigation session (as defined by a non-blank Referer header)
 MUST exit through a fresh Tor circuit in Tor Browser to prevent exit node
 observers from linking concurrent browsing activity.
 
@@ -1748,17 +1751,21 @@ url="https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIDOMWi
 We then stop all page activity for each tab using <ulink
 url="https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIWebNavigation#stop%28%29">browser.webNavigation.stop(nsIWebNavigation.STOP_ALL)</ulink>.
 We then clear the site-specific Zoom by temporarily disabling the preference
-<command>browser.zoom.siteSpecific</command>, and clear the GeoIP wiki token
-URL and the last opened URL prefs (if they exist). Each tab is then closed.
+<command>browser.zoom.siteSpecific</command>, and clear the GeoIP wifi token URL
+<command>geo.wifi.access_token</command> and the last opened URL prefs (if
+they exist). Each tab is then closed.
 
      </para>
      <para>
 
-After closing all tabs, we then clear the following state: searchbox and
-findbox text, HTTP auth, SSL state, OCSP state, site-specific content
-preferences (including HSTS state), content and image cache, Cookies, DOM
-storage, safe browsing key, and the Google wifi geolocation token (if it
-exists). 
+After closing all tabs, we then emit "<ulink
+url="https://developer.mozilla.org/en-US/docs/Supporting_private_browsing_mode#Private_browsing_notifications">browser:purge-session-history</ulink>"
+(which instructs addons and various Firefox components to clear their session
+state), and then manually clear the following state: searchbox and findbox
+text, HTTP auth, SSL state, OCSP state, site-specific content preferences
+(including HSTS state), content and image cache, offline cache, Cookies, DOM
+storage, DOM local storage, the safe browsing key, and the Google wifi geolocation
+token (if it exists). 
 
      </para>
      <para>
@@ -1769,7 +1776,7 @@ new circuit to be created.
      </para>
      <para>
 Finally, a fresh browser window is opened, and the current browser window is
-closed.
+closed (this does not spawn a new Firefox process, only a new window).
      </para>
     </blockquote>
     <blockquote>
@@ -1818,7 +1825,8 @@ encrypted website activity.
        <blockquote>
       <para>
 
-We want to deploy a mechanism that reduces the accuracy of features available
+We want to deploy a mechanism that reduces the accuracy of <ulink
+url="https://en.wikipedia.org/wiki/Feature_selection">useful features</ulink> available
 for classification. This mechanism would either impact the true and false
 positive accuracy rates, <emphasis>or</emphasis> reduce the number of webpages
 that could be classified at a given accuracy rate.
@@ -1840,7 +1848,7 @@ Padding</ulink> and <ulink url="http://www.cs.sunysb.edu/~xcai/fp.pdf">
 Congestion-Sensitive BUFLO</ulink>. It may be also possible to <ulink
 url="https://trac.torproject.org/projects/tor/ticket/7028">tune such
 defenses</ulink> such that they only use existing spare Guard bandwidth capacity in the Tor
-network.
+network, making them also effectively no-overhead.
 
      </para>
        </blockquote>
@@ -2059,7 +2067,9 @@ CSS and Javascript</ulink> and is a fingerprinting vector. This patch limits
 the number of times CSS and Javascript can cause font-family rules to
 evaluate. Remote @font-face fonts are exempt from the limits imposed by this
 patch, and remote fonts are given priority over local fonts whenever both
-appear in the same font-family rule.
+appear in the same font-family rule. We do this by explicitly altering the
+nsRuleNode rule representation itself to remove the local font families before
+the rule hits the font renderer.
 
      </para>
     </listitem>
@@ -2576,11 +2586,11 @@ occurring.
   <listitem>The Referer Header
   <para>
 
-We haven't disabled or restricted the referer ourselves because of the
-non-trivial number of sites that rely on the referer header to "authenticate"
+We haven't disabled or restricted the Referer ourselves because of the
+non-trivial number of sites that rely on the Referer header to "authenticate"
 image requests and deep-link navigation on their sites. Furthermore, there
 seems to be no real privacy benefit to taking this action by itself in a
-vacuum, because many sites have begun encoding referer URL information into
+vacuum, because many sites have begun encoding Referer URL information into
 GET parameters when they need it to cross http to https scheme transitions.
 Google's +1 buttons are the best example of this activity.
 
@@ -2588,7 +2598,7 @@ Google's +1 buttons are the best example of this activity.
   <para>
 
 Because of the availability of these other explicit vectors, we believe the
-main risk of the referer header is through inadvertent and/or covert data
+main risk of the Referer header is through inadvertent and/or covert data
 leakage.  In fact, <ulink
 url="http://www2.research.att.com/~bala/papers/wosn09.pdf">a great deal of
 personal data</ulink> is inadvertently leaked to third parties through the
@@ -2601,7 +2611,7 @@ We believe the Referer header should be made explicit. If a site wishes to
 transmit its URL to third party content elements during load or during
 link-click, it should have to specify this as a property of the associated HTML
 tag. With an explicit property, it would then be possible for the user agent to
-inform the user if they are about to click on a link that will transmit referer
+inform the user if they are about to click on a link that will transmit Referer
 information (perhaps through something as subtle as a different color in the
 lower toolbar for the destination URL). This same UI notification can also be
 used for links with the <ulink