[tor-commits] r24679: {projects} initial cleanups while reading also, it's 5 pages again (projects/articles/browser-privacy)

Roger Dingledine arma at torproject.org
Wed Apr 27 01:08:34 UTC 2011


Author: arma
Date: 2011-04-27 01:08:34 +0000 (Wed, 27 Apr 2011)
New Revision: 24679

Modified:
   projects/articles/browser-privacy/W3CIdentity.tex
Log:
initial cleanups while reading

also, it's 5 pages again


Modified: projects/articles/browser-privacy/W3CIdentity.tex
===================================================================
--- projects/articles/browser-privacy/W3CIdentity.tex	2011-04-26 15:06:46 UTC (rev 24678)
+++ projects/articles/browser-privacy/W3CIdentity.tex	2011-04-27 01:08:34 UTC (rev 24679)
@@ -31,12 +31,14 @@
 There is a huge disconnect between how users perceive their online presence
 and the reality of their relationship with the websites they visit. This
 position paper explores this disconnect and provides some recommendations for
-making the technical reality of the web match user perception, through both
-technical improvements as well as user interface cues. We frame the core
-technical problem as one of ``linkability'' -- the level of correlation
+making the technical reality of the web match user perception. %, through both
+%technical improvements as well as user interface cues.
+We frame the core
+technical problem as one of ``linkability''---the level of correlation
 between various online activities that the user naturally expects to be
 independent. We look to address the issue of unexpected linkability through
-both improvements to the web's origin model, as well as through user interface
+both technical improvements to the web's origin model, as well as through
+user interface
 cues about the set of accumulated identifiers that can be said to comprise
 a user's online identity.
 
@@ -50,11 +52,12 @@
 advertisement is relevant to the current activity, and if possible, relevant
 to the current user.
 
-The cost of this is that user privacy on the web is a nightmare. There is
+The cost of this incentive structure is that user privacy on the web is a
+nightmare. There is
 ubiquitous tracking, unseen partnership agreements and data exchange, and
 surreptitious attempts to uncover users' identities against their will and
 without their knowledge. This is not just happening in the dark, unseemly
-corners of the web. It is happening everywhere\cite{facebook-like}.
+corners of the web. It is happening everywhere~\cite{facebook-like}.
 
 The problem is that the revenue model of the web has incentivized companies to
 find ways to continue to track users against their will, even if those users
@@ -62,9 +65,9 @@
 Starting with the infamous ``Flash cookies'', we have progressed through a
 seemingly endless arms race of secondary identifiers and tracking information:
 visited history, cache, font and system data, desktop resolution, keystroke
-timing, and so on and so forth\cite{wsj-fingerprinting}.
+timing, and so on and so forth~\cite{wsj-fingerprinting}.
 
-These efforts have lead to an even wider disconnect between a user's
+These efforts have led to an even wider disconnect between users'
 perception of their privacy and the reality of their privacy. Users simply
 can't keep up with the ways they are being tracked.
 
@@ -87,23 +90,25 @@
 
 
 To this end, the rest of this document is structured as follows: First, we
-examine how the user perceives their privacy on the web, comparing the average
+examine how users perceive their privacy on the web, comparing the average
 user's perspective to what actually is happening technically behind the
-scenes, and noting the major disconnects. We then examine solutions attempting
-to bridge this disconnect from two different directions, corresponding to the
+scenes, and noting the major disconnects. We then examine solutions that
+bridge this disconnect from two different directions, corresponding to the
 two major sources of disconnect\footnotemark. The first direction is improving
 the linkability issues inherent with the multi-origin model of the web itself.
 The second direction is improving user cues and browser interface to suggest a
-coherent concept of identity to the user, which more accurately reflects the
+coherent concept of identity to the user by accurately reflecting the
 set of unique identifiers they have accumulated. Both of these directions can
 be pursued independently.
+% "the user" is not the thing that "accurately reflects". 'which' is a
+% klunky term here. -RD
 
 \footnotetext{We only consider implementations that involve privacy-by-design.
 Privacy-by-policy approaches such as Do Not Track will not be discussed.}
 
 \section{User Privacy on the Web}
 
-To properly examine the privacy problem, we must probe both the average user's
+To properly examine the privacy problem, we must probe both average users'
 perception of what their ``web identity'' is, as well as the technical
 realities of web authentication and tracking.
 
@@ -111,12 +116,13 @@
 
 Instinctively, users define their privacy in terms of their identity: in terms
 of how they have interacted with a site in order to inform it of who they are.
-Typically, the user's perception of their identity on the web is usually a direct
+Typically, users' perception of their identity on the web is a direct
 function of the identifiers used for strong authentication for particular sites.
 
 For example, users expect that logging in to Facebook creates a relationship
 in their browsers when facebook.com is present in the URL bar, but they are
-typically not aware that this also extends to their activity on other, arbitrary
+typically not aware that this relationship also extends to their activity
+on other, arbitrary
 sites that happen to include ``Like this on Facebook'' buttons or
 Facebook-sourced advertising content.
 
@@ -141,15 +147,16 @@
 Auth tokens, and client TLS certificates. However, this identifier-based
 approach breaks down quickly on the modern web. High-security websites are
 already using fingerprinting as an auxiliary second factor of
-authentication\cite{security-fingerprinting}, and online data aggregators
+authentication~\cite{security-fingerprinting}, and online data aggregators
 utilize everything they can to build complete portraits of users'
-identities\cite{tracking-identity}.
+identities~\cite{tracking-identity}.
 
 Despite what the user may believe, their actual web identity then is a
 superset of all the stored identifiers and authentication tokens used by the
 browser. It is the ability to link a user's activity in one instance to their
 activity in another instance, be it across time, or even on the very same page
 due to multiple content origins.
+% 'It'? Their web identity is the ability to link...? -RD
 
 Therefore, instead of viewing the user's identity as the sum of their
 identifiers, or as their relationship to individual websites, it is best to
@@ -159,12 +166,12 @@
 \subsection{User Privacy as Linkability}
 
 In terms of what the user actually expects, user privacy is more accurately
-modeled as the level of linkability between subsequent actions on the web, as
-opposed to the mere sum of their unique identifiers and authentication tokens.
+modeled as the level of linkability between subsequent actions on the web---not
+just the sum of their unique identifiers and authentication tokens.
 
 When privacy is expanded to cover all items that enable or substantially
 contribute to linkability, a lot more components of the browser are now in
-scope. We will briefly enumerate these components.
+scope. We will briefly enumerate these four categories of components.
 
 First, the obvious properties are found in the state of the browser: cookies,
 DOM storage, cache, cryptographic tokens and cryptographic state, and
@@ -192,7 +199,7 @@
 various browser properties that extend beyond any stored origin-related state.
 
 The Panopticlick project by the EFF provides us with exactly this
-metric\cite{panopticlick}. The researchers conducted a survey of volunteers
+metric~\cite{panopticlick}. The researchers conducted a survey of volunteers
 who were asked to visit an experiment page that harvested many of the above
 components. They then computed the Shannon entropy of the resulting
 distribution of each of several key attributes to determine how many bits of
@@ -201,7 +208,7 @@
 While not perfect\footnotemark, this metric allows us to prioritize our efforts
 on the
 components that have the most potential for linkability.
-
+%
 \footnotetext{In particular, the test does not take in all aspects of
 resolution information. It did not calculate the size of widgets, window
 decoration, or toolbar size. We believe these resolution-related properties
@@ -209,19 +216,20 @@
 measure clock offset and other time-based fingerprints. Furthermore, as new
 browser features are added, the experiment should be repeated to include
 them.}
-
-This metric also indicates that it is beneficial to standardize on
+%
+It also shows the benefits of standardizing on
 implementations of fingerprinting resistance where possible. More
 implementations using the same defenses means more users with similar
 fingerprints, which means less entropy in the metric. Similarly, uniform
-feature deployment leads to less entropy in the metric.
+feature deployment leads to less entropy.
 
 \section{Matching User Perception with Reality}
 
 For users to have privacy, and for private browsing modes to function, the
 relationship between a user and a site must be understood by that user.
-
-It is apparent that the user experiences disconnect with the technical
+%
+%It is apparent that
+Users experience disconnects with the technical
 realities of the web on two major fronts: the average user does not grasp the
 privacy implications of the multi-origin model, nor are they given a clear
 concept of browser identity to grasp the privacy implications of the union
@@ -236,7 +244,7 @@
 The current identifier origin model used by the web is fundamentally flawed
 when viewed from the perspective of meeting the expectations of the user.
 Unique, globally linkable identifiers can be transmitted for arbitrary content
-elements on any page, which can be sourced from anywhere without user
+elements on any page, and they can be sourced from anywhere without user
 interaction or awareness.
 
 However, the behavior of identifiers and linkable attributes can be improved
@@ -246,26 +254,26 @@
 content origin. Where linkability attributes exist, they should be obfuscated
 on a per-origin basis.
 
-An early relevant example of this idea is SafeCache\cite{safecache}.
+An early relevant example of this idea is SafeCache~\cite{safecache}.
 SafeCache seeks to reduce the ability for third-party content elements to use
 the cache to store identifiers. It does this by limiting the scope of the
-cache to the top-level origin in the URL bar. This has the effect that
-commonly sourced content elements are fetched and cached repeatedly, but this
+cache to the top-level origin in the URL bar. Commonly sourced content
+elements are then fetched and cached repeatedly, but this
 is necessary to prevent linkability: each of these content elements can be
 crafted to include an identifier unique to each user, thus tracking users who
 attempt to avoid tracking by clearing normal cookies.
 
 The Mozilla development wiki describes an origin model improvement for
 cookie transmission
-written by Dan Witte\cite{thirdparty}. Dan describes a new
+written by Dan Witte~\cite{thirdparty}. He describes a new
 dual-keyed origin for cookies, so that cookies would only be transmitted if
-they matched both the top-level origin and the third-party origin involved in
+they match both the top-level origin and the third-party origin involved in
 their creation. This approach would go a long way towards preventing implicit
 tracking across multiple websites, and has some interesting properties that
 make user interaction with content elements more explicitly tied to the
 current site.
 % XXXX I can't tell what this paragraph is supposed to mean. --RR
-
+%
 Similarly, one could imagine this two-level dual-keyed origin isolation being
 deployed to improve similar issues with DOM Storage and cryptographic tokens.
 
@@ -292,7 +300,7 @@
 amenable to improvement under this model. In particular, one can imagine
 per-origin plugin loading permissions, per-origin limits on the number of
 fonts that can be used, and randomized window-specific time offsets.
-
+%
 So, while these approaches are in fact useful for bringing the technical
 realities of the web closer to what the user assumes is happening, they must
 be deployed uniformly, with a consistent top-level origin restriction model.
@@ -305,9 +313,9 @@
 Even if the origin model of identifier transmission and other linkable
 attributes is altered uniformly to be more in line with what users expect, it
 is likely that the average user will still experience privacy benefits if the
-browser conveys the sum of all linkable information as a single, storable,
+browser conveys the sum of all linkable information as a single storable,
 mutable, and clearable user identity.
-
+%
 Providing this concept of identity to the user is also simpler than origin
 improvements, as it does not require extensive compatibility testing or
 standards coordination.
@@ -321,34 +329,34 @@
 The better UI appears to lead to less mode error (in which the user forgets
 whether
 private browsing is enabled) than other browsers' private browsing
-modes\cite{private-browsing}.
-% XXXX ‘mode error’?
+modes~\cite{private-browsing}.
 
-The Mozilla Weave project appears to be proposing an identity-oriented method
+The Mozilla Weave project~\cite{weave-manager} appears to be proposing
+an identity-oriented method
 of managing, syncing, and storing authentication tokens, and also has use
-cases described for multiple users of a single browser\cite{weave-manager}. It
+cases described for multiple users of a single browser. It
 is the closest idea on paper to what we envision as the way to bridge user
 assumptions with reality.
 
 We believe that the user interface of the browser should convey a sense of
 persistent identity prominently to the user in the form of a visual cue. This
 cue can either be an abstract image, graphic or theme (such as the user's
-choice of Firefox Persona\cite{firefox-personas}), or it can be a text area
+choice of Firefox Persona~\cite{firefox-personas}), or it can be a text area
 with the user's current favored pseudonym. This idea of identity should then
 be integrated with the browsing experience. Users should be able to click a
 button to get a clean slate for a new identity, and should be able to log into
 and out of password-protected stored identities, which would
-contain the entire state of the browser. This is the direction the Tor Project
-intends to head in with the Tor Browser Bundle\cite{not-to-toggle}.
-
+contain the entire state of the browser. The Tor Project is heading in
+this direction with the Tor Browser Bundle~\cite{not-to-toggle}.
+%
 To this user, the Private Browsing Mode would be no more than a special case
-of this identity UI - a special identity that they can trust not to store
+of this identity UI---a special identity that they can trust not to store
 browsing history information to disk. Such a UI also more explicitly captures
 what is going on with respect to the user's relationship to the web.
 
 However, all current private browsing modes fall short of protecting against a
 network-level adversary and fail to deal with linkability against such an
-adversary\cite{private-browsing}, claiming that it is outside their threat
+adversary~\cite{private-browsing}, claiming that it is outside their threat
 model\footnotemark. If the user is given a new identity that is still linkable
 to the previous one due to shortcomings of the browser, this approach has
 failed as a privacy measure.
@@ -358,10 +366,11 @@
 network-level
 adversary is IP-address linkability. However, we believe this to be a red
 herring. Users are quite capable of using alternate Internet connections, and
-it is common practice for ISPs in many parts of the world to rotate user IP
+it is common practice for ISPs (especially cellular IP networks)
+%in many parts of the world
+to rotate user IP
 addresses daily, to discourage users from operating servers and to impede the
-spread of malware.
-This is especially true of cellular IP networks.}
+spread of malware.}
 
 Linkability solutions within the identity framework would be similar to the
 origin model solutions, except they would be properties of the entire browser
@@ -369,18 +378,19 @@
 
 \section{Conclusions}
 
-The appeal of the prevailing revenue model of the web and the difficulties
+The appeal of the web's prevailing revenue model and the difficulties
 associated with altering browser behavior have lulled us into accepting user
 deception as the norm for web use. The average user completely lacks the
 understanding needed to grasp how web tracking is carried out. This disconnect
-in understanding is extreme to the point where moral issues arise about the
+%in understanding
+is extreme to the point where moral issues arise about the
 level of consent actually involved in web use and associated tracking.
 
 In fact, standardization efforts seemed to realize this problem early on but
 failed to create feasible recommendations for improving the situation. RFC
 2965 governing HTTP State Management mandated in section 3.3.6 that
 third-party origins must not cause the browser to transmit cookies unless the
-interaction is ``verifiable'' and readily apparent to the user\cite{rfc2965}.
+interaction is ``verifiable'' and readily apparent to the user~\cite{rfc2965}.
 In section 6, it also strongly suggested that informed consent and user
 control should govern the interaction of users with tracking identifiers.
 
@@ -396,7 +406,8 @@
 
 \bibliographystyle{plain} \bibliography{W3CIdentity}
 
-\clearpage
-\appendix
+%\clearpage
+%\appendix
 
 \end{document}
+
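
Note on the Panopticlick paragraph in the hunk at -192/+199: the metric it describes is the Shannon entropy of each attribute's observed distribution, measured in bits. The following is a minimal sketch in Python, not code from the paper or from Panopticlick itself. The function name `shannon_entropy_bits` and the sample user-agent values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy_bits(observations):
    """Shannon entropy, in bits, of the empirical distribution of one attribute.

    `observations` is a list with one value per visitor (hypothetical input
    format; the real survey data is not reproduced here).
    """
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed values of the attribute.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up sample: eight visitors, four distinct user-agent strings.
sample = ["UA-A"] * 4 + ["UA-B"] * 2 + ["UA-C", "UA-D"]
print(shannon_entropy_bits(sample))  # 1.75
```

If every visitor reported the same value, the function would return 0 bits. This is the sense in which uniform defenses and uniform feature deployment lower the entropy of the metric.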


