Author: mikeperry
Date: 2011-04-27 05:09:51 +0000 (Wed, 27 Apr 2011)
New Revision: 24681
Modified:
projects/articles/browser-privacy/W3CIdentity.tex
Log:
Rework and reorganize to argue for both identity-isolation
and origin model improvements.
Modified: projects/articles/browser-privacy/W3CIdentity.tex
===================================================================
--- projects/articles/browser-privacy/W3CIdentity.tex 2011-04-27 01:30:49 UTC (rev 24680)
+++ projects/articles/browser-privacy/W3CIdentity.tex 2011-04-27 05:09:51 UTC (rev 24681)
@@ -30,17 +30,16 @@
There is a huge disconnect between how users perceive their online presence
and the reality of their relationship with the websites they visit. This
-position paper explores this disconnect and provides some recommendations for
-making the technical reality of the web match user perception. %, through both
-%technical improvements as well as user interface cues.
-We frame the core
-technical problem as one of ``linkability''---the level of correlation
+position paper explores this disconnect and provides recommendations for
+making the technical reality of the web match user perception. We frame the
+core technical problem as one of ``linkability''---the level of correlation
between various online activities that the user naturally expects to be
independent. We look to address the issue of unexpected linkability through
-both technical improvements to the web's origin model, as well as through
-user interface
-cues about the set of accumulated identifiers that can be said to comprise
-a user's online identity.
+both technical improvements to the web's origin model and user
+interface cues about the set of accumulated identifiers that can be said to
+comprise a user's online identity. We argue that without approaching the
+problem from both of these directions, true privacy-by-design is simply not a
+possibility for average web users.
\end{abstract}
@@ -79,16 +78,6 @@
quo, we must explore the disconnect between user experience and the way the
web actually functions with respect to user tracking.
-%
-% 20:16 < nickm> Not "identity-based", though. identity-separation,
-% identity-isolation. "nym" and "pseudonym" are also fine words
-% 20:18 < armadev> i'm still not entirely clear on what you mean by the
-% identity model. i am guessing it's "the user thinks of his web
-% experience in terms of whether the website can recognize
-% him", but i think that's not it. i want clearer
-% definitions up front, and then i can help with terms. :)
-
-
To this end, the rest of this document is structured as follows: First, we
examine how users perceive their privacy on the web, comparing the average
user's perspective to what actually is happening technically behind the
@@ -98,8 +87,9 @@
the linkability issues inherent with the multi-origin model of the web itself.
The second direction is improving user cues and browser interface to suggest a
coherent concept of identity to users by accurately reflecting the
-set of unique identifiers they have accumulated. Both of these directions can
-be pursued independently.
+set of unique identifiers they have accumulated. Both of these directions must
+be pursued to provide users with the ability to properly use the web in a
+privacy-preserving way.
\footnotetext{We only consider implementations that involve privacy-by-design.
Privacy-by-policy approaches such as Do Not Track will not be discussed.}
@@ -122,7 +112,7 @@
typically not aware that this relationship also extends to their activity
on other, arbitrary
sites that happen to include ``Like this on Facebook'' buttons or
-Facebook-sourced advertising content.
+Facebook-sourced advertising content~\cite{facebook-like}.
Many, if not most, users expect that when they log out of a site, their
relationship ends and that any associated tracking should be over. Even users
@@ -224,119 +214,24 @@
For users to have privacy, and for private browsing modes to function, the
relationship between a user and a site must be understood by that user.
-%
-%It is apparent that
-Users experience disconnects with the technical
-realities of the web on two major fronts: the average user does not grasp the
-privacy implications of the multi-origin model, nor is she given a clear
-concept of browser identity to grasp the privacy implications of the union
-of the linkable components of her browser.
-We will now examine examples of attempts at reducing this disconnect on each
-of these two fronts. Note that these two fronts are orthogonal. Approaches from
-them may be combined, or used independently.
+Users experience disconnects with the technical realities of the web on two
+major fronts: the average user is not given a clear concept of browser
+identity to grasp the privacy implications of the union of the linkable
+components of her browser, nor does she grasp the privacy implications of the
+multi-origin model and how identifiers are transmitted under this model.
-\subsection{Improving the Origin Model}
+Both of these areas of disconnect must be fully addressed in order for the
+average user to have the technical level of privacy that they intuitively
+expect.
-The current identifier origin model used by the web is fundamentally flawed
-when viewed from the perspective of meeting the expectations of the user.
-Unique, globally linkable identifiers can be transmitted for arbitrary content
-elements on any page, and they can be sourced from anywhere without user
-interaction or awareness.
-
-However, the behavior of identifiers and linkable attributes can be improved
-to make linkability less implicit and more consent-driven without the need for
-cumbersome interventionist user interface. Where explicit identifiers exist,
-they should be tied to the pair of the top-level origin and the third-party
-content origin. Where linkability attributes exist, they should be obfuscated
-on a per-origin basis.
-
-An early relevant example of this idea is SafeCache~\cite{safecache}.
-SafeCache seeks to reduce the ability for third-party content elements to use
-the cache to store identifiers. It does this by limiting the scope of the
-cache to the top-level origin in the URL bar. Commonly sourced content
-elements are then fetched and cached repeatedly, but this
-is necessary to prevent linkability: each of these content elements can be
-crafted to include an identifier unique to each user, thus tracking even
-users who
-attempt to avoid tracking by clearing normal cookies.
-
-The Mozilla development wiki describes an origin model improvement for
-cookie transmission
-written by Dan Witte~\cite{thirdparty}. He describes a new
-dual-keyed origin for cookies, so that cookies would only be transmitted if
-they match both the top-level origin and the third-party origin involved in
-their creation. This approach would go a long way towards preventing implicit
-tracking across multiple websites, and has some interesting properties that
-make user interaction with content elements more explicitly tied to the
-current site.
-% XXXX I can't tell what this paragraph is supposed to mean. --RR
-%
-Similarly, one could imagine this two-level dual-keyed origin isolation being
-deployed to improve similar issues with DOM Storage and cryptographic tokens.
-
-Making the origin model for browser identifiers more closely match user
-activity and user expectation has other advantages as well. With a clear
-distinction between third-party and top-level cookies due to double-keying, the
-privacy settings window could have a user-intuitive way of representing the
-user's relationship with different origins, perhaps by using only the `favicon'
-of that top level origin to represent all of the browser state accumulated by
-that origin. The user could delete the entire set of browser state (cookies,
-cache, storage, cryptographic tokens) associated with a site simply by
-removing its favicon from her privacy info panel.
-
-The problem with origin model improvement approaches is that individually,
-they do not fully address the linkability problem unless the same
-restriction is applied uniformly to all aspects of stored browser state, and
-all other linkability issues are dealt with. Behind-the-scenes partnerships
-can easily allow companies to continue to link users to their identities
-through any linkable aspect of browser state that is not properly
-compartmentalized to the top level origin and bound to the same rules as all
-other linkable state.
-
-However, linkability based on fingerprintable browser properties is also
-amenable to improvement under this model. In particular, one can imagine
-per-origin plugin loading permissions, per-origin limits on the number of
-fonts that can be used, and randomized window-specific time offsets.
-%
-So, while these approaches are in fact useful for bringing the technical
-realities of the web closer to what the user assumes is happening, they must
-be deployed uniformly, with a consistent top-level origin restriction model.
-Uniform deployment will take significant coordination and standardization
-efforts. Until then,
-it is necessary to fill the remaining linkability gaps by presenting
-the user with a visual representation of her overall web identity.
-
\subsection{Conveying Identity to the User}
-Even if the origin model of identifier transmission and other linkable
-attributes is altered uniformly to be more in line with what users expect,
-the average user can still experience privacy benefits if the
-browser conveys the sum of all linkable information as a single storable,
-mutable, and clearable user identity.
-%
-Providing this concept of identity to the user is also simpler than origin
-improvements, as it does not require extensive compatibility testing or
-standards coordination.
+The first major disconnect that prevents users from achieving true
+privacy-by-design is that the user interface of most browsers does not provide
+any clearly visible cues to the user to indicate that their current set of
+accumulated linkable state comprises a single, trackable web identity.
-Of the major private browsing modes, Google Chrome's Incognito Mode comes the
-closest to conveying the idea of ``identity'' to the user, and its
-implementation is also simple as a result. The Incognito Mode window is a
-separate, stylized window which clearly conveys that an alternate identity
-is in use
-in this window, which can be used concurrently with the non-private identity.
-The better UI appears to lead to less mode error (in which the user forgets
-whether
-private browsing is enabled) than other browsers' private browsing
-modes~\cite{private-browsing}.
-
-The Mozilla Weave project~\cite{weave-manager} appears to be proposing
-an identity-oriented method
-of managing, syncing, and storing authentication tokens, and also has use
-cases described for multiple users of a single browser. It
-is the closest idea on paper to what we envision as the way to bridge user
-assumptions with reality.
-
We believe that the user interface of the browser should convey a sense of
persistent identity prominently to the user in the form of a visual cue. This
cue can either be an abstract image, graphic or theme (such as the user's
@@ -346,7 +241,7 @@
button to get a clean slate for a new identity, and should be able to log into
and out of password-protected stored identities, which would
contain the entire state of the browser.
-%
+
To this user, the Private Browsing Mode would be no more than a special case
of this identity UI---a special identity that they can trust not to store
browsing history information to disk. Such a UI also more explicitly captures
@@ -354,30 +249,135 @@
The Tor Project is heading in this direction with the Tor Browser
Bundle~\cite{not-to-toggle}.
-Unfortunately, all current private browsing modes fall short of protecting
-against a
-network-level adversary and fail to deal with linkability against such an
-adversary~\cite{private-browsing}, claiming that it is outside their threat
+Of the major private browsing modes, Google Chrome's Incognito Mode comes the
+closest to conveying this idea of ``identity'' to the user, and its
+implementation is also simple as a result. The Incognito Mode window is a
+separate, stylized window which clearly conveys that an alternate identity
+is in use
+in this window, which can be used concurrently with the non-private identity.
+The better UI appears to lead to less mode error (in which the user forgets
+whether
+private browsing is enabled) than other browsers' private browsing
+modes~\cite{private-browsing}.
+
+The Mozilla Weave project~\cite{weave-manager} appears to be proposing an
+identity-oriented method of managing, syncing, and storing authentication
+tokens, and also has use cases described for multiple users of a single
+browser. It is the closest idea on paper to what we envision as the way to
+bridge user assumptions with reality.
+
+Unfortunately, all current private browsing modes protect only against
+adversaries with access to the local computer and fail to deal with
+linkability against a network adversary (such as advertising
+networks)~\cite{private-browsing}, claiming that it is outside their threat
model\footnotemark. If the user is given a new identity that is still linkable
-to the previous one due to shortcomings of the browser, this approach has
+to the previous one due to shortcomings of the browser, the approach has
failed as a privacy measure.
-% XXXX Define network-level adversary.
-% I agree. You could do that here or in the footnote. -RD
-\footnotetext{The primary reason given to abstain from addressing a
-network-level
-adversary is IP-address linkability. However, we believe this to be a red
+\footnotetext{The primary reason given to abstain from addressing a network
+adversary is IP-address linkability. However, we believe this argument to be a red
herring. Users are quite capable of using alternate Internet connections, and
it is common practice for ISPs (especially cellular IP networks)
-%in many parts of the world
-to rotate user IP
-addresses daily, to discourage users from operating servers and to impede the
-spread of malware.}
+to rotate user IP addresses daily, to discourage users from operating servers
+and to impede the spread of malware.}
-Linkability solutions within the identity framework would be similar to the
-origin model solutions, except they would be properties of the entire browser
-or browser profile, and would be obfuscated only once per identity switch.
+Therefore, in addition to isolating explicit disk-based identifiers and
+browser state, an attempt should be made to obfuscate or alter the biggest
+culprits in terms of the entropy linkability metric mentioned in Section~2.4.
+However, not all linkability sources have viable solutions under an
+identity-isolation approach, and moreover, identity-isolation approaches fail
+to protect the user against linkability due to ubiquitous third-party content
+elements that track them across nearly all sites as soon as they log into any
+one site~\cite{facebook-like}.
+
+\subsection{Improving the Origin Model}
+
+The other primary source of disconnect between user expectations and reality
+on the web is the origin model that governs cookie and other identifier
+transmission. The model allows unique, globally linkable identifiers to be
+transmitted for arbitrary content elements on any page, and they can be
+sourced from anywhere without user interaction or awareness. This property
+enables popular advertising and content distribution networks to have
+near-omniscient visibility into all user activity retroactively after any
+level of authentication takes place with a cooperating partner site.
+
+This identifier transmission model is fundamentally flawed when viewed from
+the perspective of meeting the expectations of the user.
+
+So far, industry has resisted changes to the identifier transmission model due
+to both inertia and compatibility concerns. However, the disconnect is so
+severe and the associated tracking is so pervasive that some level of
+temporary breakage must be tolerated to improve the status quo. Because of the
+retroactive nature of the linkability of cookies and other identifier storage,
+and because of the invisible and pervasive nature of these partnerships,
+privacy-by-design is essentially impossible to provide to the average user
+without addressing this issue.
+
+However, the behavior of identifiers and linkable attributes can be improved
+to make linkability less implicit and more consent-driven without the need for
+cumbersome, interventionist user interfaces, and with minimal damage to existing
+content. Where explicit identifiers exist, they should be tied to the pair of
+the top-level origin and the third-party content origin. Where linkability
+attributes exist, they can be obfuscated on a per-origin basis.
+
+The work done by the Stanford Applied Crypto Group shows that it is relatively
+straightforward to isolate the browser cache to specific top-level origins,
+effectively binding identifiers hidden in cached elements to the pair of
+top-level and third-party origin~\cite{safecache}. Commonly sourced
+third-party content elements are then fetched and cached repeatedly, but this
+is necessary to prevent linkability: each of these content elements can be
+crafted to include an identifier unique to each user, thus tracking even users
+who attempt to avoid tracking by clearing normal cookies.
+
+The Stanford group correctly observed that the problem with origin model
+improvements is that individually, they do not fully address the linkability
+problem unless the same restriction is applied uniformly to all aspects of
+stored browser state, and all other linkability issues are dealt with.
+Behind-the-scenes partnerships can easily allow companies to continue to link
+users to their identities through any linkable aspect of browser state that is
+not properly compartmentalized to the top-level origin and bound to the same
+rules as all other linkable state.
+
+Along these lines, the Mozilla development wiki describes an origin model
+improvement for cookie transmission written by Dan Witte~\cite{thirdparty}. He
+describes applying this same dual-keyed origin to cookies, so that cookies
+would only be transmitted if they match both the top-level origin and the
+third-party origin involved in their creation. Dan observed minimal breakage
+to popular sites, and where breakage did occur, alternative approaches that
+did not violate the new model were readily available to web designers and
+often already in use.
+
+Similarly, one could imagine this two-level dual-keyed origin isolation being
+deployed to improve similar issues with DOM Storage and cryptographic tokens.
+This dual-origin policy should be considered a must for all future
+origin-bound identifiers.
+
+With a clear association between third-party cookies and their top-level
+origin due to double-keying, it becomes easier to provide the user with more
+intuitive control over site identifiers, and thus with more control over their
+actual relationship to particular sites. For example, the privacy settings
+window could have a user-intuitive way of representing the user's relationship
+with different origins, perhaps by using only the `favicon' of that top-level
+origin to represent all of the browser state accumulated by that origin. The
+user could delete the entire set of browser state (cookies, cache, storage,
+cryptographic tokens) associated with a site simply by removing its favicon
+from her privacy info panel.
+
+Linkability based on fingerprintable browser properties is also amenable to
+improvement under this model. In particular, one can imagine per-origin plugin
+loading permissions, per-origin limits on the number of fonts that can be
+used, and randomized window-specific time offsets.
+
+While these approaches are in fact useful for bringing the technical realities
+of the web closer to what the user assumes is happening, they must be deployed
+uniformly, with a consistent top-level origin restriction model. Uniform
+deployment will take significant coordination and standardization efforts.
+Furthermore, even a vastly improved origin model still cannot prevent
+instances of explicit tracking partnerships between sites and third-party
+content providers. Therefore, both origin improvements and
+identity-isolation approaches are necessary.
+
\section{Conclusions}
The appeal of the web's prevailing revenue model and the difficulties
@@ -395,17 +395,13 @@
In Section~6, it also strongly suggested that informed consent and user
control should govern the interaction of users to tracking identifiers.
-Without changes to browser behavior, browser interface, or both, such informed
-consent is simply not possible on today's web. Several examples from academia
-and practice show that it is possible to bridge this disconnect by addressing
-the linkability issues with the web's origin model with minimal breakage.
-Additionally, the first steps towards providing the user with an explicit
-representation of their web identity have been taken.
-% Taken by who? Change to active tense. -RD
+Without changes to both browser behavior and browser interface, such informed
+consent is simply not possible on today's web. The lack of informed consent
+makes it impossible to expect privacy-by-design approaches to function
+properly. Users who do not even understand the basic properties of the
+tracking mechanisms they are subjected to cannot be expected to effectively
+use privacy mechanisms to avoid, opt out of, or decline such tracking.
-The pieces are in place to build robust private browsing modes based on these
-two approaches, and metrics exist to measure their success.
-
\bibliographystyle{plain} \bibliography{W3CIdentity}
%\clearpage