Author: mikeperry
Date: 2011-04-27 05:09:51 +0000 (Wed, 27 Apr 2011)
New Revision: 24681

Modified:
   projects/articles/browser-privacy/W3CIdentity.tex
Log:
Rework and reorganize to argue for both identity-isolation and origin model improvements.

Modified: projects/articles/browser-privacy/W3CIdentity.tex
===================================================================
--- projects/articles/browser-privacy/W3CIdentity.tex  2011-04-27 01:30:49 UTC (rev 24680)
+++ projects/articles/browser-privacy/W3CIdentity.tex  2011-04-27 05:09:51 UTC (rev 24681)
@@ -30,17 +30,16 @@
There is a huge disconnect between how users perceive their online presence and the reality of their relationship with the websites they visit. This -position paper explores this disconnect and provides some recommendations for -making the technical reality of the web match user perception. %, through both -%technical improvements as well as user interface cues. -We frame the core -technical problem as one of ``linkability''---the level of correlation +position paper explores this disconnect and provides recommendations for +making the technical reality of the web match user perception. We frame the +core technical problem as one of ``linkability''---the level of correlation between various online activities that the user naturally expects to be independent. We look to address the issue of unexpected linkability through -both technical improvements to the web's origin model, as well as through -user interface -cues about the set of accumulated identifiers that can be said to comprise -a user's online identity. +both technical improvements to the web's origin model, as well as through user +interface cues about the set of accumulated identifiers that can be said to +comprise a user's online identity. We argue that without approaching the +problem from both of these directions, true privacy-by-design is simply not a +possibility for average web users.
\end{abstract}
@@ -79,16 +78,6 @@ quo, we must explore the disconnect between user experience and the way the web actually functions with respect to user tracking.
-% -% 20:16 < nickm> Not "identity-based", though. identity-separation, -% identity-isolation. "nym" and "pseudonym" are also fine words -% 20:18 < armadev> i'm still not entirely clear on what you mean by the -% identity model. i am guessing it's "the user thinks of his web -% experience in terms of whether the website can recognize -% him", but i think that's not it. i want clearer -% definitions up front, and then i can help with terms. :) - - To this end, the rest of this document is structured as follows: First, we examine how users perceive their privacy on the web, comparing the average user's perspective to what actually is happening technically behind the @@ -98,8 +87,9 @@ the linkability issues inherent with the multi-origin model of the web itself. The second direction is improving user cues and browser interface to suggest a coherent concept of identity to users by accurately reflecting the -set of unique identifiers they have accumulated. Both of these directions can -be pursued independently. +set of unique identifiers they have accumulated. Both of these directions must +be pursued to provide users with the ability to properly use the web in a +privacy-preserving way.
\footnotetext{We only consider implementations that involve privacy-by-design. Privacy-by-policy approaches such as Do Not Track will not be discussed.} @@ -122,7 +112,7 @@ typically not aware that this relationship also extends to their activity on other, arbitrary sites that happen to include ``Like this on Facebook'' buttons or -Facebook-sourced advertising content. +Facebook-sourced advertising content~\cite{facebook-like}.
Many, if not most, users expect that when they log out of a site, their relationship ends and that any associated tracking should be over. Even users @@ -224,119 +214,24 @@
For users to have privacy, and for private browsing modes to function, the relationship between a user and a site must be understood by that user. -% -%It is apparent that -Users experience disconnects with the technical -realities of the web on two major fronts: the average user does not grasp the -privacy implications of the multi-origin model, nor is she given a clear -concept of browser identity to grasp the privacy implications of the union -of the linkable components of her browser.
-We will now examine examples of attempts at reducing this disconnect on each -of these two fronts. Note that these two fronts are orthogonal. Approaches from -them may be combined, or used independently. +Users experience disconnects with the technical realities of the web on two +major fronts: the average user is not given a clear concept of browser +identity to grasp the privacy implications of the union of the linkable +components of her browser, nor does she grasp the privacy implications of the +multi-origin model and how identifiers are transmitted under this model.
-\subsection{Improving the Origin Model} +Both of these areas of disconnect must be fully addressed in order for the +average user to have the technical level of privacy that they intuitively +expect.
-The current identifier origin model used by the web is fundamentally flawed -when viewed from the perspective of meeting the expectations of the user. -Unique, globally linkable identifiers can be transmitted for arbitrary content -elements on any page, and they can be sourced from anywhere without user -interaction or awareness. - -However, the behavior of identifiers and linkable attributes can be improved -to make linkability less implicit and more consent-driven without the need for -cumbersome interventionist user interface. Where explicit identifiers exist, -they should be tied to the pair of the top-level origin and the third-party -content origin. Where linkability attributes exist, they should be obfuscated -on a per-origin basis. - -An early relevant example of this idea is SafeCache~\cite{safecache}. -SafeCache seeks to reduce the ability for third-party content elements to use -the cache to store identifiers. It does this by limiting the scope of the -cache to the top-level origin in the URL bar. Commonly sourced content -elements are then fetched and cached repeatedly, but this -is necessary to prevent linkability: each of these content elements can be -crafted to include an identifier unique to each user, thus tracking even -users who -attempt to avoid tracking by clearing normal cookies. - -The Mozilla development wiki describes an origin model improvement for -cookie transmission -written by Dan Witte~\cite{thirdparty}. He describes a new -dual-keyed origin for cookies, so that cookies would only be transmitted if -they match both the top-level origin and the third-party origin involved in -their creation. This approach would go a long way towards preventing implicit -tracking across multiple websites, and has some interesting properties that -make user interaction with content elements more explicitly tied to the -current site. -% XXXX I can't tell what this paragraph is supposed to mean. 
--RR -% -Similarly, one could imagine this two-level dual-keyed origin isolation being -deployed to improve similar issues with DOM Storage and cryptographic tokens. - -Making the origin model for browser identifiers more closely match user -activity and user expectation has other advantages as well. With a clear -distinction between third-party and top-level cookies due to double-keying, the -privacy settings window could have a user-intuitive way of representing the -user's relationship with different origins, perhaps by using only the `favicon' -of that top level origin to represent all of the browser state accumulated by -that origin. The user could delete the entire set of browser state (cookies, -cache, storage, cryptographic tokens) associated with a site simply by -removing its favicon from her privacy info panel. - -The problem with origin model improvement approaches is that individually, -they do not fully address the linkability problem unless the same -restriction is applied uniformly to all aspects of stored browser state, and -all other linkability issues are dealt with. Behind-the-scenes partnerships -can easily allow companies to continue to link users to their identities -through any linkable aspect of browser state that is not properly -compartmentalized to the top level origin and bound to the same rules as all -other linkable state. - -However, linkability based on fingerprintable browser properties is also -amenable to improvement under this model. In particular, one can imagine -per-origin plugin loading permissions, per-origin limits on the number of -fonts that can be used, and randomized window-specific time offsets. -% -So, while these approaches are in fact useful for bringing the technical -realities of the web closer to what the user assumes is happening, they must -be deployed uniformly, with a consistent top-level origin restriction model. -Uniform deployment will take significant coordination and standardization -efforts. 
Until then, -it is necessary to fill the remaining linkability gaps by presenting -the user with a visual representation of her overall web identity. - \subsection{Conveying Identity to the User}
-Even if the origin model of identifier transmission and other linkable -attributes is altered uniformly to be more in line with what users expect, -the average user can still experience privacy benefits if the -browser conveys the sum of all linkable information as a single storable, -mutable, and clearable user identity. -% -Providing this concept of identity to the user is also simpler than origin -improvements, as it does not require extensive compatibility testing or -standards coordination. +The first major disconnect that prevents users from achieving true +privacy-by-design is that the user interface of most browsers does not provide +any clearly visible cues to the user to indicate that their current set of +accumulated linkable state comprise a single, trackable web identity.
-Of the major private browsing modes, Google Chrome's Incognito Mode comes the -closest to conveying the idea of ``identity'' to the user, and its -implementation is also simple as a result. The Incognito Mode window is a -separate, stylized window which clearly conveys that an alternate identity -is in use -in this window, which can be used concurrently with the non-private identity. -The better UI appears to lead to less mode error (in which the user forgets -whether -private browsing is enabled) than other browsers' private browsing -modes~\cite{private-browsing}. - -The Mozilla Weave project~\cite{weave-manager} appears to be proposing -an identity-oriented method -of managing, syncing, and storing authentication tokens, and also has use -cases described for multiple users of a single browser. It -is the closest idea on paper to what we envision as the way to bridge user -assumptions with reality. - We believe that the user interface of the browser should convey a sense of persistent identity prominently to the user in the form of a visual cue. This cue can either be an abstract image, graphic or theme (such as the user's @@ -346,7 +241,7 @@ button to get a clean slate for a new identity, and should be able to log into and out of password-protected stored identities, which would contain the entire state of the browser. -% + To this user, the Private Browsing Mode would be no more than a special case of this identity UI---a special identity that they can trust not to store browsing history information to disk. Such a UI also more explicitly captures @@ -354,30 +249,135 @@ The Tor Project is heading in this direction with the Tor Browser Bundle~\cite{not-to-toggle}.
-Unfortunately, all current private browsing modes fall short of protecting -against a -network-level adversary and fail to deal with linkability against such an -adversary~\cite{private-browsing}, claiming that it is outside their threat +Of the major private browsing modes, Google Chrome's Incognito Mode comes the +closest to conveying this idea of ``identity'' to the user, and its +implementation is also simple as a result. The Incognito Mode window is a +separate, stylized window which clearly conveys that an alternate identity +is in use +in this window, which can be used concurrently with the non-private identity. +The better UI appears to lead to less mode error (in which the user forgets +whether +private browsing is enabled) than other browsers' private browsing +modes~\cite{private-browsing}. + +The Mozilla Weave project~\cite{weave-manager} appears to be proposing an +identity-oriented method of managing, syncing, and storing authentication +tokens, and also has use cases described for multiple users of a single +browser. It is the closest idea on paper to what we envision as the way to +bridge user assumptions with reality. + +Unfortunately, all current private browsing modes protect only against +adversaries with access to the local computer and fail to deal with +linkability against a network adversary (such as advertising +networks)~\cite{private-browsing}, claiming that it is outside their threat model\footnotemark. If the user is given a new identity that is still linkable -to the previous one due to shortcomings of the browser, this approach has +to the previous one due to shortcomings of the browser, the approach has failed as a privacy measure. -% XXXX Define network-level adversary. -% I agree. You could do that here or in the footnote. -RD
-\footnotetext{The primary reason given to abstain from addressing a -network-level -adversary is IP-address linkability. However, we believe this to be a red +\footnotetext{The primary reason given to abstain from addressing a network +adversary is IP-address linkability. However, we believe this argument to be a red herring. Users are quite capable of using alternate Internet connections, and it is common practice for ISPs (especially cellular IP networks) -%in many parts of the world -to rotate user IP -addresses daily, to discourage users from operating servers and to impede the -spread of malware.} +to rotate user IP addresses daily, to discourage users from operating servers +and to impede the spread of malware.}
-Linkability solutions within the identity framework would be similar to the -origin model solutions, except they would be properties of the entire browser -or browser profile, and would be obfuscated only once per identity switch. +Therefore, in addition to isolating explicit disk-based identifiers and +browser state, an attempt should be made to obfuscate or alter the biggest +culprits in terms of the entropy linkability metric mentioned in Section~2.4.
+However, not all linkability sources have viable solutions under an +identity-isolation approach, and moreover, identity-isolation approaches fail +to protect the user against linkability due to ubiquitous third party content +elements that track them across nearly all sites as soon as they log into any +one site~\cite{facebook-like}. + +\subsection{Improving the Origin Model} + +The other primary source of disconnect between user expectations and reality +on the web is the origin model that governs cookie and other identifier +transmission. The model allows unique, globally linkable identifiers to be +transmitted for arbitrary content elements on any page, and they can be +sourced from anywhere without user interaction or awareness. This property +enables popular advertising and content distribution networks to have +near-omniscient visibility into all user activity retroactively after any +level of authentication takes place with a cooperating partner site. + +This identifier transmission model is fundamentally flawed when viewed from +the perspective of meeting the expectations of the user. + +So far, industry has resisted changes to the identifier transmission model due +to both inertia and compatibility concerns. However, the disconnect is so +severe and the associated tracking is so pervasive that some level of +temporary breakage must be tolerated to improve the status quo. Because of the +retroactive nature of the linkability of cookies and other identifier storage, +and because of the invisible and pervasive nature of these partnerships, +privacy-by-design is essentially impossible to provide to the average user +without addressing this issue. + +However, the behavior of identifiers and linkable attributes can be improved +to make linkability less implicit and more consent-driven without the need for +cumbersome interventionist user interface, and with minimal damage to existing +content. 
Where explicit identifiers exist, they should be tied to the pair of +the top-level origin and the third-party content origin. Where linkability +attributes exist, they can be obfuscated on a per-origin basis. + +The work done by the Stanford Applied Crypto Group shows that it is relatively +straight-forward to isolate the browser cache to specific top-level origins, +effectively binding identifiers hidden in cached elements to the pair of +top-level and third-party origin~\cite{safecache}. Commonly sourced +third-party content elements are then fetched and cached repeatedly, but this +is necessary to prevent linkability: each of these content elements can be +crafted to include an identifier unique to each user, thus tracking even users +who attempt to avoid tracking by clearing normal cookies. + +The Stanford group correctly observed that the problem with origin model +improvements is that individually, they do not fully address the linkability +problem unless the same restriction is applied uniformly to all aspects of +stored browser state, and all other linkability issues are dealt with. +Behind-the-scenes partnerships can easily allow companies to continue to link +users to their identities through any linkable aspect of browser state that is +not properly compartmentalized to the top level origin and bound to the same +rules as all other linkable state. + +Along these lines, the Mozilla development wiki describes an origin model +improvement for cookie transmission written by Dan Witte~\cite{thirdparty}. He +describes applying this same dual-keyed origin to cookies, so that cookies +would only be transmitted if they match both the top-level origin and the +third-party origin involved in their creation. Dan observed minimal breakage +to popular sites, and where breakage did occur, alternative approaches that +did not violate the new model were readily available to web designers and +often already in use. 
+ +Similarly, one could imagine this two-level dual-keyed origin isolation being +deployed to improve similar issues with DOM Storage and cryptographic tokens. +This dual-origin policy should be considered a must for all future +origin-bound identifiers. + +With a clear association between third-party cookies and their top-level +origin due to double-keying, it becomes easier to provide the user with more +intuitive control over site identifiers, and thus with more control over their +actual relationship to particular sites. For example, the privacy settings +window could have a user-intuitive way of representing the user's relationship +with different origins, perhaps by using only the `favicon' of that top level +origin to represent all of the browser state accumulated by that origin. The +user could delete the entire set of browser state (cookies, cache, storage, +cryptographic tokens) associated with a site simply by removing its favicon +from her privacy info panel. + +Linkability based on fingerprintable browser properties is also amenable to +improvement under this model. In particular, one can imagine per-origin plugin +loading permissions, per-origin limits on the number of fonts that can be +used, and randomized window-specific time offsets. + +While these approaches are in fact useful for bringing the technical realities +of the web closer to what the user assumes is happening, they must be deployed +uniformly, with a consistent top-level origin restriction model. Uniform +deployment will take significant coordination and standardization efforts. +Furthermore, even an vastly improved origin model still cannot prevent +instances of explicit tracking partnerships between sites and third-party +content providers. Therefore, both origin improvements as well as +identity-isolation approaches are necessary. + \section{Conclusions}
The appeal of the web's prevailing revenue model and the difficulties @@ -395,17 +395,13 @@ In Section~6, it also strongly suggested that informed consent and user control should govern the interaction of users to tracking identifiers.
-Without changes to browser behavior, browser interface, or both, such informed -consent is simply not possible on today's web. Several examples from academia -and practice show that it is possible to bridge this disconnect by addressing -the linkability issues with the web's origin model with minimal breakage. -Additionally, the first steps towards providing the user with an explicit -representation of their web identity have been taken. -% Taken by who? Change to active tense. -RD +Without changes to both browser behavior and browser interface, such informed +consent is simply not possible on today's web. The lack of informed consent +makes it impossible to expect privacy-by-design approaches to function +properly. Users who do not even understand the basic properties of the +tracking mechanisms they are subjected to cannot be expected to effectively +use privacy mechanisms to avoid, opt out of, or decline such tracking.
-The pieces are in place to build robust private browsing modes based on these -two approaches, and metrics exist to measure their success. - \bibliographystyle{plain} \bibliography{W3CIdentity}
%\clearpage
tor-commits@lists.torproject.org
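The dual-keyed origin for cookies described in this revision works as follows: a cookie is transmitted only if both the top-level origin and the third-party origin match those involved in its creation. It can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Dan Witte's design or any browser's code. Origins are plain hostname strings, cookie attributes (expiry, path, flags) are ignored, and the class, method, and host names are hypothetical. `forget_site` models the per-site deletion idea, in which removing a site's favicon from the privacy panel clears all state accumulated under that site.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical partition key: (top-level origin in the URL bar, origin of the content element).
PartitionKey = Tuple[str, str]


class DoubleKeyedJar:
    """Sketch of a cookie store keyed by the pair (top-level origin, third-party origin).

    The same keying could be applied to cache entries or DOM Storage; only
    cookies are modelled here, and expiry, paths and flags are ignored.
    """

    def __init__(self) -> None:
        self._store: Dict[PartitionKey, Dict[str, str]] = defaultdict(dict)

    def set_cookie(self, top_level: str, origin: str, name: str, value: str) -> None:
        # Store the cookie under the pair that was active when it was created.
        self._store[(top_level, origin)][name] = value

    def cookies_for(self, top_level: str, origin: str) -> Dict[str, str]:
        # Send only cookies whose stored pair matches BOTH origins of this request.
        # .get() avoids creating empty partitions on lookup.
        return dict(self._store.get((top_level, origin), {}))

    def forget_site(self, top_level: str) -> None:
        # Remove every partition created under this top-level origin, including
        # third-party state (the "remove its favicon" action).
        for key in [k for k in self._store if k[0] == top_level]:
            del self._store[key]


if __name__ == "__main__":
    jar = DoubleKeyedJar()
    # tracker.example sets an identifier while embedded in news.example ...
    jar.set_cookie("news.example", "tracker.example", "uid", "abc123")
    # ... it gets the identifier back there, but not when embedded in shop.example.
    print(jar.cookies_for("news.example", "tracker.example"))
    print(jar.cookies_for("shop.example", "tracker.example"))
    jar.forget_site("news.example")
    print(jar.cookies_for("news.example", "tracker.example"))
```

Under this keying, a tracker embedded on many sites holds a separate, unlinked identifier per top-level site. A user-visible per-site "forget" becomes a simple prefix deletion over the partition keys.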
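The revision also argues that fingerprintable attributes should be obfuscated per origin, and that linkable state should be discarded when the user switches identity. The Python sketch below combines the two for a single attribute, a clock offset. The technique is our assumption and is not specified in the paper. The offset is derived with HMAC-SHA256 from a per-identity secret and the top-level origin; the paper mentions window-specific time offsets, so keying by origin is a substitution. The 1000 ms bound and all names are hypothetical.

```python
import hashlib
import hmac
import secrets


class FingerprintObfuscator:
    """Hypothetical sketch: per-origin obfuscation of one fingerprintable value
    (a clock offset), derived from a secret owned by the current identity."""

    MAX_OFFSET_MS = 1000  # assumed bound on the injected clock skew

    def __init__(self) -> None:
        self._secret = secrets.token_bytes(32)

    def clock_offset_ms(self, top_level_origin: str) -> int:
        # HMAC(secret, origin) is stable for one origin but pseudorandom across
        # origins, so two sites cannot match users by their reported clock skew.
        digest = hmac.new(self._secret, top_level_origin.encode("utf-8"), hashlib.sha256).digest()
        value = int.from_bytes(digest[:4], "big")
        # Map onto the closed range [-MAX_OFFSET_MS, MAX_OFFSET_MS].
        return value % (2 * self.MAX_OFFSET_MS + 1) - self.MAX_OFFSET_MS

    def new_identity(self) -> None:
        # Rotating the secret re-randomizes every origin's view at once.
        self._secret = secrets.token_bytes(32)
```

Because the value is stable for a given origin, a site sees a consistent clock across page loads. Two sites comparing notes, however, see unrelated offsets. Calling `new_identity` breaks the link to every previously observed offset in one step.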