[tor-bugs] #32256 [Applications/Tor Browser]: TorBrowser should advertise Onion Networking capability in the User-Agent: string

Thu Nov 7 21:14:46 UTC 2019

#32256: TorBrowser should advertise Onion Networking capability in the User-Agent:
string
--------------------------------------+--------------------------
 Reporter:  alecmuffett               |          Owner:  tbb-team
     Type:  enhancement               |         Status:  new
 Priority:  Medium                    |      Milestone:
Component:  Applications/Tor Browser  |        Version:
 Severity:  Normal                    |     Resolution:
 Keywords:                            |  Actual Points:
Parent ID:                            |         Points:
 Reviewer:                            |        Sponsor:
--------------------------------------+--------------------------

Comment (by alecmuffett):

 Re: https://trac.torproject.org/projects/tor/ticket/32256#comment:9

 Hi George!

 I've thought about this a lot in the past few days, and this is my
 considered response.

 Before I get into it, I am going to standardise my terminology in order to
 make things a bit clearer and more consistent:

 - "client" means TorBrowser in the hands of a user

 - "site" means a large and sprawling (typically: multiple-onion) site,
 such as Facebook, the BBC, NYTimes, or any of several others which exist.

 - "estate" means a distinct "chunk" of a site, probably with one or more
 separate onion addresses; for instance "bbc.com" and "bbc.co.uk" are
 separate estates within the "BBC site", not least because of content
 licensing concerns.  In the cleartext internet, Facebook, Messenger,
 Instagram and WhatsApp are all separate "estates" of "[the] Facebook
 [Site]"

 - "CDN" means a (usually: third-party) estate which serves part of a site,
 e.g. Fastly

 - "server" means one specific machine within a site.

 My particular interest is in sites which often comprise one or more
 estates - typically: CDNs; the BBC site in particular comprises at least 4
 distinct BBC estates (.com, .co.uk, BBCi, and something-i've-forgotten)
 plus two separate third-party global CDN estates.

 I believe that the BBC is by far the most complex onion site on the
 planet; Facebook by comparison has only onionified its core site + its
 CDN; partly for the reasons that I outline below.

 With this glossary established, I'll begin; and I will number my
 paragraphs in case you wish to reference them:

 [1] First: I am glad that we agree that consumption of bandwidth does have
 a cost associated with it. The questions we must determine are:

 - who should bear that cost?

 - how often?

 - under what circumstances?

 - and how great a cost is it?

 [2] I believe that we've established above that it's neither economic nor
 environmentally aware for a large site to issue "Onion-Location" headers
 with every response, merely in the hope that 0.05% of users might make use
 of them.

 Scary Maths : (1 million Tor users / 2.7 billion FB users ) * 100 =>
 0.037%
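
 The arithmetic above can be checked with a short Python sketch. The two
 user counts are the round figures quoted in this comment, not
 measurements, and the variable names are only illustrative:

```python
# Round figures quoted in the comment above; both are estimates.
tor_users = 1_000_000
facebook_users = 2_700_000_000

# Share of Facebook's user base that could make use of an Onion-Location header.
share_percent = tor_users / facebook_users * 100
print(f"{share_percent:.3f}%")  # prints "0.037%"
```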

 [3] I also believe that the commentary from Privacy International, and
 elsewhere, describes both the imperfections and tragedy-of-the-commons
 issues relevant to treating Tor exit nodes as a geography to be "tracked".

 [4] As such: it's neither economic nor robust for a server in a site to
 somehow "know" that a request has arrived from an onion-capable browser,
 without employing considerable effort; and when attempting to grow Onion
 adoption it's much harder to pitch fiddly, specialist, complex solutions
 which require specialist realtime databases (etc) to function, with the
 ability to deliver that information across several estates and possibly
 into third parties.

 [5] One could posit an architecture where "when the client tries to log
 in" (nb: the BBC do not operate login over their onion site, the NYT do
 not offer POST functionality at all) then the server offers an
 Onion-Location header, thereby reducing the cost?

 [6] Yes, one could do that, but then you're in the realm of custom
 engineering to support an onion site, which (again) is a barrier to
 adoption.

 [7] Onion sites are simply HTTP/HTTPS via an alternative network stack
 with a different domain name; setting them up shouldn't require custom re-
 engineering of a site's login page, nor any more effort than would
 establishment of "www.bbc.co.nz" or some other top-level domain.

 [8] In other words: such a proposition would be a kludge, and (again) it
 might be a kludge to support as few as 0.05% of users, so would dissuade
 site adoption of onions.

 [9] Ergo: if the server cannot "know" that the request comes from an
 onion-capable client, then the server needs to be "provoked" into action.
 The client must proactively "tell" the server that the client is onion-
 capable, and then that capability should be expressed to the entire site,
 across all estates, to support the best experience.

 [10] Can that onion-capability be expressed to the backend, via out-of-
 band means - eg: setting a flag in a backend Redis instance? Possibly, but
 practically "no", especially where third-party CDNs in other estates are
 involved.

 [11] Therefore: the onion capability will need to be encoded in the
 session; most likely in a cookie. Something like:

 {{{ Cookie: onion=1 }}}

 or probably more reasonably:

 {{{ Cookie: onion_capable=yes }}}

 ...but let's go with the first one because it's shorter; you'll be adding
 at least 7 more bytes to every request to the site, probably much more,
 and for each site the capability will have to be custom-engineered into
 the CMS or webserver.

 [12] Also, because cookies, it's going to be complicated and fiddly
 engineering to express this capability to third-party CDN onion sites.
 Tor is well aware of the problems with cookies sharing data across sites,
 so I don't think this requires much explanation on my part.

 [13] So would it not be simpler, instead, to simply add those 7-or-more
 bytes to the User-Agent, and have Onion-Capability expressed to everyone?

 [14] The Tor user-agent is currently:

 {{{ Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0 }}}

 [15] I aver that it would be fine to add 6 bytes - "Tor/1" + space:

 {{{ Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0
 Tor/1 }}}

 ...which would be cheaper than the minimum-7-byte Cookie solution for
 talking to the onion-capable sites; this data would also be sent to non-
 onion sites, and make it easy for those sites to address onion-client
 users equitably, as outlined in my first post.
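
 As a rough check of the byte counts in [11] and [15], here is a minimal
 Python sketch. It assumes the smallest cookie pair and the "Tor/1" token
 proposed above, and it deliberately ignores the "Cookie: " header name
 and any "; " separator, which is why the cookie figure is a minimum:

```python
# Hypothetical encodings taken from paragraphs [11] and [15].
cookie_pair = "onion=1"  # smallest cookie name=value pair proposed above
ua_token = " Tor/1"      # separating space plus the proposed product token

base_ua = "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
proposed_ua = base_ua + ua_token

# Extra bytes per request under each approach (ASCII, one byte per character).
cookie_bytes = len(cookie_pair.encode("ascii"))
ua_bytes = len(proposed_ua.encode("ascii")) - len(base_ua.encode("ascii"))

print(cookie_bytes, ua_bytes)  # prints "7 6"
```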

 [16] Aside from anything else, adding "Tor/1" (or whatever) to the
 User-Agent would also greatly assist with the matter of third-party
 CDNs, and also with Alt-Svc.

 [17] There seems to be an assumption that Onion-Location headers are
 necessary in order to provide an optional flow away from a site, to the
 corresponding onion address; and that Onion-Location headers would be
 issued only in "special circumstances" because otherwise they would be
 being sent with every response, leading (again) to the 584-terabits-per-
 day problem, described in comment:6.

 [18] Ergo: implementing "Onion-Location" properly requires the site to:

 a) know when a client is onion-capable, and

 b) track that it has issued the "Onion-Location" header, so as not to
 issue it again.

 [19] This is achievable but suffers from the same cross-estate state-
 sharing challenges that are described above. In short: Onion-Location is
 weird, fiddly, and not robust.

 [20] If I am gazing into a crystal ball to speculate [NOTE: THIS IS
 SPECULATION] about the future, it could include:

 - We put "Tor/1" into the User-Agent

 - When a session cookie is dropped for the first time on a server
 connecting to {{{bbc.com}}}, the server also issues:

 {{{ Alt-Svc: someverylongbbcversion3address.onion }}}

 - ...which mechanism (no session cookie? have these!) has the benefit of
 not requiring state to be maintained or propagated around the site.

 - even if the BBC does not adopt the above, CDN providers like Fastly
 could certainly adopt such; and also this header would benefit
 Cloudflare's extant "Opportunistic Onions"

 - also: CDNs are typically better geared-up to deal with
 {{{User-Agent}}} than they are with {{{X-Onion-Capable: 1}}}, so
 integration should be simpler; the upstream should be able to pass
 onion-specific headers through the CDN to onion-capable clients by means
 of the {{{Vary:}}} header.

 - equally, anyone connecting to {{{ www.bbc.com }}} with "Tor/1" might
 simply be given a {{{ Location: https://www.bbcnewsv2vjtpsuy.onion/ }}}
 and redirected there, because why not when one has been given notice of
 onion capability?

 - speculation: also, in preparation for the deprecation of v2 addressing,
 perhaps in future anyone who accesses "facebookcorewwwi" might be given a
 "Location:" of {{{www.facebook.com}}} because of Facebook's newly-robust
 and "User-Agent"-powered Alt-Svc capabilities.
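
 The stateless "no session cookie? have these!" idea speculated about
 above could be sketched roughly as follows in Python. Everything here is
 an assumption for illustration: the function name, the plain dict of
 request headers, the rule that any Cookie header counts as an existing
 session, and the "Tor/1" token itself, which no released browser is
 claimed to send. The Alt-Svc value is the placeholder from the example
 above, copied verbatim; a real deployment would need a real onion
 address and full Alt-Svc syntax.

```python
import re

# Hypothetical token from paragraph [15], matched as a whole
# whitespace-delimited product token within the User-Agent string.
ONION_TOKEN = re.compile(r"(?:^|\s)Tor/\d+(?:\s|$)")

# Placeholder copied from the example above, not a real onion address.
ALT_SVC_VALUE = "someverylongbbcversion3address.onion"


def extra_response_headers(request_headers):
    """Return headers a site might add for an onion-capable client.

    request_headers is assumed to be a dict with canonical header names.
    """
    user_agent = request_headers.get("User-Agent", "")
    # Simplification: any Cookie header is treated as an existing session.
    has_session = "Cookie" in request_headers

    # Caches must key on User-Agent, since the response now depends on it.
    extra = {"Vary": "User-Agent"}

    # No session yet: advertise the onion once, keeping no server-side state.
    if ONION_TOKEN.search(user_agent) and not has_session:
        extra["Alt-Svc"] = ALT_SVC_VALUE
    return extra
```

 The point of the sketch is that the decision uses only what arrives in
 the request, so nothing needs to be shared between estates or CDNs.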

 [21] In conclusion, I have to ask: what are User-Agents for, if not for
 this? Why is that word Gecko there, other than to express standard
 rendering capabilities to servers? Why not express communications
 capabilities in the same manner, or why invent a new means?

 [22] You say: "There is no need to send that information later on for
 every sub-resource"; I propose that "need" is not the question; the
 question is what innovation could be unlocked by making that information
 available to every resource?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32256#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online