[tbb-bugs] #20025 [Applications/Tor Browser]: document.characterSet leaks locale when HTML page does not specify its own encoding (was: document.characterSet enables fingerprinting of localization (only with HSTS?))

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Oct 2 15:00:25 UTC 2019


#20025: document.characterSet leaks locale when HTML page does not specify its own
encoding
---------------------------------------+--------------------------
 Reporter:  dcf                        |          Owner:  tbb-team
     Type:  defect                     |         Status:  new
 Priority:  Medium                     |      Milestone:
Component:  Applications/Tor Browser   |        Version:
 Severity:  Normal                     |     Resolution:
 Keywords:  tbb-fingerprinting-locale  |  Actual Points:
Parent ID:                             |         Points:
 Reviewer:                             |        Sponsor:
---------------------------------------+--------------------------
Description changed by dcf:

Old description:

> At comment:18:ticket:10703, xfix reports on another means of discovering
> the browser's fallback character encoding, the
> [https://developer.mozilla.org/en-US/docs/Web/API/document/characterSet
> document.characterSet] property (and possibly its aliases
> document.charset and document.inputEncoding). There is a demo site here:
>   https://hsivonen.com/test/moz/check-charset.htm
> Using tor-browser-linux64-6.5a2_en-US.tar.xz, I get the output
>   `Your fallback charset is: windows-1252`
> But using tor-browser-linux64-6.0.4_ko.tar.xz, I get the output
>   `Your fallback charset is: EUC-KR`
>
> This is a separate issue from #10703. I'll leave a comment with a demo
> page that shows both techniques, with the one in #10703 giving the same
> result and document.characterSet giving different results.
>
> The really strange thing is that this only seems to be effective when the
> server has HSTS (a valid `Strict-Transport-Security` header). I couldn't
> reproduce the result of the hsivonen.com demo site with a local web
> server, nor with an onion service, even when copying the demo and its
> header exactly. Only when I put it on an HTTPS server with HSTS could I
> reproduce it. I'll leave a comment with two demo pages allowing you to
> compare.

New description:

 At comment:18:ticket:10703, xfix reports on another means of discovering
 the browser's fallback character encoding, the
 [https://developer.mozilla.org/en-US/docs/Web/API/document/characterSet
 document.characterSet] property (and possibly its aliases document.charset
 and document.inputEncoding). There is a demo site here:
   https://hsivonen.com/test/moz/check-charset.htm
 Using tor-browser-linux64-6.5a2_en-US.tar.xz, I get the output
   `Your fallback charset is: windows-1252`
 But using tor-browser-linux64-6.0.4_ko.tar.xz, I get the output
   `Your fallback charset is: EUC-KR`
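
 To illustrate why this matters for fingerprinting, here is a minimal
 sketch of how a site could bucket visitors by the value it reads back.
 The table holds only the two observations reported above; the names
 `OBSERVED_FALLBACKS` and `guess_bundle_locale` are hypothetical, and
 windows-1252 may well be shared by other locales besides en-US.

```python
from typing import Optional

# Only the two values observed in this ticket; everything else is unclassified.
# Assumption: windows-1252 is treated as "en-US" here purely because that is
# the bundle it was seen with; other locales may use the same fallback.
OBSERVED_FALLBACKS = {
    "windows-1252": "en-US",
    "euc-kr": "ko",
}


def guess_bundle_locale(character_set: str) -> Optional[str]:
    """Map a reported document.characterSet value to a bundle locale, or None."""
    # Encoding labels are compared case-insensitively, ignoring stray whitespace.
    return OBSERVED_FALLBACKS.get(character_set.strip().lower())
```

 Any value outside the table returns None, so the sketch claims nothing
 beyond what was actually observed.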

 This is a separate issue from #10703. I'll leave a comment with a demo
 page that shows both techniques: the one in #10703 gives the same result
 in both bundles, while document.characterSet gives different results.

 ~~The really strange thing is that this only seems to be effective when
 the server has HSTS (a valid `Strict-Transport-Security` header). I
 couldn't reproduce the result of the hsivonen.com demo site with a local
 web server, nor with an onion service, even when copying the demo and its
 header exactly. Only when I put it on an HTTPS server with HSTS could I
 reproduce it. I'll leave a comment with two demo pages allowing you to
 compare.~~

 Edit 2019-10-02: Ignore the above paragraph about HSTS. The difference is
 actually due to whether the document specifies its own encoding. See
 comment:7.
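
 A rough way to compare the two cases locally, assuming the explanation
 in comment:7 holds: serve the same ASCII-only page once with no encoding
 declared anywhere (no `charset` in Content-Type, no `<meta charset>`)
 and once with UTF-8 declared in both places. This is a sketch, not the
 demo from the ticket; the `/declared` path, the port, and the function
 names are made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical test page: shows whatever encoding the browser settled on.
PAGE_TEMPLATE = (
    "<!DOCTYPE html>\n"
    "<html><head>{meta}<title>characterSet test</title></head>\n"
    '<body><p>document.characterSet is: <span id="out"></span></p>\n'
    "<script>document.getElementById('out').textContent = "
    "document.characterSet;</script>\n"
    "</body></html>\n"
)


def build_response(declare_charset: bool):
    """Return (content_type, body_bytes) for the test page."""
    if declare_charset:
        # Encoding declared in both the HTTP header and the markup.
        content_type = "text/html; charset=utf-8"
        meta = '<meta charset="utf-8">'
    else:
        # Nothing declared: the browser has to pick an encoding itself.
        content_type = "text/html"
        meta = ""
    # The page is pure ASCII, so no BOM or non-ASCII byte hints at an encoding.
    return content_type, PAGE_TEMPLATE.format(meta=meta).encode("ascii")


class CharsetTestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /declared serves the UTF-8 variant; every other path declares nothing.
        content_type, body = build_response(self.path == "/declared")
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000):
    """Serve both variants on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CharsetTestHandler).serve_forever()
```

 Call `serve()` and load `/` and `/declared` in each bundle. If the
 explanation is right, only the undeclared page should differ between
 locales.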

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20025#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

