[tor-bugs] #22233 [Core Tor/Tor]: Reconsider behavior on .z URLs with Accept-Encoding header
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Oct 18 01:46:58 UTC 2018
#22233: Reconsider behavior on .z URLs with Accept-Encoding header
-------------------------------------------------+-------------------------
Reporter: nickm | Owner: ahf
Type: defect | Status: assigned
Priority: Medium | Milestone: Tor: unspecified
Component: Core Tor/Tor | Version:
Severity: Normal | Resolution:
Keywords: 034-triage-20180328, 034-removed-20180328 | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor: Sponsor4
-------------------------------------------------+-------------------------
Comment (by Hello71):
Replying to [comment:4 yawning]:
> Replying to [comment:3 arma]:
> > FYI, my wget didn't send any accept-encoding header. Neither did
> > Sebastian's. Maybe Yawning's did? You can tell it to *add* an
> > accept-encoding header, but then what do you expect.
>
> `wget http://example.com` on my system does this:
>
> {{{
> GET / HTTP/1.1
> User-Agent: Wget/1.19.1 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: example.com
> Connection: Keep-Alive
> }}}
>
> Python's HTTP client also includes the header with `identity`.
>
> > I think the issue here is more that there are two ways to indicate you
> > want compression -- adding a .z to the url, and saying so in the
> > accept-encoding header -- and we should build the two by two decision
> > matrix and do the smart thing for all four cases.
>
> Yes. The existing code tries to treat `.z` as
> `Accept-Encoding: deflate`, which is a shortcut, and not always correct.
> Assuming we do not want to double compress, what I would consider working
> behavior looks like:
>
> || File || Accept-Encoding || Action ||
> || `foo` || N/A || `foo` ||
> || `foo` || `identity` || `Content-Encoding: identity`, `foo` ||
> || `foo` || `deflate` || `Content-Encoding: deflate`, `deflate(foo)` ||
> || `foo` || `identity, deflate` || `Content-Encoding: deflate`, `deflate(foo)` ||
> || `foo` || `identity, gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
> || `foo` || `gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
> || `foo` || `deflate, gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
> || `foo.z` || N/A || `deflate(foo)` ||
> || `foo.z` || `identity` || `Content-Encoding: identity`, `deflate(foo)` ||
> || `foo.z` || `deflate` || `406 Not Acceptable` ||
> || `foo.z` || `identity, deflate` || `Content-Encoding: identity`, `deflate(foo)` ||
> || `foo.z` || `identity, gzip` || `Content-Encoding: identity`, `deflate(foo)` ||
> || `foo.z` || `gzip` || `406 Not Acceptable` ||
> || `foo.z` || `deflate, gzip` || `406 Not Acceptable` ||
>
> (`gzip` used as a placeholder algorithm for "Something that is supported
> that is not `deflate`")
>
> The current code mishandles the cases in the table that should either
> double compress or return `406`.
I believe this is not consistent with modern HTTP and web client behavior.
I am fairly sure that modern web clients do one of the following.
Either:
 1. send `Accept-Encoding: deflate, gzip` (or `gzip, deflate`)
 2. if the response is `Content-Encoding: deflate` or `gzip`, transparently
    decompress it
 3. process the decompressed content as the type indicated in Content-Type
Or:
 1. do not send Accept-Encoding, or send `Accept-Encoding: identity`
 2. do not decompress the content
 3. process the content as the type indicated in Content-Type
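To make the two client behaviors concrete, here is a minimal sketch of the decode step as I understand it. The function name `decode_body` is made up for illustration, and treating HTTP `deflate` as the zlib format is my assumption about what servers actually send.

```python
import gzip
import zlib


def decode_body(content_encoding, body):
    """Return the bytes a client would hand to its Content-Type handler.

    Hypothetical helper: `content_encoding` is the value of the response's
    Content-Encoding header, or None if the header is absent.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        # No coding was applied, so the body is passed through untouched,
        # even if it happens to be a compressed file such as foo.z.
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # Assumption: "deflate" means zlib-wrapped data, as zlib.compress emits.
        return zlib.decompress(body)
    raise ValueError("unsupported Content-Encoding: " + encoding)
```

The point is that only the Content-Encoding header drives decompression; the URL's file extension never enters into it.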
Note that not sending any Accept-Encoding is, for practical purposes, the
same as sending `Accept-Encoding: identity`; see RFC 7231
(https://tools.ietf.org/html/rfc7231#section-5.3.4).
I am fairly sure that this behavior also does not depend on the file
extension of the URL. Therefore, it is not correct to return 406 if the
server thinks that compressing the content is stupid. (This is not just
the case for gzipped files; it also applies to image files, video files,
font files, and so on, too many for the browser to even attempt to make a
comprehensive list of file extensions.) Instead, the server should simply
not compress the content, not send `Content-Encoding: identity`, and send
it as is. You can see this behavior if you run, for example,
`curl --compressed -v torproject.org`. Compression is offered, but the
server doesn't want to bother, so it just doesn't compress it. This is
supported by
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding,
which says "As long as the identity value, meaning no encoding, is not
explicitly forbidden, by an identity;q=0 or a *;q=0 without another
explicitly set value for identity, the server must never send back a 406
Not Acceptable error."
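As a rough illustration of the rule quoted above, the following sketch checks whether an unencoded response is still acceptable for a given Accept-Encoding value. The names `parse_accept_encoding` and `identity_acceptable` are hypothetical, and ignoring malformed q-values is a simplification of mine, not something the RFC prescribes.

```python
def parse_accept_encoding(header):
    """Map each listed coding to its q-value (1.0 when no q is given)."""
    weights = {}
    for item in (header or "").split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # simplification: ignore a malformed weight
        weights[coding.strip().lower()] = q
    return weights


def identity_acceptable(header):
    """True unless identity is explicitly forbidden by the header."""
    weights = parse_accept_encoding(header)
    if "identity" in weights:
        return weights["identity"] > 0
    if "*" in weights:
        return weights["*"] > 0
    return True
```

Under this reading, a 406 is only ever on the table when `identity_acceptable` returns false, which none of the headers discussed in this ticket would trigger.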
Therefore, I think your table should look more like this:
|| File || Accept-Encoding || Action ||
|| `foo` || none or `identity` || no Content-Encoding, `foo` ||
|| `foo` || `deflate` || `Content-Encoding: deflate`, `deflate(foo)` ||
|| `foo` || `gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
|| `foo` || `deflate, gzip` || `Content-Encoding: deflate` or `gzip`, `deflate(foo)` or `gzip(foo)` respectively ||
|| `foo.z` || none or `identity` || no Content-Encoding, `deflate(foo)` ||
|| `foo.z` || `deflate` || no Content-Encoding, `deflate(foo)` ||
|| `foo.z` || `gzip` || no Content-Encoding, `deflate(foo)` ||
|| `foo.z` || `deflate, gzip` || no Content-Encoding, `deflate(foo)` ||
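The table above could be sketched roughly as follows. This is not Tor's actual code: `respond`, `offered_codings` and the `documents` mapping are hypothetical names, q-values are ignored for brevity, `deflate(foo)` is assumed to mean zlib-format data, and preferring `deflate` when both codings are offered is an arbitrary choice the table leaves open.

```python
import gzip
import zlib


def offered_codings(accept_encoding):
    """Codings named in the header; q-values are ignored in this sketch."""
    names = set()
    for item in (accept_encoding or "").split(","):
        name = item.split(";")[0].strip().lower()
        if name:
            names.add(name)
    return names


def respond(path, accept_encoding, documents):
    """Return (headers, body) for a request, following the proposed table.

    `documents` maps a name such as "foo" to its uncompressed bytes.
    """
    if path.endswith(".z"):
        # foo.z *is* the deflated document: serve it as is, with no
        # Content-Encoding, so the client never decompresses it implicitly.
        return {}, zlib.compress(documents[path[:-2]])
    raw = documents[path]
    offered = offered_codings(accept_encoding)
    if "deflate" in offered:
        return {"Content-Encoding": "deflate"}, zlib.compress(raw)
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(raw)
    # Nothing usable offered (or only identity): send the document unencoded.
    return {}, raw
```

Note that no branch returns 406 and no branch compresses twice.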
I doubt there exist any actual modern web clients that do not fit one of
these. If there are, it's probably fine to send them whatever as long as
they accept it, explicitly or implicitly.
Note that this guarantees that anybody who requests `foo` will see the
actual contents of `foo` in their browser, or saved to their disk or
whatever. Additionally, anybody who requests `foo.z` will always receive a
deflated version of `foo`, and (theoretically) will not have their browser
decompress it behind their backs. Also, we do not unnecessarily compress
anything twice.
For what it's worth, my wget also sends `Accept-Encoding: identity` by
default. I'm using wget 1.19.5.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22233#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online