On 9 Nov 2015, at 20:45, Roger Dingledine <arma@mit.edu> wrote:

On Mon, Nov 09, 2015 at 08:04:55PM +1100, Tim Wilson-Brown - teor wrote:
Subsequent queries get the same IP address for several tens of seconds afterwards.

Woah. Are we setting the Expires: http header in our Tor answer based on
how long we think the *payload* will remain valid, and the proxy in the
middle is caching both the payload and the headers for that time?

That's what most caches do with unknown headers: cache them for the same period as the payload.

HTTP 1.0 is silent on whether headers should be cached - it just says content should be cached.
http://www.w3.org/Protocols/HTTP/1.0/spec.html#Expires

HTTP 1.1 specifies that headers are cached along with content, except for a few headers that don't get cached. (It also specifies that "not-modified" and similar responses replace headers from the original response. But I doubt that's relevant here.)
https://tools.ietf.org/html/rfc7234

For HTTP 1.1 compliant caches, Tor can specifically indicate that X-Your-IP-Address-Is is a hop-by-hop header by listing it in the "Connection" header, like this:
Connection: close, X-Your-IP-Address-Is
("close" indicates that Tor doesn't support persistent connections.)
This would cause the cache never to send out the X-Your-IP-Address-Is header, not even on the first connection. If Tor sees the IP address of the cache, this is what we want. But if it sees the IP address of the relay, we want the first response to have the X-Your-IP-Address-Is header, and any cached responses to not have it (see below).
https://tools.ietf.org/html/rfc7230#section-6.1
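
To make the hop-by-hop behaviour concrete, here is a rough Python sketch of what a compliant HTTP 1.1 intermediary is expected to do before forwarding or storing a response. It assumes the comma-separated option list from RFC 7230 section 6.1. The function name, the header list representation, and the example address are made up for illustration; this is not code from Tor or from any particular cache.

```python
def strip_connection_headers(headers):
    """Return headers without Connection and without any field it names.

    `headers` is a list of (name, value) tuples. Field names are compared
    case-insensitively, as HTTP requires.
    """
    named = {"connection"}
    for name, value in headers:
        if name.lower() == "connection":
            # Each comma-separated option names a field to drop.
            for option in value.split(","):
                option = option.strip().lower()
                if option:
                    named.add(option)
    return [(n, v) for n, v in headers if n.lower() not in named]


# Hypothetical response; 203.0.113.7 is a documentation-only address.
response = [
    ("Content-Type", "text/plain"),
    ("X-Your-IP-Address-Is", "203.0.113.7"),
    ("Connection", "close, X-Your-IP-Address-Is"),
]
forwarded = strip_connection_headers(response)
```

After this step only Content-Type is left, which is why the client behind the cache would never see the address header, even on the first fetch.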

Alternately, we could use:
Cache-Control: no-cache="X-Your-IP-Address-Is"
This is interpreted as a request not to cache the header at all, or, if the no-cache="<header-name>" feature is not supported, it's generally interpreted as a request not to cache the entire document. (This also causes the cache to attempt to revalidate the header, which might not be what we want, as Tor might not support cache revalidation.)
https://tools.ietf.org/html/rfc7234#section-5.2.2
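
As a sketch of how a cache that does support the qualified form might behave when it answers from its store without revalidating, assuming it simply withholds the listed fields: the helper names and the stored response below are hypothetical, and the parsing is deliberately simplistic (it only recognises no-cache="..." with a quoted list).

```python
import re

# Matches the qualified form no-cache="field1, field2" as a whole directive.
_QUALIFIED_NO_CACHE = re.compile(r'(?:^|,)\s*no-cache="([^"]*)"', re.IGNORECASE)


def no_cache_field_names(cache_control):
    """Return the lower-cased field names listed in no-cache="...".

    Returns an empty set when the directive is absent or unqualified.
    """
    match = _QUALIFIED_NO_CACHE.search(cache_control)
    if match is None:
        return set()
    return {f.strip().lower() for f in match.group(1).split(",") if f.strip()}


def serve_from_cache(stored_headers):
    """Return the stored headers minus any field named by no-cache="..."."""
    withheld = set()
    for name, value in stored_headers:
        if name.lower() == "cache-control":
            withheld |= no_cache_field_names(value)
    return [(n, v) for n, v in stored_headers if n.lower() not in withheld]


# Hypothetical stored response; 203.0.113.7 is a documentation-only address.
stored = [
    ("Cache-Control", 'max-age=60, no-cache="X-Your-IP-Address-Is"'),
    ("X-Your-IP-Address-Is", "203.0.113.7"),
    ("Content-Type", "text/plain"),
]
cached_reply = serve_from_cache(stored)
```

The first response still carries the address header; only replies served from the cache lose it.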

These are HTTP 1.1 features, but Tor provides an HTTP 1.0 response. HTTP 1.1 caches are supposed to handle headers by applying the HTTP 1.1 rules, regardless of the version of the response.

For HTTP 1.0 compliant caches, Tor can disable caching:
Pragma: no-cache
(This will also disable caching for HTTP 1.1 caches unless we provide a more generous Cache-Control header, like the one above.)
https://tools.ietf.org/html/rfc7234#section-5.4

If we disable caching, this will put more load on Faravahar. But at least we'll get correct behaviour.

If we remove the header, some relays might not be able to find their IP address as quickly.
(Do we believe IP addresses from other relays, or just the authorities?)

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP 968F094B

teor at blah dot im
OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F