[tor-dev] Proposal: Optimistic Data for Tor: Client Side
iang at cs.uwaterloo.ca
Sat Jun 4 20:06:08 UTC 2011
On Sat, Jun 04, 2011 at 08:42:33PM +0200, Fabian Keil wrote:
> Ian Goldberg <iang at cs.uwaterloo.ca> wrote:
> > Anyway, here's the client-side sibling proposal to the
> > already-implemented 174. It cuts down time-to-first-byte for HTTP
> > requests by 25 to 50 percent, so long as your SOCKS client (e.g.
> > webfetch, polipo, etc.) is patched to support it. (With that kind of
> > speedup, I think it's worth it.)
> Me too, although 25 to 50 percent seems to be more of a best-case
> scenario, and for some requests it's unlikely to make a difference.
The only requests for which it wouldn't make a difference are, I think,
ones that can reuse an existing stream: that is, you've visited the
same website recently enough that your browser is reusing an open TCP
connection to the HTTP server, but not so recently (e.g. at the same
time) that it's opening parallel connections. So I think most of the
time, you'll see the benefit.
> > Filename: xxx-optimistic-data-client.txt
> > Title: Optimistic Data for Tor: Client Side
> > Author: Ian Goldberg
> > Created: 2-Jun-2011
> > Status: Open
> > Overview:
> > This proposal (as well as its already-implemented sibling concerning the
> > server side) aims to reduce the latency of HTTP requests in particular
> > by allowing:
> > 1. SOCKS clients to optimistically send data before they are notified
> > that the SOCKS connection has completed successfully
> So it should mainly reduce the latency of HTTP requests
> that need a completely new circuit, right?
No, just ones that need a new TCP stream. The optimistic data stuff is
about quickly sending data in just-being-constructed streams within an
existing circuit.
> Do you have a rough estimate of what percentage of requests would
> actually be affected? I mean, how many HTTP requests that need a new
> circuit are there usually compared to requests that can reuse an
> already existing one (or even reuse the whole connection)?
Assuming you mean "stream" instead of "circuit" here, then, as above, I
think most HTTP connections would be in this category. It might be
interesting to examine some HTTP traces to see, though. <shoutout
target="Kevin">Kevin, you were looking at some HTTP traces for other
reasons, right? Anything in there that may help answer this
question?</shoutout>
> I'm aware that this depends on various factors, but I think even
> having an estimate that is only valid for a certain SOCKS client
> visiting a certain site would be useful.
I think overall across sites would be a better number, no?
> Did you also measure the differences between requests that need
> a new circuit and requests that only need a new connection from
> the exit node to the destination server?
If there's a new connection from the exit node to the destination
server, then there's a new stream, and you would see the full benefit of
optimistic data.
> > This change will save one OP<->Exit round trip (down to one from two).
> > There are still two SOCKS Client<->OP round trips (negligible time) and
> > two Exit<->Server round trips. Depending on the ratio of the
> > Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will
> > decrease the latency by 25 to 50 percent. Experiments validate these
> > predictions. [Goldberg, PETS 2010 rump session; see
> > https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]
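The round-trip accounting quoted above can be checked with a little arithmetic. Here is a sketch with hypothetical RTT values, treating the SOCKS client<->OP round trips as free, as the proposal does:

```python
# Time-to-first-byte, ignoring the negligible SOCKS client<->OP trips:
#   without optimistic data: 2 * tor_rtt + 2 * internet_rtt
#   with optimistic data:    1 * tor_rtt + 2 * internet_rtt
def ttfb(tor_rtt, internet_rtt, optimistic):
    tor_trips = 1 if optimistic else 2
    return tor_trips * tor_rtt + 2 * internet_rtt

def saving(tor_rtt, internet_rtt):
    before = ttfb(tor_rtt, internet_rtt, optimistic=False)
    after = ttfb(tor_rtt, internet_rtt, optimistic=True)
    return (before - after) / before

# Exit<->Server RTT negligible vs. Tor RTT: save one of two trips (50%).
print(round(saving(tor_rtt=1.0, internet_rtt=0.0), 2))  # 0.5
# Exit<->Server RTT equal to Tor RTT: save one of four trips (25%).
print(round(saving(tor_rtt=1.0, internet_rtt=1.0), 2))  # 0.25
```

So the 25-to-50-percent range corresponds exactly to the two extremes of the Exit<->Server to OP<->Exit RTT ratio.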
> Can you describe the experiment some more?
A webfetch client, using a single circuit, downloaded a web page from a
fixed server (to eliminate variance due to different server RTTs and
performance) 950 times. Each time, it randomly decided whether to use
optimistic data (the "SOCKS 4b" line) or not (the "SOCKS 4a" line).
The time from the start of the request (webfetch making the SOCKS
connection) to the time the first data byte of the HTTP response
("HTTP/1.1 200 OK") arrived at webfetch was recorded, and the two CDFs
of those values were plotted.
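For concreteness, the CDFs mentioned above are just empirical CDFs of the recorded timings; a minimal sketch (with made-up sample times) of how such a curve is computed:

```python
# Empirical CDF of time-to-first-byte samples: for each recorded time t,
# the fraction of requests that completed in at most t seconds.
def empirical_cdf(samples):
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Hypothetical timings (seconds) for a handful of fetches:
points = empirical_cdf([0.9, 1.4, 1.1, 2.0])
print(points)  # [(0.9, 0.25), (1.1, 0.5), (1.4, 0.75), (2.0, 1.0)]
```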
> I'm a bit puzzled by your "Results" graph. How many requests does
> it actually represent and what kind of request were used?
As above, approximately 475 HTTP GET requests of each type. Note that
the size of the fetched page is irrelevant to this measurement.
> How much data is the SOCKS client allowed to send optimistically?
> I'm assuming there is a limit of how much data Tor will accept?
One stream window.
> And if there is a limit, it would be useful to know if optimistically
> sending data is really worth it in situations where the HTTP request
> can't be optimistically sent as a whole.
I suspect it's rare that an HTTP request doesn't fit in one stream
window (~250 KB).
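For reference, a sketch of where that ~250 KB figure comes from: a stream window of 500 cells, each carrying roughly 498 bytes of stream data. (These constants are from the Tor design documents; verify against the current source before relying on them.)

```python
# Stream-level flow control: a window of 500 RELAY cells per stream,
# each carrying up to RELAY_PAYLOAD_SIZE bytes of stream data.
# (Constants per the Tor spec; double-check against the source tree.)
STREAM_WINDOW_CELLS = 500
RELAY_PAYLOAD_SIZE = 498  # bytes of data per RELAY cell

max_optimistic_bytes = STREAM_WINDOW_CELLS * RELAY_PAYLOAD_SIZE
print(max_optimistic_bytes)  # 249000, i.e. ~250 KB
```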
> While cutting down the time-to-first-byte for the HTTP request is always
> nice, in most situations the time-to-last-byte is more important as the
> HTTP server is unlikely to respond until the whole HTTP request has been
> received.
What? No, I think you misunderstand. The time-to-first-byte is the
time until the first byte of the *response* is received back at the
client. That's when the user's screen will start changing; previous
work (does someone have a cite handy?) has indicated that if a page
takes too long to start to change, users get frustrated with the page.
> > SOCKS clients (e.g. polipo) will also need to be patched to take
> > advantage of optimistic data. The simplest solution would seem to be to
> > just start sending data immediately after sending the SOCKS CONNECT
> > command, without waiting for the SOCKS server reply. When the SOCKS
> > client starts reading data back from the SOCKS server, it will first
> > receive the SOCKS server reply, which may indicate success or failure.
> > If success, it just continues reading the stream as normal. If failure,
> > it does whatever it used to do when a SOCKS connection failed.
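The patch described above amounts to very little code on the client side. Here is a sketch (not polipo's actual implementation; the hostname and socket handling are illustrative) of a SOCKS4a CONNECT followed by an optimistic write:

```python
import struct

def socks4a_connect(host, port, userid=b""):
    """Build a SOCKS4a CONNECT request.  The 0.0.0.1 destination IP
    tells the SOCKS server to resolve the trailing hostname itself."""
    return (struct.pack(">BBH", 4, 1, port)  # ver 4, cmd CONNECT, port
            + bytes([0, 0, 0, 1])            # invalid IP: hostname follows
            + userid + b"\x00"
            + host.encode("ascii") + b"\x00")

# Optimistic send: write the HTTP request immediately after the CONNECT,
# without waiting for the SOCKS reply.  (sock would be a socket connected
# to the OP's SOCKS port; example.com is a placeholder destination.)
request = socks4a_connect("example.com", 80)
http = b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
# sock.sendall(request + http)   # one write, no round trip in between
# reply = sock.recv(8)           # the 8-byte SOCKS reply arrives first;
# success = (reply[1] == 0x5a)   # 0x5a = request granted
```

On failure the client simply discards the stream, exactly as it would when a non-optimistic SOCKS connection fails.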
> For a SOCKS client that happens to be an HTTP proxy, it can be easier
> to limit the support for "SOCKS with optimistic data" to "small"
> requests instead of supporting it for all. (At least it would be for
> Privoxy.)
> For small requests it's (simplified):
> 1. Read the whole request from the client
> 2. Connect to SOCKS server/Deal with the response
> 3. Send the whole request
> 4. Read the response
> As opposed to:
> 1. Read as much of the request as necessary to decide
> how to handle it (which usually translates to reading
> at least all the headers)
> 2. Connect to SOCKS server/Deal with the response
> 3. Send as much of the request as already known
> 4. Read some more of the client request
> 5. Send some more of the request to the server
> 6. Repeat steps 4 and 5 until the whole request has been
> sent or one of the connections is prematurely disconnected
> 7. Read the response
> Implementing it for the latter case as well would be more work
> and given that most requests are small enough to be read completely
> before opening the SOCKS connections, the benefits may not be big
> enough to justify it.
A reasonable proxy server (e.g. polipo, I'm pretty sure) streams data
wherever possible. Certainly for responses: I seem to remember that
privoxy indeed reads the whole response from the HTTP server before
starting to send it to the web client, which adds a ton of extra delay
in TTFB. I'm pretty sure polipo doesn't do that, but just streams the
data as it arrives. Each program is likely to do the corresponding
thing with the requests.
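The difference between the two behaviours is just a forwarding loop. A minimal sketch (using in-memory streams in place of sockets) of the streaming approach:

```python
import io

def stream_copy(src, dst, chunk_size=4096):
    """Forward data as it arrives instead of buffering the whole
    message: each chunk is written out as soon as it is read."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

# With streaming, the first bytes reach `dst` after one chunk, rather
# than after the entire message has been read, as a store-and-forward
# proxy would deliver them.
src = io.BytesIO(b"HTTP/1.1 200 OK\r\n\r\n" + b"x" * 10000)
dst = io.BytesIO()
stream_copy(src, dst)
print(len(dst.getvalue()))  # 10019
```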
> I wouldn't be surprised if there's a difference for some browsers, too.
How so? The browser sends the HTTP request to the proxy, and reads the
response. What different behaviour might it have? The only one I can
think of is "pipelining" requests, which some browsers/proxies/servers
support and others don't. That is, if you've got 4 files to download
from the same server, send the 4 HTTP requests on the same TCP stream
before getting responses from any of them. In that case, you'll see the
benefit for the first request in the stream, but not the others, since
the stream will already be open.
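To illustrate the pipelining case, a sketch (with hypothetical paths) of several requests written back-to-back on one stream before any response is read:

```python
# HTTP/1.1 pipelining: multiple requests on one TCP stream before any
# response arrives.  Only the first request pays the stream-setup cost
# that optimistic data eliminates; the rest reuse the open stream.
def pipeline(host, paths):
    reqs = [f"GET {p} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()
            for p in paths]
    return b"".join(reqs)

batch = pipeline("example.com", ["/a.css", "/b.js", "/c.png", "/d.html"])
print(batch.count(b"GET "))  # 4
```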
> And even if there isn't, it may still be useful to only implement
> it for some requests to reduce the memory footprint of the local
> Tor process.
Roger/Nick: is there indeed a limit to how much data from the SOCKS
client the OP will queue up today before the stream is open?
> > Security implications:
> > ORs (for sure the Exit, and possibly others, by watching the
> > pattern of packets), as well as possibly end servers, will be able to
> > tell that a particular client is using optimistic data. This of course
> > has the potential to fingerprint clients, dividing the anonymity set.
> If some clients only use optimistic data for certain requests
> it would divide the anonymity set some more, so maybe the
> proposal should make a suggestion and maybe Tor should even
> enforce a limit on the client side.
I think the worst case is when most people are using optimistic data,
but some older clients don't. They'll stand out more readily.
> > Performance and scalability notes:
> > OPs may queue a little more data, if the SOCKS client pushes it faster
> > than the OP can write it out. But that's also true today after the
> > SOCKS CONNECT returns success, right?
> It's my impression that there's currently a limit of how much
> data Tor will read and buffer from the SOCKS client. Otherwise
> Tor could end up buffering the whole request, which could be
> rather large.
It would be. As above, is this actually true, though?