[tor-dev] Proposal: Optimistic Data for Tor: Client Side

Sun Jun 5 13:29:55 UTC 2011

Ian Goldberg <iang at cs.uwaterloo.ca> wrote:

> On Sat, Jun 04, 2011 at 08:42:33PM +0200, Fabian Keil wrote:
> > Ian Goldberg <iang at cs.uwaterloo.ca> wrote:

> > > Overview:
> > > 
> > > This proposal (as well as its already-implemented sibling concerning the
> > > server side) aims to reduce the latency of HTTP requests in particular
> > > by allowing:
> > > 1. SOCKS clients to optimistically send data before they are notified
> > >     that the SOCKS connection has completed successfully
> > 
> > So it should mainly reduce the latency of HTTP requests
> > that need a completely new circuit, right?
> 
> No, just ones that need a new TCP stream.  The optimistic data stuff is
> about quickly sending data in just-being-constructed streams within an
> already-constructed circuit.

I see.

> > Do you have a rough estimate of what percentage of requests would
> > actually be affected? I mean, how many HTTP requests that need a new
> > circuit are there usually compared to requests that can reuse an
> > already existing one (or even reuse the whole connection)?
> 
> Assuming you mean "stream" instead of "circuit" here, then, as above, I
> think most HTTP connections would be in this category.  It might be
> interesting to examine some HTTP traces to see, though.  <shoutout
> target="Kevin">Kevin, you were looking at some HTTP traces for other
> reasons, right?  Anything in there that may help answer this
> question?</shoutout>

I actually meant "how many HTTP requests that need a new circuit
are there usually compared to requests that only need a new stream
(or reuse the whole connection)?"

You've already written above that HTTP requests that need a completely
new circuit aren't affected anyway, so that leaves the requests that
need a new stream and those that don't and I can get a rough idea
about those myself by looking at my logs (your mileage is likely to
vary of course):

fk@r500 ~ $ privoxy-log-parser.pl --statistics /usr/jails/privoxy-jail/var/log/privoxy/privoxy.log.*
Client requests total: 430598
[...]
Outgoing requests: 300971 (69.90%)
Server keep-alive offers: 156193 (36.27%)
New outgoing connections: 237488 (55.15%)
Reused connections: 63483 (14.74%; server offers accepted: 40.64%)
Empty responses: 5244 (1.22%)
Empty responses on new connections: 430 (0.10%)
Empty responses on reused connections: 4814 (1.12%)
[...]
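
The percentages in that output are relative to all client requests, so
the share of outgoing requests that needed a new connection has to be
derived. A minimal sketch of the arithmetic, using only the counts shown
above (the variable names are made up for illustration):

```python
# Counts copied from the privoxy-log-parser output above.
outgoing_requests = 300971
new_connections = 237488
reused_connections = 63483

# Every outgoing request used either a new or a reused connection.
assert new_connections + reused_connections == outgoing_requests

new_share = 100 * new_connections / outgoing_requests
reused_share = 100 * reused_connections / outgoing_requests

print(f"new connections:    {new_share:.1f}% of outgoing requests")
print(f"reused connections: {reused_share:.1f}% of outgoing requests")
```

In these logs roughly four out of five outgoing requests opened a new
connection; with Tor as the SOCKS server, each of those would presumably
need a new stream.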

> > I'm aware that this depends on various factors, but I think even
> > having an estimate that is only valid for a certain SOCKS client
> > visiting a certain site would be useful.
> 
> I think overall across sites would be a better number, no?

Sure.

> > How much data is the SOCKS client allowed to send optimistically?
> > I'm assuming there is a limit of how much data Tor will accept?
> 
> One stream window.
> 
> > And if there is a limit, it would be useful to know if optimistically
> > sending data is really worth it in situations where the HTTP request
> > can't be optimistically sent as a whole.
> 
> I suspect it's rare that an HTTP request doesn't fit in one stream
> window (~250 KB).

I agree, I expected the stream window to be a lot smaller.

> > While cutting down the time-to-first-byte for the HTTP request is always
> > nice, in most situations the time-to-last-byte is more important as the
> > HTTP server is unlikely to respond until the whole HTTP request has been
> > received.
> 
> What?  No, I think you misunderstand.  The time-to-first-byte is the
> time until the first byte of the *response* is received back at the
> client.

Makes sense. Thanks for the clarification.

> > > SOCKS clients (e.g. polipo) will also need to be patched to take
> > > advantage of optimistic data.  The simplest solution would seem to be to
> > > just start sending data immediately after sending the SOCKS CONNECT
> > > command, without waiting for the SOCKS server reply.  When the SOCKS
> > > client starts reading data back from the SOCKS server, it will first
> > > receive the SOCKS server reply, which may indicate success or failure.
> > > If success, it just continues reading the stream as normal.  If failure,
> > > it does whatever it used to do when a SOCKS connection failed.
> > 
> > For a SOCKS client that happens to be an HTTP proxy, it can be easier
> > to limit the support for "SOCKS with optimistic data" to "small"
> > requests instead of supporting it for all requests. (At least it would
> > be for Privoxy.)
> > 
> > For small requests it's (simplified):
> > 
> > 1. Read the whole request from the client
> > 2. Connect to SOCKS server/Deal with the response
> > 3. Send the whole request
> > 4. Read the response
> > 
> > As opposed to:
> > 
> > 1. Read as much of the request as necessary to decide
> >    how to handle it (which usually translates to reading
> >    at least all the headers)
> > 2. Connect to SOCKS server/Deal with the response
> > 3. Send as much of the request as already known
> > 4. Read some more of the client request
> > 5. Send some more of the request to the server
> > 6. Repeat steps 4 and 5 until the whole request has been
> >    sent or one of the connections is prematurely disconnected
> > 7. Read the response
> > 
> > Implementing it for the latter case as well would be more work
> > and given that most requests are small enough to be read completely
> > before opening the SOCKS connections, the benefits may not be big
> > enough to justify it.
> 
> A reasonable proxy server (e.g. polipo, I'm pretty sure) streams data
> wherever possible.

Sure, but even polipo can't stream data the client hasn't sent yet, and
in the case of requests larger than a few MTUs (file uploads, for example)
the SOCKS connection is probably established before the whole client
request has been received.
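
For the small-request case quoted above, the optimistic variant amounts
to sending the request right behind the SOCKS CONNECT command and only
then looking at the SOCKS reply. A minimal sketch, assuming SOCKS4a and a
client that already holds the whole request in memory; the function names
are hypothetical and this is not how polipo or Privoxy are actually
written:

```python
import socket
import struct

SOCKS4_GRANTED = 0x5A  # SOCKS4 reply code 90: request granted


def socks4a_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS4a CONNECT request that lets the proxy resolve `host`."""
    # VN=4, CD=1 (CONNECT), port, then the placeholder IP 0.0.0.1 that marks SOCKS4a.
    header = struct.pack("!BBH4s", 4, 1, port, bytes([0, 0, 0, 1]))
    # Empty user ID, then the hostname, each NUL-terminated.
    return header + b"\x00" + host.encode("ascii") + b"\x00"


def send_optimistically(sock: socket.socket, host: str, port: int, request: bytes) -> None:
    """Send the CONNECT command and the request without waiting for the reply."""
    sock.sendall(socks4a_connect_request(host, port) + request)


def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("SOCKS server closed the connection early")
        buf += chunk
    return buf


def read_socks_reply(sock: socket.socket) -> bool:
    """Consume the 8-byte SOCKS4 reply; True means the stream was opened."""
    reply = read_exact(sock, 8)
    return reply[1] == SOCKS4_GRANTED
```

If read_socks_reply() returns True, whatever follows on the socket is
the server's response; otherwise the client falls back to its usual
error handling for a failed SOCKS connection.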

>                    Certainly for responses: I seem to remember that
> privoxy indeed reads the whole response from the HTTP server before
> starting to send it to the web client, which adds a ton of extra delay
> in TTFB.

Privoxy buffers the whole response if it's configured to filter it
(as the decision to modify the first response byte could depend on
the last byte).

If no filters are enabled (this seems to be the case for the
configuration Orbot uses), or no filters apply, the response
data is forwarded to the client as it arrives.
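
The difference between the two modes can be sketched as follows. This is
only an illustration of the buffering behaviour described above, not
Privoxy's actual code (Privoxy is written in C); relay_response and
content_filter are hypothetical names:

```python
from typing import Callable, Iterable, Iterator, Optional


def relay_response(
    chunks: Iterable[bytes],
    content_filter: Optional[Callable[[bytes], bytes]] = None,
) -> Iterator[bytes]:
    """Yield response data for the client, buffering only if a filter applies."""
    if content_filter is None:
        # No filter: pass each chunk on as soon as it arrives.
        for chunk in chunks:
            yield chunk
        return
    # A filter may change the first byte depending on the last one,
    # so the whole response has to be collected before anything is sent.
    yield content_filter(b"".join(chunks))


print(list(relay_response([b"<p>", b"hi", b"</p>"])))
print(list(relay_response([b"<p>", b"hi", b"</p>"], bytes.upper)))
```

Only the filtered path delays the first response byte until the last
one has been received.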

> > I wouldn't be surprised if there's a difference for some browsers, too.
> 
> How so?  The browser sends the HTTP request to the proxy, and reads the
> response.  What different behaviour might it have?  The only one I can
> think of is "pipelining" requests, which some browsers/proxies/servers
> support and others don't.  That is, if you've got 4 files to download
> from the same server, send the 4 HTTP requests on the same TCP stream
> before getting responses from any of them.  In that case, you'll see the
> benefit for the first request in the stream, but not the others, since
> the stream will already be open.

I was thinking about file uploads. Currently it's not necessary for
the client to read the whole file before the SOCKS connection is even
established, but it would be, to optimistically send it (or at least
the first part of it).

Even old browsers support file uploads up to ~2GB, so this would
also be a case where the request might be too large to fit in the
stream window.

While file uploads are certainly rare (and successfully pushing
2GB through Tor might be a challenge), it might be worth thinking
about how to handle them anyway. Letting the Tor client cache the
whole file is probably not the best solution above a certain file
size.
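
One conceivable approach is for the SOCKS client to send optimistically
only what fits into one stream window and to stream the remainder once
the SOCKS reply has arrived. A rough sketch under that assumption; the
window constant is just the approximate figure mentioned earlier in this
thread, optimistic_prefix is a hypothetical helper, and the headers are
assumed to fit into the window:

```python
import io
from typing import BinaryIO

# Assumption: the approximate stream window size quoted earlier in this thread.
STREAM_WINDOW = 250_000


def optimistic_prefix(headers: bytes, body: BinaryIO, window: int = STREAM_WINDOW) -> bytes:
    """Return the part of the request to send before the SOCKS reply arrives.

    Reads at most enough of `body` to fill one window; the rest stays unread.
    """
    if len(headers) > window:
        raise ValueError("request headers alone exceed the window")
    return headers + body.read(window - len(headers))


upload = io.BytesIO(b"x" * 1_000_000)  # stand-in for a large file upload
first = optimistic_prefix(b"POST /upload HTTP/1.1\r\n\r\n", upload)
print(len(first), "bytes sent optimistically;", len(upload.read()), "bytes left to stream")
```

That way neither the SOCKS client nor the Tor client has to hold the
whole file in memory.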

I also thought about another case where it's not obvious to me
what to do: an HTTPS connection made by a browser through an HTTP
proxy.

Currently the browser will not start sending data for the server
until the proxy has signaled that the connection has been established,
which the proxy doesn't know until told so by the SOCKS server.

If the HTTP proxy "optimistically lies" to the client it will
not be able to send a proper error message if the SOCKS connection
actually can't be established. Of course this only matters if
the client does something useful with the error message, and at
least Firefox stopped doing that a while ago.

The impact on SSL connections is probably less significant anyway,
though.

Thanks a lot for the detailed response.

Fabian