Ian Goldberg <iang@cs.uwaterloo.ca> wrote:
Anyway, here's the client-side sibling proposal to the already-implemented 174. It cuts down time-to-first-byte for HTTP requests by 25 to 50 percent, so long as your SOCKS client (e.g. webfetch, polipo, etc.) is patched to support it. (With that kind of speedup, I think it's worth it.)
Me too, although 25 to 50 percent seems to be more of a best-case scenario, and for some requests it's unlikely to make a difference.
Filename: xxx-optimistic-data-client.txt
Title: Optimistic Data for Tor: Client Side
Author: Ian Goldberg
Created: 2-Jun-2011
Status: Open
Overview:
This proposal (as well as its already-implemented sibling concerning the server side) aims to reduce the latency of HTTP requests in particular by allowing:
1. SOCKS clients to optimistically send data before they are notified that the SOCKS connection has completed successfully
So it should mainly reduce the latency of HTTP requests that need a completely new circuit, right?
Do you have a rough estimate of what percentage of requests would actually be affected? I mean, how many HTTP requests that need a new circuit are there usually, compared to requests that can reuse an already existing one (or even reuse the whole connection)?
I'm aware that this depends on various factors, but I think even having an estimate that is only valid for a certain SOCKS client visiting a certain site would be useful.
Did you also measure the differences between requests that need a new circuit and requests that only need a new connection from the exit node to the destination server?
2. OPs to optimistically send DATA cells on streams in the CONNECT_WAIT state
3. Exit nodes to accept and queue DATA cells while in the EXIT_CONN_STATE_CONNECTING state
This particular proposal deals with #1 and #2.
For more details (in general and for #3), see the sibling proposal 174 (Optimistic Data for Tor: Server Side), which has been implemented in 0.2.3.1-alpha.
Motivation:
This change will save one OP<->Exit round trip (down to one from two). There are still two SOCKS Client<->OP round trips (negligible time) and two Exit<->Server round trips. Depending on the ratio of the Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will decrease the latency by 25 to 50 percent. Experiments validate these predictions. [Goldberg, PETS 2010 rump session; see https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]
Can you describe the experiment some more?
I'm a bit puzzled by your "Results" graph. How many requests does it actually represent, and what kinds of requests were used?
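To make my "best case" remark more concrete, here is a rough sketch of the arithmetic as I understand it from the Motivation paragraph. It assumes that time-to-first-byte is simply the sum of the listed round trips and that the SOCKS Client<->OP round trips can be ignored; the function name and the RTT values are made up for illustration and are not measurements.

```python
def latency_reduction(tor_rtt, internet_rtt):
    """Fraction of time-to-first-byte saved by optimistic data.

    Round-trip counts are taken from the Motivation paragraph; the
    negligible SOCKS Client<->OP round trips are ignored.
    """
    before = 2 * tor_rtt + 2 * internet_rtt  # two OP<->Exit, two Exit<->Server
    after = 1 * tor_rtt + 2 * internet_rtt   # one OP<->Exit round trip saved
    return (before - after) / before


# Illustrative RTT values in seconds (assumptions, not measurements).
for tor_rtt, internet_rtt in [(1.0, 0.0), (1.0, 0.25), (1.0, 1.0), (0.5, 1.0)]:
    saved = latency_reduction(tor_rtt, internet_rtt)
    print(f"Tor RTT {tor_rtt}s, Internet RTT {internet_rtt}s: {saved:.0%} saved")
```

Under this model, the saving only approaches 50 percent when the Internet RTT is tiny compared to the Tor RTT, it is 25 percent when both are equal, and it drops below 25 percent when the Exit<->Server path is the slower one.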
Design:
Currently, data arriving on the SOCKS connection to the OP on a stream in AP_CONN_STATE_CONNECT_WAIT is queued, and transmitted when the state transitions to AP_CONN_STATE_OPEN. Instead, when data arrives on the SOCKS connection to the OP on a stream in AP_CONN_STATE_CONNECT_WAIT (connection_edge_process_inbuf):
- Check to see whether optimistic data is allowed at all (see below).
- Check to see whether the exit node for this stream supports optimistic data (according to tor-spec.txt section 6.2, this means that the exit node's version number is at least 0.2.3.1-alpha). If you don't know the exit node's version number (because it's not in your hashtable of fingerprints, for example), assume it does *not* support optimistic data.
- If both are true, transmit the data on the stream.
Also, when a stream transitions *to* AP_CONN_STATE_CONNECT_WAIT (connection_ap_handshake_send_begin), do the above checks, and immediately send any already-queued data if they pass.
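To check that I'm reading the design correctly, the two checks could be sketched as follows. This is not Tor's actual code: the function names and parameters are hypothetical, and the version comparison is simplified to the four leading numeric components (it ignores status tags such as "-alpha").

```python
import re

MIN_EXIT_VERSION = (0, 2, 3, 1)  # 0.2.3.1-alpha, per the proposal text


def parse_version(version):
    """Return the four leading numeric components of a version string, or None."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", version or "")
    return tuple(int(part) for part in match.groups()) if match else None


def may_send_optimistically(optimistic_allowed, exit_version):
    """Decide whether queued data may be sent while in CONNECT_WAIT.

    optimistic_allowed: result of the "allowed at all" check (hypothetical flag).
    exit_version: the exit's version string, or None if it is unknown.
    """
    if not optimistic_allowed:
        return False
    parsed = parse_version(exit_version)
    if parsed is None:
        return False  # unknown version: assume no support, as the proposal says
    return parsed >= MIN_EXIT_VERSION
```

For example, `may_send_optimistically(True, None)` is false because an unknown exit version is treated as "no support".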
How much data is the SOCKS client allowed to send optimistically? I'm assuming there is a limit of how much data Tor will accept?
And if there is a limit, it would be useful to know if optimistically sending data is really worth it in situations where the HTTP request can't be optimistically sent as a whole.
While cutting down the time-to-first-byte for the HTTP request is always nice, in most situations the time-to-last-byte is more important as the HTTP server is unlikely to respond until the whole HTTP request has been received.
SOCKS clients (e.g. polipo) will also need to be patched to take advantage of optimistic data. The simplest solution would seem to be to just start sending data immediately after sending the SOCKS CONNECT command, without waiting for the SOCKS server reply. When the SOCKS client starts reading data back from the SOCKS server, it will first receive the SOCKS server reply, which may indicate success or failure. If success, it just continues reading the stream as normal. If failure, it does whatever it used to do when a SOCKS connection failed.
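The client behaviour described above could look roughly like the following sketch. Using SOCKS4a is my assumption (the proposal only says "SOCKS CONNECT command"), the function names are made up for illustration, and the socket is assumed to be already connected to the SOCKS port.

```python
import struct

SOCKS4_GRANTED = 0x5A  # SOCKS4 reply code for "request granted"


def build_socks4a_connect(host, port):
    """Build a SOCKS4a CONNECT request with an empty user ID."""
    return (
        struct.pack(">BBH", 4, 1, port)   # version 4, command CONNECT, port
        + b"\x00\x00\x00\x01"             # invalid IP 0.0.0.1 signals SOCKS4a
        + b"\x00"                         # empty user ID, NUL-terminated
        + host.encode("ascii") + b"\x00"  # hostname, NUL-terminated
    )


def send_connect_with_data(sock, host, port, payload):
    """Send the CONNECT request and the payload without waiting for the reply."""
    sock.sendall(build_socks4a_connect(host, port) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("SOCKS server closed the connection early")
        data += chunk
    return data


def read_socks4_reply(sock):
    """Consume the 8-byte SOCKS4 reply; return True if the request was granted."""
    reply = recv_exact(sock, 8)
    return reply[1] == SOCKS4_GRANTED
```

A client would call `send_connect_with_data()` once, then `read_socks4_reply()`, and either continue reading the stream or fall back to its usual error handling.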
For a SOCKS client that happens to be an HTTP proxy, it can be easier to limit the support for "SOCKS with optimistic data" to "small" requests instead of supporting it for all requests. (At least it would be for Privoxy.)
For small requests it's (simplified):
1. Read the whole request from the client
2. Connect to SOCKS server/Deal with the response
3. Send the whole request
4. Read the response
As opposed to:
1. Read as much of the request as necessary to decide how to handle it (which usually translates to reading at least all the headers)
2. Connect to SOCKS server/Deal with the response
3. Send as much of the request as already known
4. Read some more of the client request
5. Send some more of the request to the server
6. Repeat steps 4 and 5 until the whole request has been sent or one of the connections is prematurely disconnected
7. Read the response
Implementing it for the latter case as well would be more work, and given that most requests are small enough to be read completely before opening the SOCKS connection, the benefits may not be big enough to justify it.
I wouldn't be surprised if there's a difference for some browsers, too.
And even if there isn't, it may still be useful to only implement it for some requests to reduce the memory footprint of the local Tor process.
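To illustrate what I mean by limiting support to "small" requests: a proxy could use the optimistic path only when the whole request is already buffered and below some cap. This is a sketch and not Privoxy code; the function name and the 16 KB cap are hypothetical, and chunked request bodies are not handled.

```python
SMALL_REQUEST_LIMIT = 16 * 1024  # hypothetical cap in bytes, not from the proposal


def is_complete_small_request(buffered):
    """Return True if `buffered` holds one whole HTTP request within the cap."""
    head, separator, body = buffered.partition(b"\r\n\r\n")
    if not separator or len(buffered) > SMALL_REQUEST_LIMIT:
        return False  # headers incomplete, or too large for the optimistic path
    content_length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            try:
                content_length = int(value.strip().decode("ascii"))
            except ValueError:
                return False  # malformed header: take the non-optimistic path
    if content_length < 0:
        return False
    return len(body) >= content_length
```

Requests for which this returns false would simply be handled the way they are today.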
Security implications:
ORs (for sure the Exit, and possibly others, by watching the pattern of packets), as well as possibly end servers, will be able to tell that a particular client is using optimistic data. This of course has the potential to fingerprint clients, dividing the anonymity set.
If some clients only use optimistic data for certain requests, it would divide the anonymity set further, so maybe the proposal should recommend a limit and maybe Tor should even enforce one on the client side.
Performance and scalability notes:
OPs may queue a little more data, if the SOCKS client pushes it faster than the OP can write it out. But that's also true today after the SOCKS CONNECT returns success, right?
It's my impression that there's currently a limit on how much data Tor will read and buffer from the SOCKS client. Otherwise Tor could end up buffering the whole request, which could be rather large.
Fabian