Filename: 213-remove-stream-sendmes.txt
Title: Remove stream-level sendmes from the design
Author: Roger Dingledine
Created: 4-Nov-2012
Status: Open
1. Motivation
Tor uses circuit-level sendme cells to handle congestion / flow fairness at the circuit level, but it has a second stream-level flow/congestion/fairness layer under that to share a given circuit between multiple streams.
The circuit-level flow control, or something like it, is needed because different users are competing for the same resources. But the stream-level flow control has a different threat model, since all the streams belong to the same user.
When the circuit has only one active stream, the downsides are a) that we waste 2% of our bandwidth sending stream-level sendmes, and b) that, because of the circuit-level and stream-level window parameters we picked, we end up sending only half the cells we might otherwise send.
When the circuit has two active streams, they each get to send 500 cells for their window, because the circuit window is 1000. We still spend the 2% overhead.
When the circuit has three or more active streams, they're all typically limited by the circuit window, since the stream-level window won't kick in. We still spend the 2% overhead though. And depending on their sending pattern, we could experience cases where a given stream might be able to send more data on the circuit, but it chooses not to because its stream-level window is empty.
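The window arithmetic in the three cases above can be checked with a small Python model. It is an idealized sketch that assumes every active stream always has data queued and that round-robin splits the circuit window evenly. The names (CIRCWINDOW_START, per_stream_cap, and so on) are illustrative and are not taken from the Tor code.

```python
# Idealized model: every active stream always has data queued, and
# round-robin scheduling splits the circuit window evenly among them.
CIRCWINDOW_START = 1000    # circuit-level window, in cells
STREAMWINDOW_START = 500   # stream-level window, in cells
STREAM_SENDME_EVERY = 50   # one stream-level sendme per 50 cells delivered


def per_stream_cap(active_streams: int, stream_windows: bool = True) -> int:
    """Cells each stream can have in flight at once."""
    fair_share = CIRCWINDOW_START // active_streams
    if stream_windows:
        return min(STREAMWINDOW_START, fair_share)
    return fair_share


def total_in_flight(active_streams: int, stream_windows: bool = True) -> int:
    """Cells the whole circuit can have in flight at once."""
    return active_streams * per_stream_cap(active_streams, stream_windows)


def stream_sendme_overhead() -> float:
    """Extra cells spent on stream-level sendmes, per data cell."""
    return 1 / STREAM_SENDME_EVERY


for n in (1, 2, 3):
    print(f"{n} active stream(s): {per_stream_cap(n)} cells each, "
          f"{total_in_flight(n)} on the circuit")
print(f"stream sendme overhead: {stream_sendme_overhead():.0%}")
```

With one stream, only 500 of the 1000 circuit cells can be outstanding. With two streams, both limits coincide at 500 cells each. With three streams, the circuit window holds each stream to 333 cells, so its own 500-cell window never binds. The 2% figure is simply one sendme per 50 data cells.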
More generally, we don't have a good handle on the interactions between all the layers of congestion control in Tor. It would behoove us to simplify in the case where we're not clear on what it buys us.
2. Design
We should strip all aspects of this stream-level flow control from the Tor design and code.
2.1. But doesn't having a lower stream window than circuit window save room for new streams?
It could be that a feature of the stream window is that there's always space in the circuit window for another begin cell, so new streams will open faster than otherwise. But first, if there are two or more active streams going, there won't be any extra space. Second, since begin cells are client-to-exit, and typical circuits don't fill their outbound circuit windows very often anyway, and also since we're hoping to move to a world where we isolate more activities between circuits, I'm not inclined to worry much about losing this maybe-feature.
See also proposal 168, "reduce default circuit window" -- it's interesting to note that proposal 168 was unknowingly dabbling in exactly this question, since reducing the default circuit window to 500 or less made stream windows moot. It might be worth resurrecting the proposal 168 experiments once this proposal is implemented.
2.2. If we dump stream windows, we're effectively doubling them.
Right now the circuit window starts at 1000, and the stream window starts at 500. So if we just rip out stream windows, we'll effectively change the stream window default to 1000, doubling the amount of data in flight and potentially clogging up the network more.
We could either live with that, or we could change the default circuit window to 500 (which is easy to do even in a backward compatible way, since the edge connection can simply choose to not send as many cells).
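The backward-compatible route is illustrated below with a minimal Python sketch. In it, a packaging edge starts its own circuit window at 500, while the far edge keeps today's rule of one circuit-level sendme per 100 cells received. The class and function names are hypothetical. The loop also ignores latency: in each round, the sender empties its window and then gets back every sendme that is due.

```python
CIRCWINDOW_INCREMENT = 100  # cells credited back by one circuit-level sendme


class PackagingEdge:
    """Sending side of a circuit. The far edge would tolerate up to 1000
    unacknowledged cells, but this edge may choose a smaller start."""

    def __init__(self, start_window: int = 1000):
        self.package_window = start_window

    def send_burst(self) -> int:
        """Send until the package window is exhausted; return cells sent."""
        sent, self.package_window = self.package_window, 0
        return sent

    def on_sendmes(self, count: int) -> None:
        self.package_window += count * CIRCWINDOW_INCREMENT


def max_in_flight(start_window: int, rounds: int = 5) -> int:
    edge = PackagingEdge(start_window)
    received = 0       # cells the far edge has accepted so far
    sendmes_sent = 0   # sendmes the far edge has sent (its rule is unchanged)
    worst = 0
    for _ in range(rounds):
        burst = edge.send_burst()
        worst = max(worst, burst)
        received += burst
        due = received // CIRCWINDOW_INCREMENT - sendmes_sent
        sendmes_sent += due
        edge.on_sendmes(due)
    return worst


print(max_in_flight(1000), max_in_flight(500))
```

The far edge's logic is untouched. It just never sees more than 500 unacknowledged cells.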
3. Evaluation
It would be wise to have some plan for making sure we didn't screw up the network too much with this change. The main trouble there is that torperf et al. only do one stream at a time, so we really have no good baseline, or measurement tools, to capture network performance for multiple parallel streams.
Maybe we should resolve task 7168 before the transition, so we're more prepared.
4. Transition
Option one is to do a two-phase transition. In the first phase, edges stop enforcing the deliver window (i.e. stop closing circuits when the stream deliver window goes negative, but otherwise they send and receive stream-level sendmes as now). In the second phase (once all old versions are gone), we can start disobeying the deliver window, and also stop sending stream-level sendmes back.
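The following simplified Python model sketches how the receiving edge might handle a stream DATA cell in each phase. It is not Tor's relay code: Phase, ReceivingStream, and ProtocolViolation are hypothetical names. The value 500 is the stream window from Sec 2.2, and the 50-cell sendme step is the value given later in this thread. As described in the thread, a stream-level sendme is only sent while the application is accepting data.

```python
from enum import Enum

STREAMWINDOW_START = 500
STREAMWINDOW_INCREMENT = 50


class Phase(Enum):
    TODAY = 0  # enforce the deliver window, send stream-level sendmes
    ONE = 1    # tolerate a negative deliver window, still send sendmes
    TWO = 2    # no stream-level windows or sendmes at all


class ProtocolViolation(Exception):
    """Raised where current Tor would close the whole circuit."""


class ReceivingStream:
    def __init__(self, phase: Phase):
        self.phase = phase
        self.deliver_window = STREAMWINDOW_START

    def on_data_cell(self, app_accepting: bool) -> int:
        """Handle one DATA cell; return how many stream sendmes to send."""
        if self.phase is Phase.TWO:
            return 0
        self.deliver_window -= 1
        if self.deliver_window < 0 and self.phase is Phase.TODAY:
            raise ProtocolViolation("stream deliver window went negative")
        sendmes = 0
        # Only acknowledge while the application is draining the stream.
        while (app_accepting and
               self.deliver_window <= STREAMWINDOW_START - STREAMWINDOW_INCREMENT):
            self.deliver_window += STREAMWINDOW_INCREMENT
            sendmes += 1
        return sendmes
```

Phase one only removes the `raise`. A peer that overruns the window is tolerated, but sendmes keep flowing as today, so old peers see no difference.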
That approach takes a while before it will matter. As an optimization, since clients can know which relay versions support the new behavior, we could have relays interpret violating the deliver window as signaling support for removed stream-level sendmes: the relay would then stop sending or expecting sendmes. That optimization is somewhat clunky though, first because web-browsing clients don't generally finish out a stream window in the upstream direction (so the clunky trick will probably never happen by accident), and second because if we lower the circuit window to 500 (see Sec 2.2), there's now no way to violate stream deliver windows.
Option two is to introduce another relay cell type, which the client sends before opening any streams to let the other side know that it shouldn't use or expect stream-level sendmes. A variation here is to extend either the create cell or the begin cell (ha -- and they thought I was crazy when I included the explicit \0 at the end of the current begin cell payload), so we can specify our circuit preferences without any extra overhead.
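The following hypothetical Python sketch shows one way the begin-cell variation might look. The client appends a 4-byte flags word after the terminating NUL. A receiver that finds nothing after the NUL treats the flags as zero, which means today's behavior. The field layout and the FLAG_NO_STREAM_SENDMES bit are assumptions made for this illustration. "Body" here means the begin cell's relay payload up to its declared length.

```python
import struct
from typing import Optional, Tuple

# Hypothetical flag: "don't use or expect stream-level sendmes on this stream".
FLAG_NO_STREAM_SENDMES = 0x1


def build_begin_body(addrport: str, flags: Optional[int] = None) -> bytes:
    """ADDRESS:PORT, the explicit NUL, then an optional 32-bit flags word."""
    body = addrport.encode("ascii") + b"\0"
    if flags is not None:
        body += struct.pack("!I", flags)
    return body


def parse_begin_body(body: bytes) -> Tuple[str, int]:
    """Return (addrport, flags); an old-style body yields flags == 0."""
    nul = body.index(b"\0")  # ValueError if the NUL is missing
    addrport = body[:nul].decode("ascii")
    rest = body[nul + 1:]
    flags = struct.unpack("!I", rest[:4])[0] if len(rest) >= 4 else 0
    return addrport, flags


old = build_begin_body("example.com:80")
new = build_begin_body("example.com:80", FLAG_NO_STREAM_SENDMES)
print(parse_begin_body(old), parse_begin_body(new))
```

Because the old format already ends at the NUL, the preference rides along in the same cell with no extra round trip.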
Option three is to wait until we switch to a new circuit protocol (e.g. when we move to ntor or ace), and use that as the signal to drop stream-level sendmes from the design. And hey, if we're lucky, by then we'll have sorted out the n23 questions (see ticket 4506) and we might be dumping circuit-level sendmes at that point too.
Options two or three seem way better than option one.
And since it's not super-urgent, I suggest we hold off on option two to see if option three makes sense.
On Sun, Nov 04, 2012 at 06:31:51PM -0500, Roger Dingledine wrote:
> 2. Design
> We should strip all aspects of this stream-level flow control from the Tor design and code.
See also https://trac.torproject.org/projects/tor/ticket/4485 wherein I point to a git branch that implements this part of the proposal.
--Roger
On Sun, 04 Nov 2012 18:31:51 +0000, Roger Dingledine wrote:
...
> The circuit-level flow control, or something like it, is needed because different users are competing for the same resources. But the stream-level flow control has a different threat model, since all the streams belong to the same user.
But separate flow control is still what the, erm, this user expects. When I have two ssh sessions open via tor to the same host, they take the same circuit. And I don't expect the other session to block just because I ^S-ed (stopped the output of) the first session.
...
> It could be that a feature of the stream window is that there's always space in the circuit window for another begin cell, so new streams will open faster than otherwise.
Or at all. With no per-stream window, a single stalled stream would block the circuit forever. Besides the ssh scenario, think 'large PUT/POST request and server hiccuping': if a twitpic post isn't working out, I don't expect another browser tab, to a different host but on the same circuit, to block.
> But first, if there are two or more active streams going, there won't be any extra space.
When a stream announces a window to the other side, I expect it to be capable of accepting that data somewhere, so even if the stream windows currently overbook the circuit window, that data should drain into the local buffers and let the circuit window reopen.
Andreas
On Mon, Nov 05, 2012 at 06:57:48AM +0100, Andreas Krey wrote:
> With no per-stream window a single stalled stream would block the circuit forever.
Wait, what?
Can you define 'stalled' here? I think you are misunderstanding the current (and proposed) design.
With no per-stream window, the circuit will round-robin between the streams that want to send a cell, just as it does now. The only difference in the proposed change is that it would stop ignoring streams who have sent their whole stream window but not yet heard a stream-level sendme back. In either case it would continue obeying the circuit sendme windows.
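The following simplified Python model captures that scheduling rule. The scheduler moves one cell per turn, round-robin over streams with queued data, and is always bounded by the circuit window. It skips a stream whose window is exhausted only when stream windows are modeled. No sendmes arrive within this snapshot, and `schedule` and its arguments are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def schedule(queued: Dict[int, int], circ_window: int,
             stream_window: Optional[int] = None) -> List[int]:
    """Return the order in which streams package cells.

    queued maps stream id -> cells waiting. stream_window=None models the
    proposal (no per-stream limit); an int models today's stream windows.
    """
    left = dict(queued)
    windows = {sid: stream_window for sid in left}
    ring: Deque[int] = deque(sid for sid, n in left.items() if n > 0)
    order: List[int] = []
    while ring and circ_window > 0:
        sid = ring.popleft()
        if windows[sid] == 0:
            # Today's rule: skip it. No sendme arrives in this snapshot,
            # so the stream drops out of the rotation.
            continue
        order.append(sid)
        left[sid] -= 1
        circ_window -= 1  # the circuit window is obeyed in both designs
        if windows[sid] is not None:
            windows[sid] -= 1
        if left[sid] > 0:
            ring.append(sid)
    return order
```

Consider one stream with 1200 cells queued. The stream-window version stops after 500 cells, while the proposal's version uses the full circuit window of 1000.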
> When a stream announces a window to the other side, I expect it to be capable of accepting that data somewhere, so even if the stream windows currently overbook the circuit window, that data should drain into the local buffers and let the circuit window reopen.
It does.
--Roger
On Mon, 05 Nov 2012 02:01:21 +0000, Roger Dingledine wrote:
> On Mon, Nov 05, 2012 at 06:57:48AM +0100, Andreas Krey wrote:
>> With no per-stream window a single stalled stream would block the circuit forever.
> Wait, what?
> Can you define 'stalled' here?
'Receiver of the stream does not read anymore, for whatever reason.'
...
> With no per-stream window, the circuit will round-robin between the streams that want to send a cell, just as it does now.
But where does the data go when the end (socks client or the server the exit node is talking to) isn't accepting any more (and tor can't write to the TCP socket)?
The sending side is pushing stream data into the circuit, and the receiving side (tor process) must either collect it locally (thereby growing the memory footprint), or not allow new circuit window, thereby affecting the other streams on the circuit.
Or did I miss another per-stream feedback/flow control mechanism?
Andreas
On Mon, Nov 05, 2012 at 08:31:05AM +0100, Andreas Krey wrote:
> 'Receiver of the stream does not read anymore, for whatever reason.'
>> With no per-stream window, the circuit will round-robin between the streams that want to send a cell, just as it does now.
> But where does the data go when the end (socks client or the server the exit node is talking to) isn't accepting any more (and tor can't write to the TCP socket)?
> The sending side is pushing stream data into the circuit, and the receiving side (tor process) must either collect it locally (thereby growing the memory footprint), or not allow new circuit window, thereby affecting the other streams on the circuit.
> Or did I miss another per-stream feedback/flow control mechanism?
Ah ha!
Yes, I think you're right.
Looking at it from the exit relay's perspective (which is where it matters most, since most use of Tor is sending a little bit and receiving a lot): when a create cell shows up to establish a circuit, that circuit is allowed to send back at most 1000 cells. When a begin relay cell shows up to ask that circuit to open a new stream, that stream is allowed to send back at most 500 cells.
Whenever the Tor client has received 100 cells on that circuit, she immediately sends a circuit-level sendme back towards the exit, to let it know to increment its "number of cells it's allowed to send on the circuit" by 100.
However, a stream-level sendme is only sent when both a) the Tor client has received 50 cells on a particular stream, *and* b) the application that initiated the stream is willing to accept more data.
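The rule for when each kind of sendme goes out can be sketched as client-side bookkeeping in Python. This is a simplified model, not Tor's implementation. ClientReceiver and its methods are hypothetical. "The application is willing to accept more data" is reduced to a boolean, plus a callback for when the application drains its buffer.

```python
from typing import Dict, List

CIRC_SENDME_EVERY = 100   # circuit-level: always acknowledged immediately
STREAM_SENDME_EVERY = 50  # stream-level: only while the app is reading


class ClientReceiver:
    def __init__(self) -> None:
        self.circ_unacked = 0
        self.stream_unacked: Dict[int, int] = {}

    def _stream_sendmes(self, stream_id: int) -> List[str]:
        out = []
        while self.stream_unacked.get(stream_id, 0) >= STREAM_SENDME_EVERY:
            self.stream_unacked[stream_id] -= STREAM_SENDME_EVERY
            out.append(f"stream {stream_id} sendme")
        return out

    def on_data_cell(self, stream_id: int, app_accepting: bool) -> List[str]:
        """Account for one received cell; return the sendmes sent back."""
        out = []
        self.circ_unacked += 1
        if self.circ_unacked == CIRC_SENDME_EVERY:
            self.circ_unacked = 0
            out.append("circuit sendme")
        self.stream_unacked[stream_id] = self.stream_unacked.get(stream_id, 0) + 1
        if app_accepting:
            out += self._stream_sendmes(stream_id)
        return out

    def on_app_drained(self, stream_id: int) -> List[str]:
        """The application read its backlog: catch up on stream sendmes."""
        return self._stream_sendmes(stream_id)
```

A stalled application therefore keeps acknowledging the circuit but never the stream. That is what caps how much the exit can push into that one stream.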
If we ripped out stream-level sendmes, then as you say, we'd have to choose between "queue all the data for the stream, no matter how big it gets" and "tell the whole circuit to shut up".
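The two options can be compared against today's design with a toy Python simulation of one circuit. The exit always has data for two streams. Stream A's application never reads, like the ^S-ed ssh session, while stream B's application reads everything. It is an idealized, latency-free model with hypothetical names. The 500-cell buffer limit in the "withhold" policy is an arbitrary assumption.

```python
from typing import Dict

CIRC_START, CIRC_INC = 1000, 100
STREAM_START, STREAM_INC = 500, 50


def simulate(policy: str, rounds: int = 50, buffer_limit: int = 500) -> Dict[str, int]:
    """policy is 'stream_windows' (today), 'buffer' (no stream windows; the
    client queues whatever arrives) or 'withhold' (no stream windows; the
    client stops sending circuit sendmes while A's backlog is full)."""
    use_stream_windows = policy == "stream_windows"
    circ_window = CIRC_START
    stream_window = {"A": STREAM_START, "B": STREAM_START}  # only consulted today
    circ_recv = 0      # cells received toward the next circuit sendme
    b_recv = 0         # B cells received toward the next stream sendme
    buffered_a = 0     # A's application never reads
    delivered_b = 0    # B's application reads everything
    for _ in range(rounds):
        # Exit side: round-robin, one cell per stream per turn.
        progress = True
        while circ_window > 0 and progress:
            progress = False
            for sid in ("A", "B"):
                if circ_window == 0:
                    break
                if use_stream_windows and stream_window[sid] == 0:
                    continue
                circ_window -= 1
                stream_window[sid] -= 1
                progress = True
                circ_recv += 1
                if sid == "A":
                    buffered_a += 1
                else:
                    delivered_b += 1
                    b_recv += 1
        # Client side: send whatever sendmes are due (delivered instantly).
        while circ_recv >= CIRC_INC:
            if policy == "withhold" and buffered_a >= buffer_limit:
                break
            circ_recv -= CIRC_INC
            circ_window += CIRC_INC
        while use_stream_windows and b_recv >= STREAM_INC:
            b_recv -= STREAM_INC
            stream_window["B"] += STREAM_INC
    return {"buffered_a": buffered_a, "delivered_b": delivered_b}
```

Under these assumptions over 50 rounds:

* "stream_windows" parks 500 cells for A while B keeps flowing, reaching 25000 cells.
* "buffer" keeps B flowing, but A's backlog grows every round, also reaching 25000 cells.
* "withhold" caps A's backlog at 500, but B stalls after its first 500 cells.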
I believe you have just poked a hole in the n23 ("defenestrator") design as well: http://freehaven.net/anonbib/#pets2011-defenestrator since it lacks any stream-level pushback for streams that are blocking on writes. Nicely done!
--Roger
On Tue, 6 Nov 2012 01:06:56 -0500 Roger Dingledine <arma@mit.edu> wrote:
> If we ripped out stream-level sendmes, then as you say, we'd have to choose between "queue all the data for the stream, no matter how big it gets" and "tell the whole circuit to shut up".
A possible compromise: A stream level XOFF/XON instead of SENDME would allow us to save the flow control bandwidth for properly flowing streams and still have a way to deal with stalled ones.
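The following minimal Python sketch shows what the receiving edge might do under such a scheme. It assumes a high/low watermark pair for hysteresis; the 500/250 values and all names are assumptions, not taken from the thread. A stream whose application keeps up never emits a control cell, while a stalled one emits a single XOFF. Cells already in flight when XOFF is sent would still land in the buffer, so the high watermark would need some headroom.

```python
from typing import Optional


class XoffXonReceiver:
    """Receiving edge of one stream under a hypothetical XOFF/XON scheme."""

    def __init__(self, high: int = 500, low: int = 250):
        self.high, self.low = high, low
        self.buffered = 0      # cells received but not yet read by the app
        self.xoff_sent = False

    def on_data_cell(self) -> Optional[str]:
        """Buffer one cell; return 'XOFF' if the sender should pause."""
        self.buffered += 1
        if not self.xoff_sent and self.buffered >= self.high:
            self.xoff_sent = True
            return "XOFF"
        return None

    def on_app_read(self, cells: int) -> Optional[str]:
        """App consumed some cells; return 'XON' once at or below the low mark."""
        self.buffered = max(0, self.buffered - cells)
        if self.xoff_sent and self.buffered <= self.low:
            self.xoff_sent = False
            return "XON"
        return None


# A stream whose application keeps up costs no control cells at all.
flowing = XoffXonReceiver()
signals = []
for _ in range(10_000):
    signals += [flowing.on_data_cell(), flowing.on_app_read(1)]
print([s for s in signals if s])  # []
```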
Julian
On Tue, 06 Nov 2012 11:51:10 +0000, Julian Yon wrote:
> On Tue, 6 Nov 2012 01:06:56 -0500 Roger Dingledine <arma@mit.edu> wrote:
>> If we ripped out stream-level sendmes, then as you say, we'd have to choose between "queue all the data for the stream, no matter how big it gets" and "tell the whole circuit to shut up".
I had a third one - which isn't quite practical: Just kill that stream. :-)
> A possible compromise: A stream level XOFF/XON instead of SENDME would allow us to save the flow control bandwidth for properly flowing streams and still have a way to deal with stalled ones.
In another protocol I tended to piggyback the SENDME equivalent onto the data frames of the other direction, but here there is typically no data flowing in the other direction when you need to SENDME: Big downloads.
Another idea: Put the stream-level SENDMEs into the circuit-level ones. As far as I can see there should be sufficient space to do so, even for several streams at once. At that point, we could also change the SENDME to include a specific number of cells to additionally allow on each level, instead of the fixed 100/50.
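A relay cell carries 498 bytes of payload, which gives a rough sense of how much room there is. Suppose a combined sendme used a hypothetical layout: a 16-bit circuit increment, an 8-bit entry count, and one (16-bit stream ID, 16-bit increment) pair per stream. Then one cell could acknowledge up to 123 streams at once, with explicit increments instead of the fixed 100/50. The layout and function names in this Python sketch are assumptions for illustration only.

```python
import struct
from typing import List, Tuple

RELAY_PAYLOAD_SIZE = 498        # bytes available in one relay cell body
HEADER = struct.Struct("!HB")   # circuit increment, number of stream entries
ENTRY = struct.Struct("!HH")    # stream id, stream increment
MAX_ENTRIES = (RELAY_PAYLOAD_SIZE - HEADER.size) // ENTRY.size


def pack_sendme(circ_inc: int, streams: List[Tuple[int, int]]) -> bytes:
    """Encode one combined circuit + stream sendme (hypothetical format)."""
    if len(streams) > MAX_ENTRIES:
        raise ValueError("too many stream entries for one relay cell")
    body = HEADER.pack(circ_inc, len(streams))
    for stream_id, inc in streams:
        body += ENTRY.pack(stream_id, inc)
    return body


def unpack_sendme(body: bytes) -> Tuple[int, List[Tuple[int, int]]]:
    circ_inc, count = HEADER.unpack_from(body, 0)
    streams = [ENTRY.unpack_from(body, HEADER.size + i * ENTRY.size)
               for i in range(count)]
    return circ_inc, streams


msg = pack_sendme(100, [(1, 50), (7, 150)])
print(len(msg), unpack_sendme(msg), MAX_ENTRIES)
```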
Andreas