The gritty details of sendmes and topics

Roger Dingledine arma at
Wed Jan 8 13:25:53 UTC 2003

I've overhauled the code. I'll tell you more about that once I'm actually
done and I've checked it in. But first, we need to decide on a design
for the sendme cells that do flow control. Marc, it would be great to
hear about your experience on this issue.

The current situation before I added topics: There is one circuit per
request (eg per GET request), and it stretches from the AP (application
proxy) to the Exit node. It has a forward ACI and a backward ACI at
each hop, to let each node tell which circuit a given cell is for. When
the circuit is established, each hop has a receive window initialized
to 1000 cells in each direction. When an outgoing data cell arrives
at a node, it decrements its forward receive window and passes on the
cell. If a data cell arrives when the forward receive window is 0, it
tears down the circuit. (This tear-down propagates to either end of the
circuit via destroy cells.) When an edge node has a receive window of 0,
it stops reading from the edge socket (eg the webserver). When a data
cell has traversed the network, the edge node buffers it on its outbuf,
and evaluates whether to respond with a sendme cell: if its outbuf is
not too full, and its receive window is less than 900, then it queues
a sendme cell back into the circuit. Each node that receives the sendme
increments its window by 100 and passes the cell onward. It also considers
sending a sendme when the outbuf finishes flushing, to prevent deadlock
if all the data cells arrived only when the outbuf was too full.

The new situation with topics: Now there is only one or a handful of
circuits coming out of each AP. New user requests grab onto an existing
circuit, generate a topic_id (random 3 bytes) for their request, and
then throw into the circuit a data cell describing their topic_id,
the topic command 'begin', and the hostname and port they want to go
to. It behaves like an ordinary data cell (indeed, non-edge nodes can't
tell the difference) until it reaches the exit node, at which point he
reads the topic id and command, opens a new connection, and sends back
a data cell, same topic id, topic command 'connected'. When either edge
wants to close just this topic, they send a data cell for that topic id,
with topic command 'end'.

Problem #1: Since begin, end, and connected topic commands are sent
inside data cells, then they use up a bit of the window. Imagine the
circuit has a receive window of 0 at the exit side, and it's stopped
reading from any of the webservers. Then one of the webservers hangs
up. The exit node sends an 'end' data cell. One of the nodes in the
middle of the circuit sees a data cell when the receive window is 0,
freaks out, and kills the whole circuit.

Solution #1: No problem, I'll queue data cells right before they enter
the circuit. If the receive window is >0, it sends the immediately and
decrements the window. If the window hits 0, it tells all topics to quit
reading from the webserver until further notice. When a sendme arrives
I'll dump the queue of pending cells onto the circuit, and if there's
any window left I'll notify all the topics to start reading again.

Problem #2: But we're still doing flow-control on a per-circuit level,
and maybe we need a per-topic level. Imagine the user has two topics
open: an ssh connection and a wget of a kernel tarball. Let's further
imagine wget is broken, such that it reads half of the file and then for
some reason forgets to keep reading. So the wget proceeds as normal,
and sendmes work great, until the wget wedges. Then data continues to
stream from the webserver. If the only topic were the wget, then the
windows would run out and the exit node would stop reading from the
webserver. But whenever a data cell arrives for the ssh topic, it finds
the outbuf empty, sends back a sendme, and immediately the wget topic
gets another 100 cells dumped on it. This repeats and the wget outbuf
at the AP grows larger and larger. Or perhaps worse, the wget topic eats
the whole window, so that when the ssh server wants to send a cell five
minutes later, it finds the window at the exit to be 0, and there's no
hope of ever getting a sendme.

Solution #2: No problem, I'll take the separate sendme cell type out,
and I'll make a topic command called sendme. Then the flow control is
done on a per-topic level, and we're all set. Indeed, now I don't have
to do any inefficient "tell them all to stop reading, tell them all to
start reading" things. (Problem #2a: what if each side empties its window
all at once, and the cells work their way down the circuit, cross, and
end up on the other side. Then neither side has any window left to send
a sendme! Solution #2b: No problem, you always leave yourself a window
of 1, not 0, in case you need to send a sendme later.)

Problem #3: But wait, now the nodes in the middle of the circuit can't
tell that they're supposed to increment their window. This is no good
at all.

Solution #3: No problem, I'll go back to the
sendme-is-a-separate-cell-type idea, but this time I'll stick the
topic_id in the payload. The ACI's at each hop of the circuit will make
sure it gets to the other side of the circuit. (For added complication,
I could crypt the payload just like with data cells, and then peel
off a layer from the topic_id at each step. Or I could accept that
correlating topic_id along the circuit is no easier than simply counting
packets/timing, and leave topic_id uncrypted.)

Problem #4: But now we're relying on each topic to refill a communal
circuit-wide window. Imagine you have 50 topics, and each of them
delivers 20 cells. None of the individual topics has gotten enough cells
to trigger a sendme, yet the window of 1000 is all gone. Deadlock.

Ok, time to get some sleep. Maybe I'll have an answer, or at least a
few more iterations of solution/problem, waiting for me when I wake up. :)


More information about the tor-dev mailing list