The gritty details of sendmes and topics

Marc Rennhard rennhard at tik.ee.ethz.ch
Wed Jan 8 15:43:17 UTC 2003


Hi Roger,

I'll compare your stuff with the way I did it in the Anonymity Network 
(AN). 

> The new situation with topics: Now there is only one or a handful of
> circuits coming out of each AP. New user requests grab onto an existing
> circuit, generate a topic_id (random 3 bytes) for their request, and
> then throw into the circuit a data cell describing their topic_id,
> the topic command 'begin', and the hostname and port they want to go
> to. It behaves like an ordinary data cell (indeed, non-edge nodes can't
> tell the difference) until it reaches the exit node, at which point he
> reads the topic id and command, opens a new connection, and sends back
> a data cell, same topic id, topic command 'connected'. When either edge
> wants to close just this topic, they send a data cell for that topic id,
> with topic command 'end'.

So you have decided to make use of a second layer of multiplexing.
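
Just to check I read the format right, here is how I picture such a
topic cell payload. This is only a sketch from my side: the 3-byte
topic_id and the begin/connected/end commands come from your mail,
while the command values and the other field sizes are guesses of mine.

#include <stdint.h>

#define TOPIC_COMMAND_BEGIN     1  /* AP -> exit: please open hostname:port */
#define TOPIC_COMMAND_CONNECTED 2  /* exit -> AP: connection is up          */
#define TOPIC_COMMAND_END       3  /* either edge: close just this topic    */

struct topic_cell_payload {
  uint8_t topic_id[3];    /* random, chosen by the AP per user request */
  uint8_t topic_command;  /* begin / connected / end                   */
  char    address[128];   /* "hostname:port", only used with 'begin'   */
};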

In the AN, flow control also happens "per tunnel" (I call the lower
level an anonymous tunnel and the upper level the anonymous connections
inside that tunnel), but hop-by-hop instead of end-to-end. I couldn't
say right now which one is better, but it has proved to work well in
practice.

As I use hop-by-hop flow control, this control information is not
transported inside an anonymous tunnel (where it would be visible only
to the endpoints), but at the same level as the anonymous tunnel data.
So between two hops, I have DATA messages that are just forwarded
through the system, and CONTROL messages. Both are link-encrypted and
cannot be distinguished from the outside. Each DATA message reduces the
window by one, just as in TOR. But CONTROL messages do not, and they
always take precedence over DATA messages. So even if a queue is full
of DATA messages and the sending window is empty, CONTROL messages (and
CREDIT, the message that increments the window, is one of them) simply
bypass the queue and are forwarded. This guarantees there are never
deadlocks in the sense that both sides have used up their windows and
cannot increment each other's windows again.
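
In code, the sending decision on each link looks roughly like this.
This is a simplified sketch, not the real AN code; the names and the
queue helpers are invented:

#include <stddef.h>

struct message;
struct queue;
/* Assumed queue helpers (not shown): */
int queue_empty(const struct queue *q);
struct message *queue_pop(struct queue *q);

struct link {
  struct queue *control_queue;  /* CREDIT and other CONTROL messages */
  struct queue *data_queue;     /* anonymous tunnel DATA messages    */
  int send_window;              /* counts DATA messages only         */
};

/* Pick the next message to put on the wire for this link. */
struct message *next_to_send(struct link *l) {
  if (!queue_empty(l->control_queue))
    return queue_pop(l->control_queue);   /* CONTROL ignores the window  */
  if (l->send_window > 0 && !queue_empty(l->data_queue)) {
    l->send_window--;                     /* each DATA message costs one */
    return queue_pop(l->data_queue);
  }
  return NULL;  /* window empty or nothing queued: wait for a CREDIT */
}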

> Problem #1: Since begin, end, and connected topic commands are sent
> inside data cells, then they use up a bit of the window. Imagine the
> circuit has a receive window of 0 at the exit side, and it's stopped
> reading from any of the webservers. Then one of the webservers hangs
> up. The exit node sends an 'end' data cell. One of the nodes in the
> middle of the circuit sees a data cell when the receive window is 0,
> freaks out, and kills the whole circuit.
> 
> Solution #1: No problem, I'll queue data cells right before they enter
> the circuit. If the receive window is >0, it sends the immediately and
> decrements the window. If the window hits 0, it tells all topics to quit
> reading from the webserver until further notice. When a sendme arrives
> I'll dump the queue of pending cells onto the circuit, and if there's
> any window left I'll notify all the topics to start reading again.

That solves it. It's done exactly in this way in the AN.
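
Spelled out, the edge behaviour you describe would be something like
the following sketch (invented names, neither your code nor mine):

struct cell;
struct circuit {
  int window;             /* circuit-wide window at this edge */
  /* pending-cell queue, list of topics, etc. omitted */
};

/* Assumed helpers (not shown): */
void circuit_send_cell(struct circuit *c, struct cell *cl);
void pending_push(struct circuit *c, struct cell *cl);
struct cell *pending_pop(struct circuit *c);
int pending_nonempty(struct circuit *c);
void topics_stop_reading(struct circuit *c);
void topics_start_reading(struct circuit *c);

/* A topic hands the edge a cell destined for the circuit. */
void edge_deliver_cell(struct circuit *c, struct cell *cl) {
  if (c->window > 0) {
    c->window--;
    circuit_send_cell(c, cl);
  } else {
    pending_push(c, cl);
    topics_stop_reading(c);   /* stop reading from webservers / SOCKS */
  }
}

/* A sendme arrived from the other edge of the circuit. */
void edge_got_sendme(struct circuit *c, int increment) {
  c->window += increment;
  while (c->window > 0 && pending_nonempty(c)) {
    c->window--;
    circuit_send_cell(c, pending_pop(c));
  }
  if (c->window > 0)
    topics_start_reading(c);  /* safe to pull from the topics again */
}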

> Problem #2: But we're still doing flow-control on a per-circuit level,
> and maybe we need a per-topic level. Imagine the user has two topics
> open: an ssh connection and a wget of a kernel tarball. Let's further
> imagine wget is broken, such that it reads half of the file and then for
> some reason forgets to keep reading. So the wget proceeds as normal,
> and sendmes work great, until the wget wedges. Then data continues to
> stream from the webserver. If the only topic were the wget, then the
> windows would run out and the exit node would stop reading from the
> webserver. But whenever a data cell arrives for the ssh topic, it finds
> the outbuf empty, sends back a sendme, and immediately the wget topic
> gets another 100 cells dumped on it. This repeats and the wget outbuf
> at the AP grows larger and larger. Or perhaps worse, the wget topic eats
> the whole window, so that when the ssh server wants to send a cell five
> minutes later, it finds the window at the exit to be 0, and there's no
> hope of ever getting a sendme.

That's one I never considered. It would block a user in the AN as well,
since not reading any more from a blocked tunnel simply never triggers
CREDIT messages to be sent back. Solving it without moving to per-topic
flow control is not possible. But then, looking at your problems 3 and
4 below, that itself gives new headaches. I look at it like this: if an
application at the user's site stops reading, it's the user's problem.
It won't kill the system (that's why we use flow control), and the user
only harms himself and no one else. If you stop here and accept this
flaw, problems 3 and 4 won't occur. True, it gives users potential
problems they wouldn't have if they weren't using TOR. There is still
the option of using a separate tunnel for different applications: one
tunnel for wget, one for ssh, and one (or more) for web browsing. That
should make the probability that 'everything stops' quite small.

Somehow I don't like your combination of end-to-end flow control and
windows at the nodes in between. Do you really need the counters at
those nodes? Dropping them would also solve most of what's below. I
mean, if the endpoints make sure they don't flood the net, why have
additional checks in the middle? To prevent DoS attacks by bad nodes?

> Solution #2: No problem, I'll take the separate sendme cell type out,
> and I'll make a topic command called sendme. Then the flow control is
> done on a per-topic level, and we're all set. Indeed, now I don't have
> to do any inefficient "tell them all to stop reading, tell them all to
> start reading" things. (Problem #2a: what if each side empties its window
> all at once, and the cells work their way down the circuit, cross, and
> end up on the other side. Then neither side has any window left to send
> a sendme! Solution #2b: No problem, you always leave yourself a window
> of 1, not 0, in case you need to send a sendme later.)
> 
> Problem #3: But wait, now the nodes in the middle of the circuit can't
> tell that they're supposed to increment their window. This is no good
> at all.
> 
> Solution #3: No problem, I'll go back to the
> sendme-is-a-separate-cell-type idea, but this time I'll stick the
> topic_id in the payload. The ACI's at each hop of the circuit will make
> sure it gets to the other side of the circuit. (For added complication,
> I could crypt the payload just like with data cells, and then peel
> off a layer from the topic_id at each step. Or I could accept that
> correlating topic_id along the circuit is no easier than simply counting
> packets/timing, and leave topic_id uncrypted.)

It may not be fatal that intermediate nodes learn the topic_ids, but it
seems wrong from a design point of view. The intermediate nodes should
care about tunnels/circuits, not about the upper layer of multiplexing.
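
To make the point concrete: under your solution #3, a node in the
middle would handle a sendme cell roughly as sketched below (invented
names, and the 100-cell increment is just the figure from your
example). Nothing in it needs the topic_id; only the two edges do.

#include <stdint.h>

#define CIRCWINDOW_INCREMENT 100   /* the 100-cell figure from above */

struct cell;
struct circuit;
/* Assumed helpers (not shown): */
uint16_t cell_aci(const struct cell *cl);
struct circuit *circuit_lookup(uint16_t aci);
void circuit_window_add(struct circuit *c, int n);
void circuit_forward_cell(struct circuit *c, struct cell *cl);

/* What a node in the middle does with a sendme cell. */
void relay_handle_sendme(struct cell *cl) {
  struct circuit *circ = circuit_lookup(cell_aci(cl));
  circuit_window_add(circ, CIRCWINDOW_INCREMENT);  /* per-circuit only */
  /* The topic_id sitting in the payload stays opaque at this hop;
   * only the far edge ever interprets it. */
  circuit_forward_cell(circ, cl);
}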

> Problem #4: But now we're relying on each topic to refill a communal
> circuit-wide window. Imagine you have 50 topics, and each of them
> delivers 20 cells. None of the individual topics has gotten enough cells
> to trigger a sendme, yet the window of 1000 is all gone. Deadlock.

You could set an upper limit on the number of topics, say 10, and make
the counters at the intermediate nodes 10 times as big as the counter
of each embedded topic. That should solve it.
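
With numbers matching your example (per-topic window 100, at most 10
topics, circuit window 10 * 100 = 1000): to empty the circuit window,
1000 cells must be unacknowledged; with at most 10 topics, at least one
topic then holds 100 of them, i.e. a full per-topic window. So,
assuming a topic's sendme is triggered after at most a window's worth
of cells, that topic is guaranteed to produce a sendme as soon as its
edge flushes, and the 'every topic just below its threshold' deadlock
of problem #4 can't occur.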

The more I think about it, the more I dislike per-topic flow control.
It gets much more complicated. I see that per-circuit flow control has
a problem with your wget bug, but I still consider that less
significant than the overhead and the problems introduced by moving to
per-topic flow control. The 'one circuit per application' approach
could be an acceptable compromise.

Cheers,
--Marc

