[tor-dev] Connection, Channel and Scheduler - An Intense Trek

Sun Nov 12 19:56:57 UTC 2017

On Mon, Oct 30, 2017 at 03:57:04PM -0400, David Goulet wrote:
> 2. DESTROY cells handling
>·
> Within a circuitmux object, there is a "destroy cell queue" on which a DESTROY
> cell is put in for one of the circuit on the cmux. An important thing for tor
> is that when it needs to send a DESTROY, it needs to _stop_ sending any queued
> cell on that circuit, dump them and only send the DESTROY cell.

Careful! I think this might be the opposite of what it needs to do.

If Tor wants to tear down a circuit, in normal circumstances it ought
to finish flushing the currently queued cells first. If it discards
the queued cells and only sends the destroy cell, then we end up with
missing data.

> Second, because of this concept of different queue for the DESTROY cell, tor
> will back and forth between that queue and the normal queue of a circuit.
> Remember, that the destroy queue is *not* per circuit but per circuitmux. This
> is used by the "cmux->last_cell_was_destroy" which is set to 1 if the last cell
> the cmux handled was a DESTROY or 0 if not.

Yes, this part is definitely a mess.

I think we need to invent and design a better way to handle destroys --
getting the abstraction layer right between the "control" cells and the
"data" cells is definitely a hard problem, especially when both kinds
of cells end up being sent through the same mechanism.

> * I believe now that we should seriously discuss the relevance of channels.
>   Originally, the idea was good that is providing an abstraction layer for the
>   relay to relay handshake and send/process cells related to the protocol. But,
>   as of now, they are half doing it.

I am not opposed to ripping out the channel abstraction.

You make a good case that it was never completed, and now it will be
harder to complete since it's been so long (and the person who designed
and built it won't be completing it). Also, if we are hoping to modularize
the Tor architecture to prepare it for component-by-component transitions
to Rust or some other language, then simplifying first is a good move.

I guess the other option is to keep it limping along and make a plan to
fix it and move to the right abstraction levels. That option would be best
if we have a particular new channel in mind that we want to add -- such
as the switch to udp transport that various research papers have proposed.

At least last I checked though, udp transport implies user-space tcp
which is not a practical thing at our scale.

All of this said though, Nick is the Chief Architect, so I will defer
to his judgment on which approach will get us a great fast stable Tor
in the long term.

> * In the short term, we should get rid of Vanilla scheduler because it
>   complefixies a lot the scheduler code by adding uneeded things to channel_t
>   but also bloated the scheduler interface with pointless function pointers for
>   instance. And in my opinion, it is not helping performance the way it is done
>   right now.

Getting rid of the Vanilla scheduler is fine with me. I imagine the Kist
plan was to leave the Vanilla scheduler in place so there's something
to fall back to in case of bugs or design surprises. It might be wisest
to leave it in during 0.3.2, so Kist can stabilize, and plan to take it
out during 0.3.3 or 0.3.4 if Kist continues looking good.

--Roger