[tor-dev] Weaving a Faster Tor: paper/video of possible interest

Tue Jul 20 19:16:20 UTC 2021

On 20 Jul (01:29:54), Steven Engler wrote:
> On 2021-07-19 3:24 p.m., David Goulet wrote:
> > On 06 Jul (13:52:50), Ian Goldberg wrote:
> > > Hello tor-dev,
> > > 
> > > Steve Engler (currently part of the Shadow team) and I have a paper of
> > > possible interest to appear at ARES 2021 next month.
> > > 
> > > It's a design of a multi-threaded relay architecture, of possible
> > > particular interest when considering relay support in arti, for example
> > > (though the proof-of-concept implementation is based on the usual C
> > > codebase).
> > > 
> > > If you're interested, here are previews of:
> > > 
> > > Paper: https://cs.uwaterloo.ca/~iang/pubs/mttor-ares21.pdf
> > > Video: https://www.youtube.com/watch?v=41a6nLUJye8
> > > Code: https://git-crysp.uwaterloo.ca/sengler/tor-parallel-relay-conn
> > >        https://git-crysp.uwaterloo.ca/sengler/relay-throughput-testing
> > 
> > Interesting!!!
> > 
> > One part caught my eye in section 4.3:
> > 
> >    "The large amount of locking would harm the relay’s performance, and mes-
> >    sage passing would break some of the assumptions of Tor’s primary scheduler,
> >    the KIST scheduler. Rather than using a global scheduler, each local
> >    connection manager uses its own local scheduler which processes only the
> >    connections it owns."
> > 
> > So KIST was also put in place in order to be able to consider _all_ channels
> > within one scheduling loop so to properly applied EWMA scheduling that is
> > basically loud circuits (lots of traffic) are less prioritize from quiet ones.
> > 
> > Moving to a scheduler per thread (as in only handling its set of connections),
> > we loose that property no? And so loud circuits end up crushing quiet
> > circuits on the physical link?
> 
> (Sending this from an address that is a tor-dev list member.)
> 
> If the scheduler's prioritization needs to be exact across all circuits in
> the relay, then yes that isn't possible with a scheduler per thread.
> 
> Our argument is that we don't believe the relay needs prioritization to be
> perfect across all circuits in the relay. For relays that currently aren't
> able to saturate their physical link due to CPU performance limitations, the
> greater total throughput from multi-threading might be worth partially
> relaxing the scheduling prioritization. Performing per-thread circuit EWMA
> scheduling should still be enough to prevent loud circuits from crushing
> quiet circuits, as long as connections are load-balanced across threads in a
> way that each thread has a mix of loud and quiet circuits.

Interesting.

We would need the load balancing to be very active then, and rebalancing
connections across the thread pool basically. But, yeah, I think I can see
that the current property of needing to look at all connections for
prioritization could be relaxed if we pull off proper load-balancing between
threads but again, my guts tells me it would need to be active rebalancing.

In my experience, active rebalancing like that can be a complex beast both in
terms of concurrency but not impossible. Likely easier in more modern language
also :P.

> 
> > Another thing. Looking at figure (c), it appears only "relay/edge connections"
> > can queue cells on a circuit half *directly* that is not using a channel. I
> > assume that concurrency there between a relay connection writing a cell on a
> > circuit half and a cell received through a channel (from another circuit half)
> > has been thought of? :)
> 
> Since the connection object holds the only reference to the circuit half, it
> can access the circuit half directly without locking or message passing.
> This is good for performance and for limiting buffer bloat. Cells arriving
> at a circuit half on a channel should be queued in that channel and
> shouldn't directly trigger any access on the circuit half (the channel
> shouldn't hold a reference to that circuit half). Since the connection
> object is in charge of both writing a cell on a circuit half and having that
> circuit half process cells that are queued on its channel, there shouldn't
> be any concurrency issues here.

Ok. Let me try to summarize: A connection reads its inbound cells, queue them
onto the correct circuit half (which I assume sticks to that connection and
thus no concurrent access) which then process them and sends them on its
channel (where a channel is always 1:1 circuit half relationship) and that
receiving circuit half (at that point on another thread) then enqueue them in
the scheduler list (if need be) so it can end up sending them on the right
connection.

> 
> > I'm asking also because within Tor, there are numerous places where a cell is
> > enqueued on a circuit (even an OR circuit like TRUNCATE for instance) and so
> > those place would need to use a channel or use the way relay connections write
> > them (which appears to be without a channel)?
> 
> We were only able to discuss this briefly in the paper, but there's a bit
> more about that in section 5.3.4 here:
> https://uwspace.uwaterloo.ca/bitstream/handle/10012/16108/Engler_Steven.pdf
> 
> Generally, we want cells to only be enqueued by the circuit half's
> connection object rather than having many places in tor try to communicate
> directly with circuit objects. When other parts of tor want to enqueue a
> cell on a circuit half, they should use the relay controller to send a
> message to the other thread's connection manager along one of the async
> channels shown in figure 4.a. The connection manager can then have the
> corresponding relay connection enqueue a cell directly on the circuit half.

Ok!

Thanks for the answers!
David

-- 
ZpHhHUq0V09YaQpr2fmlzErQIDjsDcy67bcQlORvsSw=
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20210720/293bcb6b/attachment.sig>