On Wed, Jan 09, 2019 at 08:17:15AM -0500, Ian Goldberg wrote:
> On Wed, Jan 09, 2019 at 08:42:18PM +1100, Todd Hubers wrote:
> > There are early plans to distribute crypto operations across multiple cores
> > [https://trac.torproject.org/projects/tor/ticket/1749], but there might be
> > a better way.
> >
> > (I registered, but I couldn't find a way to annotate the ticket, so I'm
> > emailing for now)
> >
> > The ticket states the reason being to saturate the bandwidth available (by
> > using all the cores as efficiently as possible).
> >
> > I don't understand why a relay needs to have a "main thread". Network
> > traffic arrives as an async operation and can be sent back out
> > asynchronously. So a final strategy shouldn't have a central thread. The
> > main thread might still be needed for startup, runtime adjustment, and
> > system upkeep, but not for the core network-crypto processing; that should
> > never need to touch the main thread.
> >
> > The current proposal speaks about multi-threading crypto operations, let's
> > call that "A) Speed - Speeding up processing of a single cell". Instead, I
> > propose "B) Concurrency - Restructuring so multiple cells can be processed
> > concurrently".
> >
> > A cell of data should arrive via IO-Completion thread on a random CPU core,
> > have crypto transformation applied on the same one core, then be dispatched
> > onward out via the network. This seems to be quite a simple approach where
> > I would think crypto code can remain the same "single-threaded"
> > implementation.
> >
> > Approach [A] will have diminishing returns as the number of cores
> > increases. You can only break up a cell unit of work so much until you're
> > encrypting one byte per cpu core. However, with approach [B], if you have
> > millions of CPU cores (as an extreme) you can be processing millions of
> > cells concurrently. Therefore, I believe approach [B] would be more
> > scalable.
> >
> > What do you think?
>
> You'll have troubles if cells *on the same circuit* try to be processed
> in parallel on different cores, at least with the current circuit-level
> crypto. But, once circuits are established, handing each circuit to a
> different thread/core (or more clever worker structure) is something
> that I think at least broadly makes sense, and indeed I have been
> proposing to have my students work on.
(Of course, this is only relevant for the very highest-bandwidth
nodes; my own node, for example, running on 5-year-old hardware with no
special configuration, was pushing 400 Mbps last month, with one core
at 80%, one at 11%, one at 6%, and the rest trivially small.)
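
To make the per-circuit hand-off concrete, here is a minimal sketch in
Python. It is illustrative only and is not how tor is implemented: the
names worker_for and run_workers, the modulo mapping from circuit ID to
worker, and the worker count are all assumptions, and the "crypto" step
is a placeholder. The point it shows is that pinning each circuit to a
single worker keeps that circuit's cells in arrival order, while
different circuits can proceed independently.

```python
import queue
import threading

NUM_WORKERS = 4  # hypothetical worker count, not a tor setting


def worker_for(circ_id, num_workers=NUM_WORKERS):
    """Map a circuit ID to one fixed worker (assumed: simple modulo)."""
    return circ_id % num_workers


def run_workers(cells, num_workers=NUM_WORKERS):
    """Dispatch (circ_id, payload) cells to per-worker FIFO queues.

    Returns a dict mapping circ_id to the payloads in the order they
    were processed.
    """
    queues = [queue.Queue() for _ in range(num_workers)]
    processed = {}
    lock = threading.Lock()  # guards the shared result dict only

    def worker(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more cells for this worker
                break
            circ_id, payload = item
            # Placeholder for the circuit's stateful crypto step. Only
            # this worker ever touches this circuit, so the per-circuit
            # state would need no locking of its own.
            with lock:
                processed.setdefault(circ_id, []).append(payload)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for circ_id, payload in cells:
        queues[worker_for(circ_id, num_workers)].put((circ_id, payload))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    # Interleave cells from five circuits, ten cells each.
    cells = [(c, seq) for seq in range(10) for c in (1, 2, 3, 7, 8)]
    result = run_workers(cells)
    for circ_id in sorted(result):
        print(circ_id, result[circ_id])
```

A fixed mapping like this can leave one worker overloaded when a single
circuit carries most of the traffic, which is where a "more clever
worker structure" would come in.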
--
Ian Goldberg
Professor and University Research Chair
Cheriton School of Computer Science
University of Waterloo