[tor-scaling] Summary of 5/31 meeting; next steps

Tue Jun 4 23:44:36 UTC 2019

> On Jun 4, 2019, at 8:00 AM, David Goulet <dgoulet at torproject.org> wrote:
> 
>> 
>> I also assume this means that the KIST tuning must be done before the
>> EWMA tuning?
> 
> As I said above, for relays, won't change anything (in theory!!!). I recall
> Rob/Matt did experiment during the KIST development and 10msec was the optimal
> waiting time they found between filling kernel buffers and coming back seeing
> them "empty".
> 

There are multiple factors at play when deciding on the KIST scheduler run frequency:

1. Cell/circuit priority

As soon as Tor writes cells to the kernel socket buffer, we no longer have control over their priority. If you write a worse-priority cell (BitTorrent cells are always available) to an already bloated socket buffer, and an instant later some better-priority cells arrive and you write them too, the worse-priority cells still get sent before the better-priority ones.

Adding some time between write operations allows the kernel to flush out the bloated buffers while allowing Tor to consider a larger set of circuits when making priority decisions.

But you can't wait too long, else the kernel buffers will become empty (all data flushed) and then you won't fully utilize the network interface.
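
To make the ordering problem concrete, here is a minimal toy sketch in Python (not Tor code). It assumes the kernel buffer is a plain FIFO and that a hypothetical scheduler wakes up every interval_ms and writes all pending cells, best priority first. The function name, the tuple layout, and the "bulk"/"web" labels are made up for illustration, and the model ignores the buffer draining.

```python
def send_order(arrivals, interval_ms):
    """Return the order in which cells leave a FIFO kernel buffer.

    arrivals: list of (arrival_ms, priority, label); a lower priority
              number means a better (more urgent) cell.
    interval_ms: how often the hypothetical scheduler runs.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    remaining = sorted(arrivals, key=lambda c: c[0])
    kernel_fifo = []  # once a cell is in here, its position is fixed
    now = 0
    while remaining:
        now += interval_ms
        ready = [c for c in remaining if c[0] <= now]
        remaining = [c for c in remaining if c[0] > now]
        # Priority is only applied among cells visible in this run.
        ready.sort(key=lambda c: c[1])
        kernel_fifo.extend(label for _, _, label in ready)
    return kernel_fifo


cells = [(0, 5, "bulk-1"), (0, 5, "bulk-2"), (2, 1, "web-1")]
print(send_order(cells, interval_ms=1))   # bulk cells are written first
print(send_order(cells, interval_ms=10))  # web cell is seen in time
```

With the 1 ms interval the bulk cells are already in the FIFO when the better-priority cell shows up, so it goes out last. With the 10 ms interval all three cells are visible in the same run and the better-priority cell is written first.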

2. Relay overhead

You want to avoid running the scheduler so frequently that a single run (the write operation itself) takes longer than the interval between runs. KIST does have some amount of overhead in that it collects TCP information from write-pending sockets.

We observed a linear relationship between the amount of time required to collect TCP information on all write-pending sockets and the number of such sockets (1.08 microseconds per pending socket), independent of the total number of open sockets.
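
As a rough back-of-the-envelope check, the per-socket cost can be compared against the scheduler interval. The sketch below is Python, and it assumes a purely linear model with no fixed cost (the intercept is my assumption, not something stated here) plus the 10 msec interval mentioned earlier in this thread. The function names are hypothetical.

```python
PER_SOCKET_US = 1.08  # measured cost per write-pending socket (microseconds)
INTERVAL_MS = 10.0    # scheduler interval from earlier in the thread


def collection_overhead_us(pending_sockets):
    """Estimated time to collect TCP info, assuming zero fixed cost."""
    return PER_SOCKET_US * pending_sockets


def overhead_fraction(pending_sockets, interval_ms=INTERVAL_MS):
    """Share of one scheduler interval spent collecting TCP info."""
    return collection_overhead_us(pending_sockets) / (interval_ms * 1000.0)


def max_pending_sockets(interval_ms=INTERVAL_MS):
    """Pending-socket count at which collection alone fills the interval."""
    return int(interval_ms * 1000.0 / PER_SOCKET_US)


print(f"{overhead_fraction(1000):.1%} of the interval at 1000 pending sockets")
print(f"collection fills the interval at ~{max_pending_sockets()} pending sockets")
```

Under these assumptions, 1000 write-pending sockets cost about 1.08 ms, roughly a tenth of a 10 msec interval, and collection alone would not fill the interval until somewhere above 9000 write-pending sockets. The actual writes take time too, so the real limit is lower.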

Factor 1 is described in Section 7.3, and factor 2 is described in Section 8.3.3:
https://www.robgjansen.com/publications/kist-tops2018.pdf

> This whole approach is honestly expensive (not KIST) but the busy loop network
> scheduling. One of the main reason we need to have this "wait period" is
> because cell priority as the KIST paper explains it.
> 
> I had a long phone call with Rob/Matt about this some months ago, and there
> are real optimization we could do to how tor works regarding its cell
> relaying.

I agree that we should move toward the design that David discusses. David has concerns about not having enough time to do it. I also have concerns about getting the patch into the kernel in a reasonable time. In any case, I think this design change isn't quite in the "tuning" phase yet (it's probably closer to the research/testing phase).

Peace, love, and positivity,
Rob