[tor-scaling] Tuning KIST and EWMA parameters

Rob Jansen rob.g.jansen at nrl.navy.mil
Sat Jun 8 19:47:54 UTC 2019



> On Jun 7, 2019, at 8:18 AM, David Goulet <dgoulet at torproject.org> wrote:
> 
> So it appears that we might want to do a tor release with different
> KISTSchedRunInterval values for clients and relays? Or just tell the world to
> use "2 msec" for their onion services, but that won't work well...

Please, please, please let's first fix the off-by-one bug so that "KISTSchedRunInterval 2 msec" does not really mean "KISTSchedRunInterval 1 msec"...

and so "KISTSchedRunInterval 1 msec" doesn't cause an infinite loop (or whatever the buggy behavior is).

> Top of my head, I would use our flow control protocol (SENDME).
> 
> However, the protocol allows for an end point to send data as long as the
> window is above 0. In theory, with a full maximum window of 1000 cells, 10
> SENDMEs should be sent (one every 100 cells, the increment value). But it is
> entirely possible that all 1000 cells are sent towards the end point before
> even 1 SENDME is received/handled by the sender or even sent by the receiver.
> 
> In practice, *our* tor program sends a SENDME at each window increment, meaning
> every 100 cells. But the other side is _not_ waiting on that SENDME to send more
> data if the window is still above 0.
> 
> Which is annoying when you want to compute an RTT ;).
> 

Right.
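
(To make sure we're talking about the same model, here is the bookkeeping as I understand it, written out as a toy Python sketch using the 1000-cell window and 100-cell increment from above. The names and structure are mine and purely illustrative, not anything from the tor code.)

    WINDOW_START = 1000      # circuit package window at circuit creation
    SENDME_INCREMENT = 100   # cells acknowledged per SENDME

    class Sender:
        def __init__(self):
            self.window = WINDOW_START

        def can_send(self):
            # Keeps sending as long as the window is above 0; it does NOT
            # wait for a SENDME unless the window actually hits 0.
            return self.window > 0

        def sent_cell(self):
            self.window -= 1

        def got_sendme(self):
            self.window += SENDME_INCREMENT

    class Receiver:
        def __init__(self):
            self.cells_since_sendme = 0

        def got_cell(self):
            # Emit a SENDME after every 100 cells handled.
            self.cells_since_sendme += 1
            if self.cells_since_sendme == SENDME_INCREMENT:
                self.cells_since_sendme = 0
                return True   # a SENDME goes back to the sender now
            return False

    # Worst case for timing: if the SENDMEs are slow to come back, the
    # sender can push all 1000 cells before handling a single one of them,
    # so SENDME arrival times alone say little about the circuit RTT.
    sender, receiver = Sender(), Receiver()
    while sender.can_send():
        sender.sent_cell()
        receiver.got_cell()   # any SENDMEs produced here are still "in flight"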

> Thus, the only real guarantee I think we can rely on here would be when the
> window goes down to 0: once tor sends the SENDME, it can be sure that the
> other side will send a DATA cell, because that side has stopped everything
> and is waiting for that SENDME.
> 
> You'll get a more reliable RTT at that point.

I worry that this will only give us RTT samples at odd points during a circuit's lifetime, or only on unusual circuits. I could imagine that some circuits never reach a 0 window, which would skew the RTT measurement toward only the large/heavy circuits that eventually do reach a 0 window.

If we want this metric to be accurate and reliable long term, would it be worth building an explicit measurement into the protocol? Like, when one end sends a SENDME it starts a timer; then, when the other end gets the SENDME, it sets a bit in the very next data cell it sends back. Upon receiving a data cell with that bit set, the first end stops the timer and computes the RTT.
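
As a rough sketch of what I mean, again in toy Python with made-up names (not a proposal for actual cell formats or tor internals):

    import time

    class MeasuringEnd:
        """The side that sends the SENDME and wants the RTT."""
        def __init__(self):
            self.sendme_sent_at = None   # one outstanding measurement at a time
            self.last_rtt = None

        def send_sendme(self):
            # Start the timer when the SENDME goes out.
            if self.sendme_sent_at is None:
                self.sendme_sent_at = time.monotonic()

        def got_data_cell(self, rtt_bit_set):
            # The peer sets the bit on the first data cell it sends after
            # handling our SENDME; that closes the measurement.
            if rtt_bit_set and self.sendme_sent_at is not None:
                self.last_rtt = time.monotonic() - self.sendme_sent_at
                self.sendme_sent_at = None

    class RespondingEnd:
        """The side that receives the SENDME and echoes the bit."""
        def __init__(self):
            self.must_set_bit = False

        def got_sendme(self):
            self.must_set_bit = True

        def bit_for_next_data_cell(self):
            # Set the bit exactly once, on the very next data cell.
            bit = self.must_set_bit
            self.must_set_bit = False
            return bit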

Of course this could also be done with explicit "ping pong" control cells rather than piggy-backing on existing cells. The advantage is that you don't waste a bit in every single data cell, and it would be easier to adjust the ping-pong rate.

PLP,
Rob

