[tor-scaling] Tuning KIST and EWMA parameters

Mike Perry mikeperry at torproject.org
Wed Jun 5 22:04:00 UTC 2019


Ok, we're starting to have a lot of KIST and EWMA tuning discussion in
the meeting thread, so I'm going to summarize things (please correct me
if I make mistakes), and then ask some questions for Rob, Ian, dgoulet, and
make mistakes), and then ask some questions for Rob, Ian, dgoulet, and
pastly.

Why is this important? Together, KIST and EWMA have the potential to
make a massive impact on page load times and interactive web app
performance, if tuned properly for that use case.

However, both of these systems have only been tuned using results from
simulation, and in a way, they are the poster children for a case study
on how simulation and reality do not always match up.

Prior to developing KIST, we made some efforts to tune EWMA on the live
network, but it did not change our performance metrics much. When Rob
and team investigated why, they discovered that it was because relays on
the live Tor network were buffering everything in their kernels instead
of in their Tor daemons, which prevented EWMA from working properly, and
the idea for KIST was born.

So let's back up, and review the technical details of KIST and EWMA.

KIST[1] is the Kernel Informed Socket Transport, created by Rob Jansen
and others at NRL. It uses the Linux kernel to inform Tor when a socket
can accept data to send on a TCP connection without buffering, and how
much. By knowing exactly how much data can be sent on a TCP connection,
Tor can keep its pending network data as cell queues inside the Tor
daemon rather than as TCP buffers in the kernel, so that it can make
priority decisions about which cells to send first. KIST may also have
some additional benefits for exit nodes, by reducing buffering due to
TCP termination, but its primary benefit is that it enables us to
prioritize certain types of traffic inside the Tor network.
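To make the "how much can be sent without buffering" part concrete, here
is a rough sketch of KIST's per-socket write limit as described in the
paper [1]. The numbers and function name are illustrative assumptions on
my part; the real code reads these values from the Linux kernel (e.g.
via getsockopt with TCP_INFO) rather than taking them as arguments.

```python
def kist_write_limit(cwnd, unacked, mss, sndbuf_capacity, sndbuf_len):
    """Bytes Tor can write to this socket without kernel-side queuing.

    Illustrative sketch only; real Tor queries the kernel for these
    values on Linux rather than receiving them as parameters.
    """
    # Space left in the TCP congestion window, in bytes.
    tcp_space = max(0, cwnd - unacked) * mss
    # Space left in the kernel's send buffer.
    sndbuf_space = max(0, sndbuf_capacity - sndbuf_len)
    # Writing more than either limit would just sit in kernel buffers,
    # which is exactly what KIST is trying to avoid.
    return min(tcp_space, sndbuf_space)

# Example: 10-segment cwnd, 4 segments in flight, 1448-byte MSS.
print(kist_write_limit(cwnd=10, unacked=4, mss=1448,
                       sndbuf_capacity=65536, sndbuf_len=20000))  # -> 8688
```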

EWMA[2] is the current circuit scheduling mechanism that Tor uses to
make these prioritization decisions. Its goal is to favor "interactive"
low-traffic circuits over "bulk" high-traffic circuits, to improve HTTP
page load times, and de-prioritize things like bittorrent. Specifically,
each circuit gets a score based on the total number of cells it has
transmitted. Every time cells are sent on to the next hop, this score is
reduced by multiplying the total cell count by
0.5^(time_since_last_write/CircuitPriorityHalflifeMsec).
CircuitPriorityHalflifeMsec is a consensus parameter currently set to
30000 (30 seconds). Circuits with the lowest score are chosen first.
In this way, circuits that have had less activity recently are given
priority over circuits that have had a lot of activity. In simulation,
Ian's group found that good values for CircuitPriorityHalflifeMsec
ranged from 10000-66000 (10 seconds to 66 seconds).
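The decay rule above is simple enough to sketch directly. This is just
the formula from the paragraph, not Tor's actual C implementation:

```python
# Sketch of the EWMA circuit-score decay described above.
# CircuitPriorityHalflifeMsec is the consensus parameter (currently 30000).
HALFLIFE_MSEC = 30000

def decayed_score(cell_count, time_since_last_write_msec):
    """Decay a circuit's cell-count score by the half-life rule."""
    return cell_count * 0.5 ** (time_since_last_write_msec / HALFLIFE_MSEC)

# A circuit that sent 1000 cells but has been idle for one half-life
# (30s) scores the same as one that just sent 500 cells, so circuits
# with less recent activity win (lowest score is scheduled first).
print(decayed_score(1000, 30000))  # -> 500.0
```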

In https://trac.torproject.org/projects/tor/ticket/29427, we realized
that KIST is a throughput bottleneck for clients, and possibly also for
small relays. Because the goal of KIST is to let enough traffic
accumulate in relay circuit queues so that EWMA can make priority
decisions, it must wait for this to happen before writing data to the
kernel. Currently, this interval is based on time, and has been set to
10ms based on results from simulation. Unfortunately, for clients and
relays without many TCP connections, a 10ms write interval is not enough
to actually fill the link capacity. Pastly's recent simulations in that
ticket found that 2ms was substantially better for client throughput
than the current 10ms default.
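Some back-of-the-envelope arithmetic shows why the run interval caps
throughput when there are few TCP connections: each wakeup can flush at
most the per-socket write limit, so the bound scales with the number of
sockets and inversely with the interval. The 64 KiB per-wakeup figure
below is an illustrative assumption, not a measurement:

```python
def max_throughput_mbps(writable_bytes_per_socket, n_sockets, interval_ms):
    """Upper bound: each interval, each socket flushes at most its limit."""
    bytes_per_sec = writable_bytes_per_socket * n_sockets * (1000 / interval_ms)
    return bytes_per_sec * 8 / 1e6

# A client with one guard connection that can accept ~64 KiB per wakeup:
print(max_throughput_mbps(65536, 1, 10))  # 10ms interval: ~52 Mbit/s cap
print(max_throughput_mbps(65536, 1, 2))   # 2ms interval: 5x more headroom
```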

For clients, which don't make many priority decisions and so don't rely
on KIST as heavily, 2ms or 1ms is probably fine, though there may be
some variability here for onion services. The impact of this client-side
change will be reflected in the CDF-DL and CDF-TTLB throughput metrics.
Other metrics shouldn't see much change.

For relays, there is a lot of variability in the optimal scheduler
interval here. With a LOT of additional development work, we might be
able to autotune this value for individual relays, but the wall-clock
time for this autotuning will be very long: it requires waiting for
kernel patches to get merged and released kernels to appear with them,
in addition to major changes to Tor.

The cost of using a relay KIST interval that is too long is that the
link cannot be saturated, especially if there are not enough TCP
connections. This will manifest itself as poor throughput on circuits
that use smaller relays with few other TCP connections open, regardless
of that relay's load. (This effect may cause us to mistakenly believe
that small relays are always overloaded, when in fact they may just be
getting KISTed to death.)

The cost of using a relay KIST interval that is too short is that EWMA
cannot take effect, which means that "bulk" traffic will drown out
"interactive" traffic, impacting latency and page load times. This will
manifest as worse results for the CDF-TTFB and CDF-RTT metrics, and also
as increased circuit build times.

Luckily, as-written, the KIST code lets torrc values override consensus
parameters for this interval. So we can update torperf clients and start
shipping Tor Browser with a low KISTSchedRunInterval immediately, and
then tune the relay-side parameter.
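For example, a client-side override would be a single torrc line like
the following (exact unit syntax is my assumption; check the tor man
page for your version):

```
# Run the KIST scheduler every 2ms instead of using the consensus value.
KISTSchedRunInterval 2 msec
```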


Ok, so now for the questions:

  * What client-side KISTSchedRunInterval torrc values should we use?
1ms, 2ms? I saw pastly doing some chutney tests, but has anyone tried
these values in the torrc of clients on the live network?

  * What relay-side KISTSchedRunInterval consensus parameter values
should we try? How confident are we that 10ms is our best
one-size-fits-all value for now? Should we also try 2ms on the live
network? 5ms? 20ms?

  * Can/How should we concurrently tune CircuitPriorityHalflifeMsec? We
know that we can make throughput worse with KIST intervals that are too
large, but if CircuitPriorityHalflifeMsec is not set right, we won't see
as much benefit from KIST on the latency side. Should we tune it first,
or after, or pick a couple (KISTSchedRunInterval,
CircuitPriorityHalflifeMsec) value pairs?

  * How should we record and analyze the echo server RTT metric that we
discussed in the jitsi meeting? We want some way to capture that RTTs
are constant over time over a circuit, as well as measure the
network-wide RTT distribution over all circuits. Are these separate metrics?



1. https://security.cs.georgetown.edu/~msherr/papers/kist-tops2018.pdf
2. https://www.cypherpunks.ca/~iang/pubs/ewma-ccs.pdf

-- 
Mike Perry
