[tor-bugs] #29427 [Core Tor/Tor]: kist: Poor performance with a small amount of sockets

Thu Feb 7 17:31:32 UTC 2019

#29427: kist: Poor performance with a small amount of sockets
------------------------------+--------------------------------
     Reporter:  dgoulet       |      Owner:  (none)
         Type:  defect        |     Status:  new
     Priority:  Medium        |  Milestone:  Tor: 0.4.1.x-final
    Component:  Core Tor/Tor  |    Version:  Tor: 0.3.2.1-alpha
     Severity:  Major         |   Keywords:  tor-sched, kist
Actual Points:                |  Parent ID:
       Points:                |   Reviewer:
      Sponsor:                |
------------------------------+--------------------------------
 We recently found that KIST performs very poorly when tor has only a small
 number of sockets.

 == How KIST operates

 KIST is scheduled whenever cells are put on a circuit queue. A single
 scheduler run might not handle all of them, because how much it can write
 depends on the space available in the socket's TCP buffer. When cells
 remain, KIST currently reschedules itself 10ms later (a static value).

 The problem is that when there are very few sockets (as with most tor
 clients), KIST handles them very quickly, say in 1ms for a single socket,
 and then sleeps for the remaining 9ms until it is rescheduled.

 During that 9ms wait, tor is not pushing bytes onto the wire even though
 it could. The attached graph made by pastly shows how badly KIST
 underperforms with the current 10ms interval.
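 To make the idle time concrete, here is a minimal simulation under a
 deliberately simplified model (an assumption, not how KIST actually computes
 its write limit): one socket that always has cells queued, a hypothetical
 per-run write limit `buffer_space_bytes` that each scheduler run fills, and
 a kernel that drains the buffer at a fixed link rate. The function name and
 all numbers are illustrative only.

```python
def simulate_fixed_interval(link_bytes_per_ms: int, buffer_space_bytes: int,
                            interval_ms: int, duration_ms: int) -> int:
    """Return the bytes one socket puts on the wire over duration_ms.

    Simplified, hypothetical model (not tor's code):
      * every interval_ms a scheduler run tops the socket buffer up to
        buffer_space_bytes (there is always more data queued);
      * the kernel drains at most link_bytes_per_ms from the buffer each ms.
    """
    buffered = 0
    sent = 0
    for t in range(duration_ms):
        if t % interval_ms == 0:
            buffered = buffer_space_bytes   # scheduler run: fill the free space
        drained = min(buffered, link_bytes_per_ms)
        buffered -= drained
        sent += drained                     # once buffered hits 0 the link idles
    return sent


if __name__ == "__main__":
    link = 12_500     # 100 Mbit/s expressed in bytes per ms
    space = 20_000    # hypothetical bytes the scheduler may write per run
    for interval in (10, 2, 1):
        sent = simulate_fixed_interval(link, space, interval, 1_000)
        print(f"{interval:2d} ms interval: {sent * 8 / 1e6:.0f} Mbit/s")
```

 With these made-up numbers the buffer empties about 2ms into each 10ms
 period and the link then sits idle, so the script prints 16, 80 and 100
 Mbit/s for the 10ms, 2ms and 1ms intervals respectively.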

 == Consequences

 (This list is not exhaustive.)

 1. Clients are effectively capped in bandwidth because they generally
 talk to their Guard over a single socket.

 2. A new relay joining the network has no connections yet, so when the
 authority or our bandwidth scanners measure it, they can only observe a
 capped value rather than what the relay could actually do (if that is
 higher). The measurement recovers after a while, once the relay starts
 seeing traffic and its number of sockets ramps up.

 == Solution

 As the attached graph shows, bringing the scheduler interval down to 2ms
 gives better performance than Vanilla. That could serve as a short-term
 solution.
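 As a back-of-the-envelope bound under a simplified model (an assumption,
 not a description of KIST's internals): if one scheduler run can hand the
 kernel at most $S$ bytes for a socket, runs happen every $T$, and the link
 drains at rate $R$, then that socket's throughput is bounded by

```latex
\text{throughput} \le \min\!\left(R,\ \frac{S}{T}\right),
\qquad
\frac{S}{2\,\text{ms}} = 5 \cdot \frac{S}{10\,\text{ms}}
```

 so cutting $T$ from 10ms to 2ms raises the scheduler-imposed ceiling
 fivefold, until the link rate $R$ becomes the binding limit.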

 A better, more medium-term solution would be to make the scheduling
 interval dynamic, based on how quickly tor expects the socket's TCP buffer
 to drain. That essentially depends on the connection's throughput. For
 example, a 100 Mbit NIC might only push 10 Mbit through to a Guard, so tor
 would need a way to learn the throughput of each connection, which would
 let KIST estimate when it needs to be rescheduled for that connection.
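 One way to picture the per-connection idea is a drain-rate estimator:
 record how many bytes the kernel actually sent on a connection since the
 last run, smooth that with an exponentially weighted moving average (EWMA),
 and schedule the next run for when the remaining buffered bytes should be
 gone. This is a minimal sketch only; the class name, the EWMA choice,
 `alpha`, and the 1ms/10ms clamps are hypothetical, not tor's API or a
 settled design.

```python
from dataclasses import dataclass


@dataclass
class DrainRateEstimator:
    """Hypothetical per-connection throughput estimate (not part of tor)."""
    alpha: float = 0.25              # weight of the newest sample in the EWMA
    rate_bytes_per_ms: float = 0.0   # 0.0 means "no estimate yet"

    def observe(self, bytes_drained: int, elapsed_ms: float) -> None:
        """Fold in one sample: bytes the kernel sent during elapsed_ms."""
        if elapsed_ms <= 0:
            return
        sample = bytes_drained / elapsed_ms
        if self.rate_bytes_per_ms == 0.0:
            self.rate_bytes_per_ms = sample          # first sample seeds the EWMA
        else:
            self.rate_bytes_per_ms = (self.alpha * sample
                                      + (1 - self.alpha) * self.rate_bytes_per_ms)

    def next_interval_ms(self, bytes_still_buffered: int,
                         min_ms: float = 1.0, max_ms: float = 10.0) -> float:
        """Expected time until the socket buffer empties, clamped to [min_ms, max_ms]."""
        if self.rate_bytes_per_ms <= 0.0:
            return max_ms                            # no estimate: keep the static 10ms
        drain_ms = bytes_still_buffered / self.rate_bytes_per_ms
        return max(min_ms, min(max_ms, drain_ms))


if __name__ == "__main__":
    est = DrainRateEstimator()
    est.observe(bytes_drained=1_250, elapsed_ms=1.0)   # about 10 Mbit/s observed
    print(est.next_interval_ms(5_000))                 # 5000 B at 1250 B/ms
```

 With a 10 Mbit/s estimate (1250 bytes/ms), 5000 buffered bytes give a
 4.0ms interval instead of a fixed 10ms. The clamps keep a bad estimate from
 either spinning the scheduler or starving the connection; falling back to
 the static value before any sample exists is one conservative choice, and
 a fresh relay might prefer the lower bound instead.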

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29427>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online