[tor-bugs] #24665 [Core Tor/Tor]: sched: In KIST, the extra_space kernel value needs to be allowed to be negative

Thu Dec 21 03:29:06 UTC 2017

#24665: sched: In KIST, the extra_space kernel value needs to be allowed to be
negative
--------------------------+------------------------------------
 Reporter:  dgoulet       |          Owner:  dgoulet
     Type:  defect        |         Status:  needs_review
 Priority:  Very High     |      Milestone:  Tor: 0.3.2.x-final
Component:  Core Tor/Tor  |        Version:  Tor: 0.3.2.1-alpha
 Severity:  Normal        |     Resolution:
 Keywords:  tor-sched     |  Actual Points:
Parent ID:                |         Points:
 Reviewer:                |        Sponsor:
--------------------------+------------------------------------

Comment (by yawning):

 Replying to [comment:3 dgoulet]:
 > Replying to [comment:2 yawning]:
 > > The branch looks sensible to me.
 > >
 > > My inner pedant will say that "If a connection to a relay was
 unreliable meaning tor was struggling to flush bytes towards the relay" is
 misleading at best, since the congestion window shrinking (by quite a bit)
 is an expected part of how TCP/IP works and not particularly indicative of
 an overloaded condition on its own.
 >
 > Ah I think I failed to explain my comment correctly. The point of that
 line was a reason for KIST to actually consider the `notsent` queue size
 *because* it could be that the connection is struggling towards the relay.
 >
 > How would you phrase it in proper English?

 "The KIST scheduler did not correctly account for data already enqueued in
 each connection's send socket buffer, particularly in cases when the
 TCP/IP congestion window was reduced between scheduler calls.  This
 situation led to excessive per-connection buffering in the kernel, and a
 potential memory DoS.  Fixes bug 24665; bugfix on 0.3.2.1-alpha."

 Maybe not human friendly enough.
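
 To make the accounting concrete, here is a minimal sketch of the
 per-socket write limit as I understand it.  The struct, its field names
 and `kist_write_limit()` are illustrative and not the actual tor code;
 the point is only that `extra_space` has to be signed so that a backlog
 in `notsent` larger than the allowance eats into the TCP space instead
 of being clamped to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-socket snapshot; field names are illustrative and
 * loosely mirror what the kernel reports for a TCP socket. */
typedef struct {
  uint32_t cwnd;     /* congestion window, in packets */
  uint32_t unacked;  /* packets sent but not yet ACKed */
  uint32_t mss;      /* maximum segment size, in bytes */
  uint32_t notsent;  /* bytes sitting in the socket buffer, not yet sent */
} sock_snapshot_t;

/* Return how many bytes the scheduler may write to this socket now. */
static int64_t
kist_write_limit(const sock_snapshot_t *s, double sock_buf_size_factor)
{
  assert(sock_buf_size_factor >= 0.0);

  /* Room left in the congestion window. */
  int64_t tcp_space = 0;
  if (s->cwnd > s->unacked)
    tcp_space = (int64_t)(s->cwnd - s->unacked) * s->mss;

  /* Allowance for buffering ahead of the window, minus what is already
   * queued.  Signed on purpose: it goes negative when the window shrank
   * and notsent now exceeds the allowance. */
  int64_t extra_space =
    (int64_t)((double)s->cwnd * s->mss * sock_buf_size_factor)
    - (int64_t)s->notsent;

  int64_t limit = tcp_space + extra_space;
  return limit > 0 ? limit : 0;
}
```

 With `cwnd = 4`, `unacked = 1`, `mss = 1000` and `notsent = 5000`, the
 allowance is 1000 bytes short, so the limit is 2000 bytes rather than
 the 3000 you would get by clamping `extra_space` at zero.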

 > > While you're here, assuming the scheduler is called significantly
 faster than the RTT of most links (read that as "If 10 ms is lower than
 the RTT of most if not all links"), you can/should reduce
 `sock_buf_size_factor` as well, because you aren't going to get a full
 congestion window worth of ACKs back between scheduler calls in common
 cases.
 >
 > Interesting... if the channel is quite active, yes the scheduler tick
 for it should be 10ms.
 >
 > What is a reasonable size factor in your opinion? It seems we can get
 some RTT information with the `getsockopt()` call within `struct
 tcp_info`, maybe we could adjust a scaling factor based on those values?
 (If that is an idea, we should open a ticket to make way for this one to
 be merged)

 There isn't a good "one size fits all" solution.  Setting it too low will
 gimp performance on fast, low-latency links; setting it too high, as it
 is right now, bloats the various buffers.  I would personally opt more
 toward avoiding the latter given all the Fun that's happening.

 As you noted, `tcpi_rtt` gives the smoothed RTT estimate (and
 `tcpi_rttvar` the RTT variance if you need it), which is probably
 sufficient to make a better guess here.  As a first pass, I would
 recommend doing something based on the ratio of the scheduler interval to
 the smoothed RTT, with a hard maximum at `1.0`, but as you noted this is
 probably best discussed in a separate ticket.
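
 A rough sketch of that first pass, to show the shape of it and nothing
 more.  The function name and the fallback to `1.0` when no RTT sample
 exists yet are my assumptions; both arguments are in microseconds, which
 to my knowledge is the unit Linux reports `tcpi_rtt` in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale the socket buffer factor by the ratio of
 * the scheduler interval to the smoothed RTT, capped at 1.0. */
static double
rtt_scaled_buf_factor(uint32_t sched_interval_usec, uint32_t srtt_usec)
{
  assert(sched_interval_usec > 0);

  /* Assumption: with no RTT estimate yet, fall back to the hard maximum
   * rather than guessing low. */
  if (srtt_usec == 0)
    return 1.0;

  double ratio = (double)sched_interval_usec / (double)srtt_usec;

  /* When the RTT is at or below the scheduler interval, a full window of
   * ACKs can come back between calls, so the ratio is capped at 1.0. */
  return ratio > 1.0 ? 1.0 : ratio;
}
```

 With a 10 ms interval and a 100 ms smoothed RTT this gives a factor of
 about `0.1`; on a 5 ms link it stays at `1.0`.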

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24665#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online