[tor-bugs] #12890 [- Select a component]: Design and implement optimizations for socket write limits

Tue Aug 19 06:09:48 UTC 2014

#12890: Design and implement optimizations for socket write limits
----------------------------------+----------------------------
 Reporter:  robgjansen            |          Owner:  robgjansen
     Type:  enhancement           |         Status:  new
 Priority:  normal                |      Milestone:
Component:  - Select a component  |        Version:
 Keywords:                        |  Actual Points:
Parent ID:  #12541                |         Points:
----------------------------------+----------------------------
 KIST has two components: global scheduling (#9262) and socket write
 limits. This ticket tracks discussion of both the design and the
 implementation of socket write limits.

 The goal of the write limit is to never send to the kernel what the kernel
 wouldn't send out to the network anyway due to throttling at the TCP
 layer. Rob's USENIX Security paper computed write limits for each socket
 as

 {{{
 sock_space = sock_buf_size - sock_buf_len
 tcp_space = (snd_cwnd - snd_unacked) * mss
 sock_write_limit = min(sock_space, tcp_space)
 }}}
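
 As a minimal sketch, the formula above can be written as a small
 function. The function name, the sample numbers, and the clamp at zero
 are illustrative assumptions, not something taken from the paper or
 from Tor.

```python
def socket_write_limit(sock_buf_size, sock_buf_len, snd_cwnd, snd_unacked, mss):
    """Return how many bytes are worth handing to the kernel for one socket."""
    # Free space left in the kernel socket send buffer, in bytes.
    sock_space = sock_buf_size - sock_buf_len
    # Room left in the congestion window, converted from packets to bytes.
    tcp_space = (snd_cwnd - snd_unacked) * mss
    # Assumption: clamp at zero so a full buffer or window never yields a
    # negative limit.
    return max(0, min(sock_space, tcp_space))


# Hypothetical values: 64 KiB buffer with 16 KiB queued, cwnd of 10 packets,
# 4 packets unacknowledged, MSS of 1448 bytes.
limit = socket_write_limit(65536, 16384, 10, 4, 1448)
```

 With these sample values the congestion window is the tighter bound, so
 the limit comes from tcp_space rather than sock_space.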

 A global write limit across all sockets is then computed for each
 scheduling round from the relay's upstream bandwidth and the configured
 write callback interval. Writing in a given round ends when either the
 global limit is reached or every socket has reached its own limit.
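
 A rough sketch of one write round under both limits follows. All names
 are hypothetical, and visiting sockets in dictionary order is only a
 placeholder: choosing the order is the job of the global scheduling
 component (#9262), which this sketch does not model.

```python
def global_write_limit(upstream_bytes_per_sec, interval_ms):
    """Bytes the relay can push upstream during one callback interval."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return upstream_bytes_per_sec * interval_ms // 1000


def run_write_round(socket_limits, global_limit):
    """Return bytes to write per socket, honoring both kinds of limit.

    socket_limits maps a socket name to its per-socket write limit.
    """
    written = {}
    remaining = global_limit
    for name, limit in socket_limits.items():
        # Each socket gets at most its own limit and at most what is left
        # of the global budget (zero once the budget is exhausted).
        n = min(limit, remaining)
        written[name] = n
        remaining -= n
    return written


# Hypothetical relay: 1.25 MB/s upstream, 10 ms write interval.
budget = global_write_limit(1_250_000, 10)
plan = run_write_round({"a": 8000, "b": 8000, "c": 8000}, budget)
```

 Here the global budget runs out partway through the second socket, so
 the round ends before the third socket is written at all.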

 The TCP information can be collected with a getsockopt call, but doing
 this for every socket for every write round (callback interval) can get
 expensive. A kernel hacker, Patrick McHardy, suggested using the "netlink
 socket diag" interface (examples
 [https://github.com/kristrev/inet-diag-example/blob/master/inet_monitor.c here]
 and [http://linuxgazette.net/136/pfeiffer.html here]) to collect
 information for multiple sockets all at once instead of a separate system
 call for each.

 Note that the socket write limit need not actually be computed, because
 the kernel will return EAGAIN when the socket buffer is full anyway. Along
 these lines, Bryan Ford suggested setting the socket buffer size to the
 amount Tor thinks it should send plus a little extra (e.g.,
 tcp_space*1.25), and then letting the kernel push back automatically
 instead of computing a new write limit for every socket in every write
 round. Tor can then keep writing as much as it can and stop when the
 kernel pushes back. In this case, we need to ensure TCP auto-tuning is
 disabled, as otherwise it may undo our settings by adjusting the socket
 buffer sizes underneath us.
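
 The buffer-sizing idea could be sketched as follows, assuming a
 non-blocking socket. The function names and the default headroom
 parameter are illustrative, and the kernel may round or adjust the
 requested SO_SNDBUF value (Linux, for example, doubles it to allow for
 bookkeeping overhead), so the effective size should not be assumed to
 equal the request.

```python
import socket


def target_sndbuf(snd_cwnd, snd_unacked, mss, headroom=1.25):
    """Send buffer size: what TCP can send now plus a little extra."""
    tcp_space = max(0, snd_cwnd - snd_unacked) * mss
    return int(tcp_space * headroom)


def set_send_buffer(sock, nbytes):
    """Ask the kernel for a send buffer of roughly nbytes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)


def write_until_pushback(sock, data):
    """Write to a non-blocking socket until done or the kernel says EAGAIN."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: the send buffer is full, stop for this round.
            break
    return sent
```

 A caller would keep whatever was not written queued and retry in the
 next write round.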

 I think we need two intervals: e.g., we want to try to write every 10
 milliseconds, and then update snd_cwnd/write limits/socket buffer sizes
 every 100 milliseconds.
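
 A minimal sketch of the two-interval idea, as a simulated timeline
 rather than a real event loop. The function name and event labels are
 hypothetical, and it assumes the update interval is a multiple of the
 write interval.

```python
def plan_rounds(duration_ms, write_interval_ms=10, update_interval_ms=100):
    """List (time_ms, action) events for a simulated stretch of time."""
    events = []
    for now in range(0, duration_ms, write_interval_ms):
        if now % update_interval_ms == 0:
            # Slow path: refresh snd_cwnd, write limits, and buffer sizes.
            events.append((now, "update"))
        # Fast path: attempt to write on every write interval.
        events.append((now, "write"))
    return events


timeline = plan_rounds(200)
```

 Over 200 ms with the intervals suggested above, this yields one
 update for every ten write attempts.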

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/12890>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online