[tor-bugs] #12890 [Core Tor/Tor]: Design and implement optimizations for socket write limits

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Jul 11 23:52:48 UTC 2017


#12890: Design and implement optimizations for socket write limits
-------------------------------------------+-------------------------------
 Reporter:  robgjansen                     |          Owner:
     Type:  enhancement                    |         Status:  closed
 Priority:  Medium                         |      Milestone:  Tor:
                                           |  0.3.2.x-final
Component:  Core Tor/Tor                   |        Version:
 Severity:  Normal                         |     Resolution:  fixed
 Keywords:  tor-relay kist networking tcp  |  Actual Points:
Parent ID:  #12541                         |         Points:
 Reviewer:                                 |        Sponsor:
-------------------------------------------+-------------------------------
Changes (by pastly):

 * status:  new => closed
 * resolution:   => fixed
 * milestone:  Tor: unspecified => Tor: 0.3.2.x-final


Comment:

 A lot has changed in the last year that I've been implementing KIST.

 It looks like most/all of this ticket deals with the KIST prototype that
 would run stuff off of the main thread and collect stats on all sockets.
 That's no longer a thing.

 Huge optimization: when it comes time to do some scheduling, only collect
 TCP info for the sockets that have channels ready to send. To quote our
 new KIST paper, which hasn't been accepted anywhere yet:

 > We observed that the median number of write-pending sockets that
 accumulated during a 10 millisecond period was 23 (with min=1, q1=18,
 q3=27, and max=127), while the median amount of time to collect TCP
 information on all write-pending sockets was 23 microseconds (with min=1,
 q1=17, q3=33, and max=674). We observed a linear relationship between the
 amount of time required to collect TCP information on all write-pending
 sockets and the number of such sockets (1.08 microseconds per pending
 socket), independent of the total number of open sockets. Therefore, we
 believe that the KIST overhead, with our optimization of only collecting
 TCP information on pending sockets, should be tolerable to run in the main
 thread for even the fastest Tor relay.
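
 As a rough worked example of the arithmetic in that quote, the sketch
 below estimates the per-run collection cost from the reported slope of
 1.08 microseconds per pending socket. The function names are hypothetical,
 and it assumes the linear fit has no constant term, which the quote does
 not state.

```c
/* Hypothetical helpers; only the 1.08 usec/socket slope comes from the quote. */

/* Estimated time, in microseconds, to collect TCP info for the given
 * number of write-pending sockets, assuming a purely linear cost. */
static double kist_collect_cost_usec(unsigned pending_sockets)
{
  return 1.08 * (double)pending_sockets;
}

/* Fraction of one scheduling interval spent collecting TCP info. */
static double kist_overhead_fraction(unsigned pending_sockets,
                                     double interval_usec)
{
  return kist_collect_cost_usec(pending_sockets) / interval_usec;
}
```

 For the median of 23 pending sockets this gives about 24.84 microseconds,
 roughly 0.25% of a 10 millisecond interval. For the maximum of 127 it gives
 about 137 microseconds, or about 1.4%. The observed maximum collection time
 was 674 microseconds, so the slope describes the trend, not a worst case.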

 But what does the algorithm look like? See
 [https://gitweb.torproject.org/user/pastly/tor.git/tree/src/or/scheduler_kist.c?h=kist-fortor-01&id=7cdff19bc14136e792c6d1ebcc8dfb8631a32db8#n205 here].

 {{{
   struct tcp_info tcp;
   socklen_t tcp_info_len = sizeof(tcp);
   getsockopt(sock, SOL_TCP, TCP_INFO, (void *)&(tcp), &tcp_info_len);
   ioctl(sock, SIOCOUTQNSD, &(ent->notsent));
   ent->cwnd = tcp.tcpi_snd_cwnd;
   ent->unacked = tcp.tcpi_unacked;
   ent->mss = tcp.tcpi_snd_mss;

   int64_t tcp_space, extra_space;
   tcp_space = (ent->cwnd - ent->unacked) * ent->mss;
   if (tcp_space < 0) tcp_space = 0;
   extra_space = (ent->cwnd * ent->mss) * sock_buf_size_factor - ent->notsent;
   if (extra_space < 0) extra_space = 0;
   ent->limit = tcp_space + extra_space;
   if (++counter >= 1000) {
     counter -= 1000;
     log_info(LD_SCHED, "socket info: cwnd=%d unacked=%d notsent=%d limit=%"
         PRIi64,
         ent->cwnd,
         ent->unacked,
         ent->notsent,
         ent->limit);
   }
 }}}

 First calculate the amount of "TCP space" there is. This is the amount of
 data that the kernel should send out onto the wire immediately since TCP
 won't limit it.

 Then calculate some amount of extra space. To avoid starving the kernel,
 we want a //little// extra data to be there so if ACKs come back before
 the next scheduling run the kernel has something to send so the wire isn't
 idle. Assuming the `sock_buf_size_factor` is 1.0, we allow up to one extra
 congestion window worth of data to sit in the outbound kernel socket
 buffer.

 Add together the TCP space and extra space and that's the socket's KIST-
 imposed write limit.
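
 To make the arithmetic concrete, here is a minimal standalone sketch of
 the same calculation. The helper name `kist_write_limit` and its parameter
 list are hypothetical and not part of the branch. The explicit casts to
 `int64_t` are a choice made for this sketch so the subtraction cannot wrap
 if the inputs are unsigned.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the limit calculation described above.
 * cwnd and unacked are in packets; mss and notsent are in bytes. */
static int64_t kist_write_limit(uint32_t cwnd, uint32_t unacked, uint32_t mss,
                                uint32_t notsent, double sock_buf_size_factor)
{
  /* Bytes TCP would let the kernel put on the wire right now. */
  int64_t tcp_space = ((int64_t)cwnd - (int64_t)unacked) * (int64_t)mss;
  if (tcp_space < 0)
    tcp_space = 0;

  /* A little extra so the kernel has data ready when ACKs arrive:
   * up to factor * cwnd * mss bytes, minus what is already queued unsent. */
  int64_t extra_space =
      (int64_t)((double)cwnd * (double)mss * sock_buf_size_factor) -
      (int64_t)notsent;
  if (extra_space < 0)
    extra_space = 0;

  return tcp_space + extra_space;
}
```

 With cwnd=10, unacked=4, mss=1448, notsent=2896 and a factor of 1.0, the
 TCP space is 6 * 1448 = 8688 bytes, the extra space is 14480 - 2896 = 11584
 bytes, and the limit is 20272 bytes.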

 It requires two system calls. A recent version of Linux (which came out in
 March 2016, see `git describe --contains cd9b2660` against the kernel) has
 a `struct tcp_info` intended for internal use that includes the not-sent
 byte count (`tcpi_notsent_bytes`). We'd only need one system call if we
 took advantage of that.

 What about cross platform support?

 In the name of actually getting KIST merged and to keep the code simple,
 we make no attempt to support anything other than Linux. This strikes a
 good balance between benefit and development cost. KIST checks that
 `struct tcp_info` exists, that it has the right members, and that the
 ioctl syscall is available. If so, we have full KIST support. If not, we
 can still run the KIST scheduler but set the per-socket write limit to
 INT_MAX, which effectively disables the limit. It won't hurt.
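
 The fallback can be sketched as follows. The function name and the
 `have_kist_support` flag are hypothetical stand-ins for whatever check the
 build or runtime actually performs. The sketch only shows that the limit
 becomes INT_MAX when the kernel information is unavailable.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: pick the per-socket write limit.
 * have_kist_support is nonzero when struct tcp_info, its members, and the
 * ioctl are all usable; computed_limit is the KIST limit in bytes. */
static int64_t socket_write_limit(int have_kist_support,
                                  int64_t computed_limit)
{
  if (!have_kist_support)
    return INT_MAX; /* no kernel info: effectively unlimited */
  return computed_limit;
}
```

 Without support, every socket gets the same very large limit, so the
 scheduler still runs but never holds data back on the basis of TCP state.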

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/12890#comment:42>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

