[tor-commits] [tor/master] Fix busy Libevent loops (infinite loops in Shadow)

nickm at torproject.org nickm at torproject.org
Wed Jun 13 20:30:29 UTC 2012


commit 03b48352c6b8416faee50e769bf226b06958e253
Author: Rob G. Jansen <jansen at cs.umn.edu>
Date:   Tue Feb 28 18:19:49 2012 -0500

    Fix busy Libevent loops (infinite loops in Shadow)
    
    There is a bug causing busy loops in Libevent and infinite loops in
    the Shadow simulator. A connection that is marked for close, wants
    to flush, is held open to flush, but is rate limited (the token
    bucket is empty) triggers the bug.
    
    This commit fixes the bug. Details are below.
    
    This currently happens on read and write callbacks when the active
    socket is marked for close. In this case, Tor doesn't actually try
    to complete the read or write (it returns from those methods when
    marked), but instead tries to clear the connection with
    conn_close_if_marked(). Tor will not close a marked connection that
    contains data: it must be flushed first. The bug occurs when this
    flush operation on the marked connection can not occur because the
    connection is rate-limited (its write token bucket is empty).
    
    The fix is to detect when rate limiting is preventing a marked
    connection from properly flushing. In this case, it should be
    flagged as read/write_blocked_on_bandwidth and the read/write events
    de-registered from Libevent. When the token bucket gets refilled, it
    will check the associated read/write_blocked_on_bandwidth flag, and
    add the read/write event back to Libevent, which will cause it to
    fire. This time, it will be properly flushed and closed.
    
    The reason that both read and write events are both de-registered
    when the marked connection can not flush is because both result in
    the same behavior. Both read/write events on marked connections will
    never again do any actual reads/writes, and are only useful to
    trigger the flush and close the connection. By setting the
    associated read/write_blocked_on_bandwidth flag, we ensure that the
    event will get added back to Libevent, properly flushed, and closed.
    
    Why is this important? Every Shadow event occurs at a discrete time
    instant. If Tor does not properly deregister Libevent events that
    fire but result in Tor essentially doing nothing, Libevent will
    repeatedly fire the event. In Shadow this means infinite loop,
    outside of Shadow this means wasted CPU cycles.
---
 src/or/main.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/src/or/main.c b/src/or/main.c
index 163dd5d..a5ad1d9 100644
--- a/src/or/main.c
+++ b/src/or/main.c
@@ -848,6 +848,20 @@ conn_close_if_marked(int i)
                            "Holding conn (fd %d) open for more flushing.",
                            (int)conn->s));
         conn->timestamp_lastwritten = now; /* reset so we can flush more */
+      } else if (sz == 0) { /* retval is also 0 */
+    	/* Connection must flush before closing, but it's being rate-limited.
+    	   Let's remove from Libevent, and mark it as blocked on bandwidth so it
+    	   will be re-added on next token bucket refill. Prevents busy Libevent
+    	   loops where we keep ending up here and returning 0 until we are no
+    	   longer blocked on bandwidth. */
+        if (connection_is_reading(conn)) {
+          conn->read_blocked_on_bw = 1;
+          connection_stop_reading(conn);
+        }
+        if (connection_is_writing(conn)) {
+          conn->write_blocked_on_bw = 1;
+          connection_stop_writing(conn);
+        }
       }
       return 0;
     }





More information about the tor-commits mailing list