[tor-commits] [stem/master] Inconsistent unit test failures in control.controller

atagar at torproject.org atagar at torproject.org
Wed Oct 24 20:54:35 UTC 2018


commit 5bd8a10632e3602e03b4ad2ac8829c57585c4fd6
Author: Damian Johnson <atagar at torproject.org>
Date:   Wed Oct 24 13:31:55 2018 -0700

    Inconsistent unit test failures in control.controller
    
    On occasion our unit tests that emit then listen for controller events fail
    with...
    
      ======================================================================
      FAIL: test_events_get_received
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/usr/lib/python2.7/dist-packages/mock/mock.py", line 1305, in patched
          return func(*args, **keywargs)
        File "/srv/jenkins-workspace/workspace/stem-tor-ci/test/unit/control/controller.py", line 684, in test_events_get_received
          self.bw_listener.assert_called_once_with(BW_EVENT)
        File "/usr/lib/python2.7/dist-packages/mock/mock.py", line 947, in assert_called_once_with
          raise AssertionError(msg)
      AssertionError: Expected 'mock' to be called once. Called 0 times.
    
      ----------------------------------------------------------------------
    
    The specific test varies, but it should always be one of our emit_event()
    callers. This has caused nine jenkins test failures in the last three months
    (so pretty noisy). Dave ran into this too...
    
      https://trac.torproject.org/projects/tor/ticket/27173
    
    I can't reliably (or even frequently) reproduce this locally but I think I know
    what's going on...
    
      * emit_event() adds an event to the queue via put(). This is a blocking call
        so we can be assured the event's in there.
    
      * emit_event() then calls set() on the event_notice condition so the
        Controller's event thread will process the event.
    
      * The controller's event thread calls get_nowait() on its event queue. This
        *ususally* gets the event we just inserted, but since this is get_nowait()
        rather than get() it's non-blocking, and may raise a queue.Empty instead.
    
      * When queue.Empty gets thrown we once again wait on the event_notice
        condition.
    
      * emit_event() then calls close() on its Controller which joins and
        terminates its threads. The close() method calls set() on event_noice to
        give it a *second* opportunity to get the event.
    
      * Once again, because we use get_nowait() there's a small chance we'll raise
        queue.Empty without processing the event.
    
      * The event thread then terminates with the enqueued event never being
        processed.
    
      * Finally, we get the assertion failure above.
    
    This is actually a real bug that can cause Controller users to be highly
    delayed or never receive an event. For instance...
    
      * Tor emits an event.
    
      * Our reader thread enqueues the event and calls set() on the condition to
        notify the event thread.
    
      * For the reasons mentioned above it fails to do so.
    
      * Because we use wait() without a timeout our event thread then resumes
        blocking, causing the event to not get another chance to be delivered until
        a second event comes along.
    
    This sucks, and is pretty easy to remediate. Rather than waiting indefinitely
    we should instead have our event thread periodically check re-check its own
    queue for undelivered messages (picked 50 ms as a decent sounding balance
    between responsiveness and load). Our test's emit_event() in turn now blocks
    until the event is delivered.
    
    Needless to say this is a long song and dance so quite possible I'm in the
    wrong on something but this seems, both in theory and practice thus far, to be
    a good step forward.
---
 docs/change_log.rst             | 4 ++++
 stem/control.py                 | 3 ++-
 test/unit/control/controller.py | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/change_log.rst b/docs/change_log.rst
index 9230350c..f47ec984 100644
--- a/docs/change_log.rst
+++ b/docs/change_log.rst
@@ -45,6 +45,10 @@ Unreleased
 The following are only available within Stem's `git repository
 <download.html>`_.
 
+ * **Controller**
+
+  * Controller events could fail to be delivered in a timely fashion (:trac:`27173`)
+
 .. _version_1.7:
 
 Version 1.7 (October 7th, 2018)
diff --git a/stem/control.py b/stem/control.py
index 2e893788..68873d76 100644
--- a/stem/control.py
+++ b/stem/control.py
@@ -966,6 +966,7 @@ class BaseController(object):
       try:
         event_message = self._event_queue.get_nowait()
         self._handle_event(event_message)
+        self._event_queue.task_done()
 
         # Attempt to finish processing enqueued events when our controller closes
 
@@ -978,7 +979,7 @@ class BaseController(object):
         if not self.is_alive():
           break
 
-        self._event_notice.wait()
+        self._event_notice.wait(0.05)
         self._event_notice.clear()
 
 
diff --git a/test/unit/control/controller.py b/test/unit/control/controller.py
index ed72fbe8..13d26c4b 100644
--- a/test/unit/control/controller.py
+++ b/test/unit/control/controller.py
@@ -866,6 +866,7 @@ class TestControl(unittest.TestCase):
           uncast_event = ControlMessage.from_str(event.raw_content())
           self.controller._event_queue.put(uncast_event)
           self.controller._event_notice.set()
+          self.controller._event_queue.join()  # block until the event is consumed
         finally:
           is_alive_mock.return_value = False
           self.controller._close()



More information about the tor-commits mailing list