[stem/master] Inconsistent unit test failures in control.controller

commit 5bd8a10632e3602e03b4ad2ac8829c57585c4fd6 Author: Damian Johnson <atagar@torproject.org> Date: Wed Oct 24 13:31:55 2018 -0700 Inconsistent unit test failures in control.controller On occasion our unit tests that emit then listen for controller events fail with... ====================================================================== FAIL: test_events_get_received ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/mock/mock.py", line 1305, in patched return func(*args, **keywargs) File "/srv/jenkins-workspace/workspace/stem-tor-ci/test/unit/control/controller.py", line 684, in test_events_get_received self.bw_listener.assert_called_once_with(BW_EVENT) File "/usr/lib/python2.7/dist-packages/mock/mock.py", line 947, in assert_called_once_with raise AssertionError(msg) AssertionError: Expected 'mock' to be called once. Called 0 times. ---------------------------------------------------------------------- The specific test varies, but it should always be one of our emit_event() callers. This has caused nine jenkins test failures in the last three months (so pretty noisy). Dave ran into this too... https://trac.torproject.org/projects/tor/ticket/27173 I can't reliably (or even frequently) reproduce this locally but I think I know what's going on... * emit_event() adds an event to the queue via put(). This is a blocking call so we can be assured the event's in there. * emit_event() then calls set() on the event_notice condition so the Controller's event thread will process the event. * The controller's event thread calls get_nowait() on its event queue. This *ususally* gets the event we just inserted, but since this is get_nowait() rather than get() it's non-blocking, and may raise a queue.Empty instead. * When queue.Empty gets thrown we once again wait on the event_notice condition. * emit_event() then calls close() on its Controller which joins and terminates its threads. The close() method calls set() on event_noice to give it a *second* opportunity to get the event. * Once again, because we use get_nowait() there's a small chance we'll raise queue.Empty without processing the event. * The event thread then terminates with the enqueued event never being processed. * Finally, we get the assertion failure above. This is actually a real bug that can cause Controller users to be highly delayed or never receive an event. For instance... * Tor emits an event. * Our reader thread enqueues the event and calls set() on the condition to notify the event thread. * For the reasons mentioned above it fails to do so. * Because we use wait() without a timeout our event thread then resumes blocking, causing the event to not get another chance to be delivered until a second event comes along. This sucks, and is pretty easy to remediate. Rather than waiting indefinitely we should instead have our event thread periodically check re-check its own queue for undelivered messages (picked 50 ms as a decent sounding balance between responsiveness and load). Our test's emit_event() in turn now blocks until the event is delivered. Needless to say this is a long song and dance so quite possible I'm in the wrong on something but this seems, both in theory and practice thus far, to be a good step forward. --- docs/change_log.rst | 4 ++++ stem/control.py | 3 ++- test/unit/control/controller.py | 1 + 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/change_log.rst b/docs/change_log.rst index 9230350c..f47ec984 100644 --- a/docs/change_log.rst +++ b/docs/change_log.rst @@ -45,6 +45,10 @@ Unreleased The following are only available within Stem's `git repository <download.html>`_. + * **Controller** + + * Controller events could fail to be delivered in a timely fashion (:trac:`27173`) + .. _version_1.7: Version 1.7 (October 7th, 2018) diff --git a/stem/control.py b/stem/control.py index 2e893788..68873d76 100644 --- a/stem/control.py +++ b/stem/control.py @@ -966,6 +966,7 @@ class BaseController(object): try: event_message = self._event_queue.get_nowait() self._handle_event(event_message) + self._event_queue.task_done() # Attempt to finish processing enqueued events when our controller closes @@ -978,7 +979,7 @@ class BaseController(object): if not self.is_alive(): break - self._event_notice.wait() + self._event_notice.wait(0.05) self._event_notice.clear() diff --git a/test/unit/control/controller.py b/test/unit/control/controller.py index ed72fbe8..13d26c4b 100644 --- a/test/unit/control/controller.py +++ b/test/unit/control/controller.py @@ -866,6 +866,7 @@ class TestControl(unittest.TestCase): uncast_event = ControlMessage.from_str(event.raw_content()) self.controller._event_queue.put(uncast_event) self.controller._event_notice.set() + self.controller._event_queue.join() # block until the event is consumed finally: is_alive_mock.return_value = False self.controller._close()
participants (1)
-
atagar@torproject.org