commit 023cbeeff9798b8600f5191ed3084e57624fee10
Author: Damian Johnson <atagar@torproject.org>
Date:   Sun Apr 22 12:02:31 2018 -0700
Errors are getting incorrectly suppressed
Interesting! Roger reported an issue where he isn't getting notified about moria1's bwauth scanner being down. In taking a look at the logs I'm indeed seeing something funky...
04/21/2018 21:05:46 [DEBUG] NOTICE: The following directory authorities are not reporting bandwidth scanner results: gabelmoo
04/21/2018 21:05:47 [INFO] Suppressing The_following_directory_authorities_are_not_reporting_bandwidth_scanner_results:_gabelmoo, time remaining is 1 3 hours
04/21/2018 21:05:47 [INFO] All 1 issues were suppressed. Not sending a notification.
04/21/2018 21:05:47 [DEBUG] Checks finished, runtime was 46.84 seconds
04/21/2018 22:05:45 [DEBUG] ERROR: The following directory authorities are not reporting bandwidth scanner results: gabelmoo, moria1
04/21/2018 22:05:46 [INFO] All 1 issues were suppressed. Not sending a notification.
04/21/2018 22:05:46 [DEBUG] Checks finished, runtime was 45.56 seconds
I suspect what's happening is this...
* When only gabelmoo is down this is a NOTICE runlevel notification, which works.
* When moria1 is down as well it's an ERROR runlevel notice which should generate an email every hour, but doesn't. I think this is due to a bug where ERROR notices specifically are getting incorrectly suppressed.
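
For illustration, here is a minimal standalone sketch of the rate limiting check as I understand it. This is not the actual code in consensus_health_checker.py: the signature, the last_notified dict argument, and the now parameter are assumptions made so the example runs on its own, and it presumes that a suppression duration of zero means "never suppress" (which is presumably what ERROR issues use).

```python
import time


def is_rate_limited(last_notified, key, hours, now=None):
  """
  Hypothetical stand-in for the checker's rate limiting. Returns True if a
  notification for 'key' should be suppressed.

  last_notified maps issue keys to the unix time we last sent a notification.
  """

  # Assumption: a duration of zero means the issue is never suppressed. The
  # bug described above is the equivalent of returning True here.

  if hours == 0:
    return False

  current_time = int(time.time()) if now is None else now
  last_seen = last_notified.get(key, 0)

  # seconds left until we're willing to notify about this issue again

  suppression_time = hours * 3600 - (current_time - last_seen)
  return suppression_time > 0
```

With this, an issue that has a zero duration is reported on every run, while one with a 24 hour duration is suppressed until a day after its last notification.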
Let's give this a try...
---
 consensus_health_checker.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/consensus_health_checker.py b/consensus_health_checker.py
index beb8c70..9710b5e 100755
--- a/consensus_health_checker.py
+++ b/consensus_health_checker.py
@@ -193,7 +193,7 @@ def is_rate_limited(issue):
   hours = issue.get_suppression_duration()

   if hours == 0:
-    return True
+    return False

   current_time = int(time.time())
   last_seen = stem.util.conf.get_config('last_notified').get(key, 0)
tor-commits@lists.torproject.org