[tor-bugs] #33972 [Internal Services/Tor Sysadmin Team]: Add Nagios check for CollecTor

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Apr 24 19:42:14 UTC 2020


#33972: Add Nagios check for CollecTor
-------------------------------------------------+-------------------------
 Reporter:  karsten                              |          Owner:  tpa
     Type:  task                                 |         Status:
                                                 |  merge_ready
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------
Changes (by karsten):

 * status:  needs_review => merge_ready


Comment:

 Replying to [comment:3 anarcat]:
 > Replying to [ticket:33972 karsten]:
 > > We currently have a metrics-specific Nagios host that we want to shut
 down soon. One of its checks is to see whether CollecTor's files are
 becoming unavailable or stale. This check is not easily transferable to
 Tor's Nagios host, because it depends on a code base that is not being
 maintained anymore and that we don't want to deploy on Tor's Nagios host. That's
 why I rewrote this check in a simple Python script to be deployed on Tor's
 Nagios instance.
 > >
 > > Questions:
 > >
 > >  - anarcat and/or weasel: do you have any concerns about deploying
 this check on Tor's Nagios host alongside the
 [https://gitweb.torproject.org/admin/tor-nagios.git/tree/tor-nagios-checks/checks/tor-check-onionoo Onionoo check]?
 >
 > I reviewed the code quickly, and it looks reasonable. Assuming
 performance is acceptable, this should be fine.

 The script runs in under a second here, with most of that time spent
 downloading the 1 MiB index.json file.
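
 To illustrate the kind of staleness check discussed here, a minimal
 sketch follows. It is not the attached script. It assumes that index.json
 nests "directories" and "files" entries and that each file carries a
 "last_modified" field in "YYYY-MM-DD HH:MM" UTC format; the thresholds and
 all function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed timestamp format of "last_modified" fields in index.json.
TIME_FORMAT = "%Y-%m-%d %H:%M"


def newest_modification(node, newest=None):
    """Walk the (assumed) nested layout and return the most recent
    last_modified timestamp, or None if no file entry was found."""
    for entry in node.get("files", []):
        stamp = datetime.strptime(entry["last_modified"], TIME_FORMAT)
        stamp = stamp.replace(tzinfo=timezone.utc)
        if newest is None or stamp > newest:
            newest = stamp
    for sub in node.get("directories", []):
        newest = newest_modification(sub, newest)
    return newest


def check_staleness(index, now,
                    warn=timedelta(hours=3),    # hypothetical threshold
                    crit=timedelta(hours=6)):   # hypothetical threshold
    """Return (exit_code, message) for a parsed index document."""
    newest = newest_modification(index)
    if newest is None:
        return UNKNOWN, "UNKNOWN: no files listed in index"
    age = now - newest
    minutes = int(age.total_seconds() // 60)
    if age >= crit:
        return CRITICAL, f"CRITICAL: newest file is {minutes} minutes old"
    if age >= warn:
        return WARNING, f"WARNING: newest file is {minutes} minutes old"
    return OK, f"OK: newest file is {minutes} minutes old"


def fetch_index(url, timeout=10):
    """Download and parse the index; the timeout keeps the check short."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

 A wrapper would call fetch_index() with the index URL (assumed to be
 something like https://collector.torproject.org/index/index.json), pass
 the result and datetime.now(timezone.utc) to check_staleness(), print the
 message, and exit with the returned code.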

 > >  - irl: do you spot any checks in this Python script that are way off,
 or other checks that are missing?
 > >
 > >  - atagar, other Python people: do you mind reviewing the Python code
 for general code improvements? The goal is to have a single,
 self-contained, easy-to-read Python script that produces just the data we
 need for Nagios to send out alerts.
 >
 > I would add to that "runs fast". The way Nagios schedules checks makes
 it suffer if there's a check that takes too long. Think "open TCP port"
 instead of "make a full HTTP request that downloads a 3MB file" or "...
 renders a complex report". :) We have some leeway of course, but if it can
 be optimized, it's a definite plus.

 Makes sense.

 > I would also mention there's a "nagiosplugin" python module that could
 be used instead of rolling our own behavior.
 >
 > https://pypi.org/project/nagiosplugin/
 >
 > It might be overkill for this simple plugin, but could be useful if you
 want to actually send metrics like age and so on and have them processable
 on the other side (which we don't currently do, mind you).

 That looks useful. Is that module available on the Tor Nagios host? I
 agree that it might be overkill for this plugin, but it might be useful
 for future plugins we write, and then we could go back and simplify the
 existing scripts for Onionoo and CollecTor.
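
 For reference, metrics such as age are conventionally reported in the
 plain Nagios plugin output format: a status line, a pipe, and then
 label=value[UOM];warn;crit fields. A stdlib-only sketch of that format,
 with a hypothetical helper name and made-up values, could look like this:

```python
def format_plugin_output(status, message, metrics):
    """Build one Nagios plugin output line with performance data.

    metrics is a list of (label, value, uom, warn, crit) tuples; the
    helper and its signature are illustrative, not an existing API.
    """
    perfdata = " ".join(
        f"{label}={value}{uom};{warn};{crit}"
        for label, value, uom, warn, crit in metrics
    )
    return f"{status}: {message} | {perfdata}"


# Example values are invented for illustration only.
line = format_plugin_output(
    "OK",
    "newest file is 42 minutes old",
    [("age", 2520, "s", 10800, 21600)],
)
print(line)
```

 Rolling this by hand keeps the script free of extra dependencies, while a
 module like nagiosplugin would handle the same formatting for us.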

 I'm attaching a fixed version of the script where I removed a superfluous
 comma that somehow slipped in when doing the final cleanup.

 Can you deploy this script on Tor's Nagios for collector.torproject.org
 (not for collector2.torproject.org, though)?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33972#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

