[tor-bugs] #33972 [Internal Services/Tor Sysadmin Team]: Add Nagios check for CollecTor

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Apr 23 15:48:12 UTC 2020


#33972: Add Nagios check for CollecTor
-------------------------------------------------+-------------------------
 Reporter:  karsten                              |          Owner:  tpa
     Type:  task                                 |         Status:
                                                 |  needs_review
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by anarcat):

 Replying to [ticket:33972 karsten]:
 > We currently have a metrics-specific Nagios host that we want to shut
 down soon. One of its checks is to see whether CollecTor's files are
 becoming unavailable or stale. This check is not easily transferable to
 Tor's Nagios host, because it depends on a code base that is not being
 maintained anymore and that we do not want to deploy on Tor's Nagios host. That's
 why I rewrote this check in a simple Python script to be deployed on Tor's
 Nagios instance.
 >
 > Questions:
 >
 >  - anarcat and/or weasel: do you have any concerns about deploying this
 check on Tor's Nagios host alongside the
 [https://gitweb.torproject.org/admin/tor-nagios.git/tree/tor-nagios-checks/checks/tor-check-onionoo Onionoo check]?

 I reviewed the code quickly, and it looks reasonable. Assuming performance
 is acceptable, this should be fine.

 >  - irl: do you spot any checks in this Python script that are way off,
 or other checks that are missing?
 >
 >  - atagar, other Python people: do you mind reviewing the Python code
 for general code improvements? The goal is to have a single,
 self-contained, easy-to-read Python script that produces just the data we
 need for Nagios to send out alerts.

 I would add to that "runs fast". The way Nagios schedules checks makes it
 suffer if there's a check that takes too long. Think "open TCP port"
 instead of "make a full HTTP request that downloads a 3MB file" or "...
 renders a complex report". :) We have some leeway of course, but if it can
 be optimized, it's a definite plus.
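
 To illustrate the "runs fast" point, here is a minimal sketch of a
 staleness check that sends a HEAD request with a hard timeout and only
 inspects the Last-Modified header, so no file body is downloaded. This is
 not karsten's script. The function names, the thresholds, and the idea of
 checking a single URL are all assumptions made for the example.

```python
#!/usr/bin/env python3
"""Sketch of a fast staleness check: one HEAD request, bounded by a timeout."""
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def age_seconds(last_modified, now):
    """Seconds elapsed between a Last-Modified header value and `now`."""
    return (now - parsedate_to_datetime(last_modified)).total_seconds()


def classify(age, warn, crit):
    """Map an age in seconds to a (exit code, label) pair."""
    if age >= crit:
        return CRITICAL, "CRITICAL"
    if age >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def run_check(url, warn=3 * 3600, crit=6 * 3600, timeout=5):
    """Print one status line and return the exit code.

    The thresholds and timeout are hypothetical defaults, not values
    taken from the ticket.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            last_modified = response.headers.get("Last-Modified")
    except OSError as error:  # URLError and socket timeouts are OSErrors
        print(f"CRITICAL - cannot reach {url}: {error}")
        return CRITICAL
    if last_modified is None:
        print(f"UNKNOWN - {url} sent no Last-Modified header")
        return UNKNOWN
    try:
        age = age_seconds(last_modified, datetime.now(timezone.utc))
    except (TypeError, ValueError):
        print(f"UNKNOWN - cannot parse Last-Modified: {last_modified!r}")
        return UNKNOWN
    code, label = classify(age, warn, crit)
    print(f"{label} - {url} last modified {int(age)}s ago")
    return code
```

 A plugin wrapper would call something like sys.exit(run_check(url)) with
 the URL to watch. The timeout caps how long Nagios waits on a slow or
 unreachable server.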

 I would also mention there's a "nagiosplugin" Python module that could be
 used instead of rolling our own behavior.

 https://pypi.org/project/nagiosplugin/

 It might be overkill for this simple plugin, but could be useful if you
 want to actually send metrics like age and so on and have them processable
 on the other side (which we don't currently do, mind you).
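
 For a sense of what "sending metrics" means without any library, here is
 a minimal sketch that formats a status line with performance data, assuming
 the usual Nagios plugin convention of "label=value[UOM];warn;crit" after a
 "|" separator. It does not use nagiosplugin. The function name and the
 example values are hypothetical.

```python
def format_status(state, message, label, value, uom="", warn="", crit=""):
    """Build a Nagios status line with one performance-data item.

    The part after the "|" is what a metrics consumer would parse.
    """
    perfdata = f"{label}={value}{uom};{warn};{crit}"
    return f"{state} - {message} | {perfdata}"


# Example: report the age of the newest file, in seconds, with thresholds.
line = format_status("OK", "newest file is 1h old", "age", 3600, "s", 10800, 21600)
print(line)
```

 A library such as nagiosplugin would take over this formatting along with
 threshold handling, which is where it starts to pay off once there are
 several metrics.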

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33972#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

