[tor-bugs] #33810 [Internal Services/Tor Sysadmin Team]: ganeti monitoring

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Apr 3 21:35:26 UTC 2020


#33810: ganeti monitoring
-------------------------------------------------+---------------------
 Reporter:  anarcat                              |          Owner:  tpa
     Type:  task                                 |         Status:  new
 Priority:  Low                                  |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Minor                                |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+---------------------
Description changed by anarcat:

Old description:

> we're migrating everything into ganeti, but maybe there's some extra
> monitoring we could think about, as ganeti is way more knowledgeable
> about its own internals than libvirt was. or at least that's the feeling
> I get.
>
> some ideas:
>
>  * we could have a nagios plugin that checks for N+1. riseup has
> something like this
>  * we could have a grafana dashboard that shows us the state of the
> cluster. we already have the main dashboard which we can set to show only
> the ganeti cluster

New description:

 we're migrating everything into ganeti, but maybe there's some extra
 monitoring we could think about, as ganeti is way more knowledgeable about
 its own internals than libvirt was. or at least that's the feeling I get.

 some ideas:

  * we could have a nagios plugin that checks for N+1. riseup has something
 like this
  * we could have a grafana dashboard that shows us the state of the
 cluster. we already have the main dashboard which we can set to show only
 the ganeti cluster
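
 To make the N+1 idea concrete, here is a minimal sketch of what such a
 check could compute. "N+1" is taken here to mean that if any single node
 fails, the secondary nodes have enough free memory to start the instances
 that were running on it. The input format (a dict of free memory per node
 and a list of instance dicts) and the function names are hypothetical;
 this is not riseup's plugin, and a real check would have to pull this
 data from the cluster itself.

```python
from collections import defaultdict

# Standard Nagios plugin exit codes.
OK = 0
CRITICAL = 2


def n_plus_one_failures(free_mem, instances):
    """Return (failed_node, target_node, needed_mb, free_mb) tuples.

    free_mem:  hypothetical dict mapping node name -> free memory in MB
    instances: hypothetical list of dicts with keys "primary",
               "secondary" (None if the instance has no secondary)
               and "memory" (in MB)
    """
    # needed[target][failed] = memory that `target` must absorb if
    # `failed` goes down and its instances fail over to their secondary.
    needed = defaultdict(lambda: defaultdict(int))
    for inst in instances:
        if inst["secondary"] is None:
            continue  # nothing to fail over to, so not an N+1 question
        needed[inst["secondary"]][inst["primary"]] += inst["memory"]

    failures = []
    for target, by_failed in needed.items():
        free = free_mem.get(target, 0)
        for failed, mem in by_failed.items():
            if mem > free:
                failures.append((failed, target, mem, free))
    return sorted(failures)


def check_n_plus_one(free_mem, instances):
    """Return a (nagios_exit_code, message) pair."""
    failures = n_plus_one_failures(free_mem, instances)
    if not failures:
        return OK, "OK: cluster is N+1 compliant"
    details = "; ".join(
        f"{target} needs {mem}MB if {failed} fails, has {free}MB free"
        for failed, target, mem, free in failures
    )
    return CRITICAL, f"CRITICAL: N+1 failure: {details}"
```

 An actual plugin would print the message and exit with the returned
 code. The sketch only looks at memory, and only at single-node failures.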

 The current memory view looks roughly like this:

 [[Image(snap-2020.04.03-17.33.23.png,700)]]

 I'm not sure how best to improve this, but it seems to me that having
 global (and possibly per-node) views of memory, CPU, network and disk
 usage would be a great improvement as well.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33810#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

