[tor-bugs] #29684 [Internal Services/Tor Sysadmin Team]: setup a grafana server somewhere

Thu Mar 21 13:34:34 UTC 2019

#29684: setup a grafana server somewhere
-------------------------------------------------+-------------------------
 Reporter:  anarcat                              |          Owner:  anarcat
     Type:  defect                               |         Status:
                                                 |  assigned
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:  #29681                               |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by anarcat):

 the first step here, to be clear, is a choice between the following
 options:

  1. Grafana installed with the upstream Debian package, no isolation
 (current situation)
  2. Grafana installed with the upstream Debian package, in its own VM
  3. Grafana installed with the upstream Docker image
  4. Something else than Grafana, but still using Prometheus
  5. Going back to Munin

 TL;DR: I'm for '''option 1''' for now and eventually '''option 3''' if
 upstream can't figure out Debian packaging. I need a decision on this to
 move forward with the munin-node cleanup and Grafana configuration, but
 I'll continue the deployment of Prometheus exporters everywhere in any
 case (unless people feel strongly for '''option 5''').

 Taking those in reverse order:

 I don't think anyone is seriously considering '''option 5''' here, but I
 just added it to make things clear.

 I am somewhat opposed to '''option 4''': I don't know of any good
 replacement for Grafana that is better packaged in Debian and will
 allow us to graph metrics from Prometheus the way we need. We *can* build
 custom graphs and dashboards using the
 [https://prometheus.io/docs/visualization/consoles/ console templates] but
 my experience with Prometheus graphs so far has been painful at best. They
 are hard to make and hard to share, while there is already a library of
 Grafana dashboards we can draw from (even if a little small).

 Regarding '''option 3''', I don't care that much about Debian vs Docker. I
 originally wanted to try Docker images because I didn't feel comfortable
 installing arbitrary upstream code as root in our infrastructure. I also
 liked the idea of getting the little extra isolation Docker provides from
 that non-vetted upstream code, even if it means a few extra layers of
 abstraction and weird tradeoffs. But (understandably) ln5 wasn't
 comfortable using containers at all and I figured it might be simpler
 to just use a Debian package for now, since it's something we're all
 familiar with.

 ('''Option 2''') So that's why we're running the upstream Debian package
 now, without isolation - that is, in the same VM as the Prometheus server.
 As discussed with ln5 over IRC, the catastrophic scenario that we would
 avoid by setting up Grafana in a separate VM is that someone takes over
 the Grafana server and uses it to start attacking other nodes in the
 network running the Prometheus exporters. They would need to hack
 ''those'' '''and''' also escape ''their'' sandboxes to do any more
 significant damage to other nodes. Another attack vector is getting to the
 Prometheus data itself, but that is currently protected by an "invite"
 password so it's not really that much of a concern. That said, if an
 attacker could escalate privileges and get access to the Prometheus
 accounts, they might be able to silence alarms and inject arbitrary data
 into the Prometheus database.

 Setting up a separate VM for Grafana would mean that the Grafana server
 wouldn't talk to Prometheus locally anymore, which could slow down graph
 generation. We *could* host the two VMs on the same physical box, but that
 would require rebuilding the Prometheus server as well.

 So I don't think the tradeoffs of running Grafana in a separate VM are
 worth it.

 I would continue with the current Debian-based setup ('''option 1''') or,
 if we're worried about trusting those packages, switch to the Docker image
 ('''option 3'''). In any case, I would prefer if we could continue the
 implementation to be on par with what we get with Munin out of the box,
 which involves adding a few more exporters to get stats about databases
 and webservers.

 This is all Prometheus stuff and so far I haven't seen resistance to that
 technology, so from now on I'll work under the assumption that I can
 continue deploying those exporters, which are well packaged in Debian and
 easier to deploy anyway, with minimal dependencies.

 The open question for me is whether I should tear out the traces of Munin
 configuration on the hosts. There are still munin-node daemons running
 everywhere and failing cronjobs making noise. Removing that stuff would
 also show what's there that's missing from our Prometheus setup, which
 would be useful in itself.
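
 A minimal sketch of how that comparison could be done, assuming we have
 the list of Munin plugins enabled on a host and the decoded JSON response
 of a Prometheus `up` instant query (`/api/v1/query?query=up`). The
 plugin-to-job mapping, the job names and the host names are all
 hypothetical and would need to be written by hand for our setup.

```python
import json

# Hypothetical mapping from Munin plugin names to the Prometheus job
# expected to cover them. This is an assumption, not an existing table.
MUNIN_TO_JOB = {
    "cpu": "node",
    "df": "node",
    "apache_accesses": "apache",
    "postgres_connections_db": "postgres",
}


def jobs_up(query_response, instance_host):
    """Return the set of job names reporting up == 1 for one host.

    query_response is the decoded JSON body returned by Prometheus for
    an instant query on the `up` metric.
    """
    jobs = set()
    for sample in query_response["data"]["result"]:
        labels = sample["metric"]
        # Instance labels usually look like "host:port"; drop the port.
        host = labels.get("instance", "").rsplit(":", 1)[0]
        # Each sample value is a pair: [timestamp, "<value as string>"].
        if host == instance_host and sample["value"][1] == "1":
            jobs.add(labels.get("job", ""))
    return jobs


def uncovered_plugins(munin_plugins, up_jobs):
    """Return the Munin plugins that no running exporter job covers."""
    missing = []
    for plugin in munin_plugins:
        job = MUNIN_TO_JOB.get(plugin)
        if job is None or job not in up_jobs:
            missing.append(plugin)
    return sorted(missing)


# Made-up response in the shape the Prometheus HTTP API returns.
SAMPLE_RESPONSE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector", "result": [
   {"metric": {"__name__": "up", "instance": "web1.example.org:9100",
               "job": "node"}, "value": [1553175274, "1"]},
   {"metric": {"__name__": "up", "instance": "web1.example.org:9117",
               "job": "apache"}, "value": [1553175274, "1"]},
   {"metric": {"__name__": "up", "instance": "web1.example.org:9187",
               "job": "postgres"}, "value": [1553175274, "0"]}
 ]}}
""")

if __name__ == "__main__":
    up = jobs_up(SAMPLE_RESPONSE, "web1.example.org")
    plugins = ["cpu", "apache_accesses", "postgres_connections_db",
               "ntp_offset"]
    for plugin in uncovered_plugins(plugins, up):
        print("no Prometheus coverage for Munin plugin:", plugin)
```

 Plugins with no entry in the mapping are reported as uncovered on purpose,
 so anything Munin graphs that nobody has thought about yet shows up in the
 list instead of being silently ignored.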

 The other question is whether we go with Grafana at all or find "something
 else" ('''option 4'''). I'd like to keep going with Grafana and finish its
 configuration, naturally, but I'm open to alternative suggestions of
 course.

 Alright, sorry for the long email, but I figured it was worth documenting
 all the options carefully.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29684#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online