Hi Karsten!
(Metrics and health team CCed)
The network team has been working on a new "MetricsPort" [1] in tor, which can
expose counters for different metrics within tor. It currently uses the
Prometheus model [2], which lets us create proper monitoring graphs using tools
like Grafana (see some example screenshots in #40063).
A short-term goal here is also to provide Grafana templates for monitoring a
relay or onion service, so that people can simply download them from Grafana's
marketplace and be ready to go.
That made us think: what if we could have something similar on
metrics.torproject.org? A page that we could query, like "/prometheus", that
would simply give us a set of counters describing the current state of the
network. A bit like a REST API, but less "API-ish".
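
To make the idea concrete, here is a minimal sketch of how such a page body
could be rendered in the Prometheus text exposition format. The metric name,
help text, labels and values are made up for illustration; nothing here
reflects what metrics.torproject.org actually exposes.

```python
def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format.

    `metrics` maps a metric name to (type, help_text, samples), where
    `samples` is a list of (labels_dict, value) pairs.
    Label values are not escaped here, so keep them free of quotes,
    backslashes and newlines in this sketch.
    """
    lines = []
    for name, (mtype, help_text, samples) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in samples:
            if labels:
                # Sort labels so the output is stable between scrapes.
                label_str = ",".join(
                    f'{k}="{v}"' for k, v in sorted(labels.items())
                )
                lines.append(f"{name}{{{label_str}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and values, for illustration only.
EXAMPLE = {
    "tor_network_relays": (
        "gauge",
        "Number of running relays by flag (hypothetical metric).",
        [({"flag": "Guard"}, 3), ({"flag": "Exit"}, 2)],
    ),
}

if __name__ == "__main__":
    print(render_metrics(EXAMPLE), end="")
```

A Prometheus server would scrape that text periodically, and Grafana would
then graph the stored series.
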
I recall seeing a REST API item on the metrics roadmap at one point, but I'm
not entirely sure of my memory, which is why I'm asking you about this.
At first, such a page would likely expose nothing different from what metrics
has at the moment, _but_ the difference is that it would allow anyone (most
importantly us) to aggregate visualizations in one dashboard using the latest
visualization tech (Grafana, for instance).
This kind of page can usually handle thousands of requests per second without
blinking, so the load impact should be minimal: it exposes an already existing
state to the world rather than querying for that state on each request (as I
assume Onionoo does?).
Maybe the solution here could instead be to write an "exporter" that queries
Onionoo and formats the result nicely for a Prometheus server, but I do fear
the load that it could put on Onionoo if, let's say, A LOT of metrics are
queried every 5 seconds or so.
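
One way the load concern might be softened is to have the exporter cache the
Onionoo response and answer scrapes from that cache. The sketch below is only
an assumption about how that could look: `CachedSource`, the 300-second TTL
and `count_running_relays` are hypothetical names and values, and it assumes
the Onionoo summary document has a "relays" list whose entries carry a boolean
"r" (running) field.

```python
import json
import time
import urllib.request

# Assumed endpoint for the Onionoo summary document.
ONIONOO_SUMMARY_URL = "https://onionoo.torproject.org/summary"


def fetch_onionoo_summary():
    """Fetch and decode the Onionoo summary document (one network request)."""
    with urllib.request.urlopen(ONIONOO_SUMMARY_URL, timeout=30) as resp:
        return json.load(resp)


class CachedSource:
    """Call `fetch` at most once per `ttl_seconds`, however often we are scraped."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._has_value = False
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        if not self._has_value or now - self._fetched_at >= self._ttl:
            # Cache is empty or stale: go to the upstream source once.
            self._value = self._fetch()
            self._fetched_at = now
            self._has_value = True
        return self._value


def count_running_relays(summary):
    """Count relays marked as running in a summary document (assumed 'r' field)."""
    return sum(1 for relay in summary.get("relays", []) if relay.get("r"))


# Hypothetical wiring: Onionoo is queried at most every 5 minutes,
# even if Prometheus scrapes the exporter every 5 seconds.
source = CachedSource(fetch_onionoo_summary, ttl_seconds=300)
```

With something like this, the scrape interval and the load on Onionoo are
decoupled; the price is that the exposed values can be up to one TTL old.
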
On the other hand, and this is unclear to me, the exporter idea may be the
better one if we want to be more agile at integrating other types of metrics,
for example monitoring the consensus like consensus-health does, or extracting
different data from extra-info.
Thoughts?
Cheers!
David
[1] https://gitlab.torproject.org/tpo/core/tor/-/issues/40063
[2] https://prometheus.io/docs/concepts/data_model/
--
6/YYHbjbLDDZ9bG5utUI1Ts1W+SmKf0zyZc8rfr0FXY=