
An error crept into the Metrics of this month and last; see if you can spot it:

On 2022-01-11 20:34:08, Antoine Beaupré wrote:
# Metrics of the month

 * hosts in Puppet: 89, LDAP: 91, Prometheus exporters: 139
 * number of Apache servers monitored: 27, hits per second: 185
 * number of Nginx servers: 0, hits per second: 0, hit ratio: 0.00
 * number of self-hosted nameservers: 6, mail servers: 8
 * pending upgrades: 7, reboots: 0
 * average load: 0.35, memory available: 4.01 TiB/5.13 TiB, running processes: 643
 * disk free/total: 84.95 TiB/39.99 TiB
 * bytes sent: 325.45 MB/s, received: 190.66 MB/s
 * planned bullseye upgrades completion date: 2024-09-07
 * [GitLab tickets][]: 159 tickets including...
   * open: 2
   * icebox: 143
   * backlog: 8
   * next: 2
   * doing: 2
   * needs information: 2
   * (closed: 2573)

[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

Hint: it's about disk space...

Anyone?

Credits to roger, who figured it out: the disk free/total was backwards. The correct figure should have read:

 * disk free/total: 38.28 TiB/84.95 TiB

... in this report. Future reports shouldn't have this error.

It should also be noted that those metrics should generally be taken with a grain of salt. The disk query was introduced recently and, in particular, counts disk usage of the (huge) backup server (60 TiB), which itself keeps a copy of everything by definition.

The network metrics also probably overcount things, as we simply do this:

    sum(rate(node_network_transmit_bytes_total[30d]))

... which, for those unfamiliar with Prometheus and our network infrastructure, may count traffic twice: it includes internal traffic between network mirrors, for example.

I haven't yet figured out a good (AKA simple) way to fix those queries...

Cheers!

A.

-- 
Antoine Beaupré
torproject.org system administration
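
PS: for anyone wondering how summing transmit counters across all hosts can count traffic twice, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the host names, the flows and the byte rates are hypothetical, not measurements from our fleet, and this is not how the actual query is computed.

```python
# Toy model of the overcount: each flow is (source, destination, bytes/s).
# Host names and rates are hypothetical, not real measurements.
FLEET = {"mirror-1", "mirror-2", "web-1"}

FLOWS = [
    ("mirror-1", "mirror-2", 100),  # internal sync between two mirrors
    ("mirror-2", "client", 100),    # the same data served to an outside client
    ("web-1", "client", 50),        # plain external traffic
]


def naive_transmit_total(flows, fleet):
    """Roughly what summing per-host transmit rates does:
    add up everything any fleet host sends, wherever it goes."""
    return sum(rate for src, _dst, rate in flows if src in fleet)


def external_transmit_total(flows, fleet):
    """Count only the bytes that actually leave the fleet."""
    return sum(
        rate for src, dst, rate in flows if src in fleet and dst not in fleet
    )


naive = naive_transmit_total(FLOWS, FLEET)
external = external_transmit_total(FLOWS, FLEET)
print(f"naive: {naive} B/s, leaving the fleet: {external} B/s")
```

In this toy model the mirror-to-mirror sync is counted once when it is transmitted internally and again when the same data is served to a client, so the naive sum is larger than what leaves the fleet. The real counters carry no destination information, which is part of why a simple fix for the query is not obvious.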