Hi everyone,
Here's a (somewhat late) report from our monthly sysadmin meeting.
# Roll call: who's there and emergencies
anarcat, gaba, hiro
* hiro will be doing security reboots for [DSA-4843][]

[DSA-4843]: https://www.debian.org/security/2021/dsa-4843
# Dashboard review
We reviewed the [dashboard][] to prioritise the work in February.
[dashboard]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
anarcat is doing triage for the next two weeks, as now indicated in the IRC channel topic.
# Communications discussion
We wanted to touch base on how we organise and communicate, but didn't have time to do so. Postponed to next meeting.
Reminder:
* Documentation about documentation: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/documentation
* Policies: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy
# Next meeting
March 2nd, 2021, same time
# Metrics of the month
* hosts in Puppet: 83, LDAP: 86, Prometheus exporters: 135
* number of Apache servers monitored: 27, hits per second: 182
* number of Nginx servers: 2, hits per second: 3, hit ratio: 0.83
* number of self-hosted nameservers: 6, mail servers: 12
* pending upgrades: 11, reboots: 71
* average load: 0.41, memory available: 1.94 TiB/2.67 TiB, running processes: 520
* bytes sent: 281.62 MB/s, received: 163.47 MB/s
* [GitLab tickets][]: 130 tickets including...
  * open: 0
  * icebox: 96
  * backlog: 18
  * next: 10
  * doing: 7
  * (closed: 2182)

[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
I've been collecting those dashboard metrics for a while, and although I don't have pretty graphs to show you yet, I do have this fancy table:

| date       | open | icebox | backlog | next | doing | closed |
|------------|------|--------|---------|------|-------|--------|
| 2020-07-01 | 125  | 0      | 26      | 13   | 7     | 2075   |
| 2020-11-18 | 1    | 84     | 32      | 5    | 4     | 2119   |
| 2020-12-02 | 0    | 92     | 20      | 9    | 8     | 2130   |
| 2021-01-19 | 0    | 91     | 20      | 12   | 10    | 2165   |
| 2021-02-02 | 0    | 96     | 18      | 10   | 7     | 2182   |

Some observations:
* the "Icebox" keeps piling up
* we are closing tens and tens of tickets (about 20-30 a month)
* we are getting better at keeping Backlog/Next/Doing small
* triage is working: the "Open" queue is generally empty after the meeting

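For anyone who wants to sanity-check the closure figure against the table, here is a minimal Python sketch. It is not how the numbers were actually collected: the names `snapshots` and `closure_rates` are made up for illustration, and normalising to a 30-day "month" is an assumption. The snapshots are unevenly spaced, so the per-interval rates are only rough.

```python
from datetime import date

# Closed-ticket totals copied from the table: (snapshot date, closed count).
snapshots = [
    (date(2020, 7, 1), 2075),
    (date(2020, 11, 18), 2119),
    (date(2020, 12, 2), 2130),
    (date(2021, 1, 19), 2165),
    (date(2021, 2, 2), 2182),
]


def closure_rates(rows, month_days=30):
    """Return (start, end, days, closed, per_month) for consecutive snapshots.

    per_month scales the tickets closed in the interval to a window of
    month_days days (30 by default, which is an assumption).
    """
    rates = []
    for (d0, c0), (d1, c1) in zip(rows, rows[1:]):
        days = (d1 - d0).days
        closed = c1 - c0
        rates.append((d0, d1, days, closed, closed * month_days / days))
    return rates


for start, end, days, closed, per_month in closure_rates(snapshots):
    print(f"{start} -> {end} ({days} days): "
          f"{closed} closed, ~{per_month:.1f} per 30 days")
```

The first interval spans several months while the later ones are only two to seven weeks long, so the more recent rows are the ones to compare with the monthly estimate.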
As usual, some of those stats are available in the main Grafana dashboard. Head to https://grafana.torproject.org/, change the time period to 30 days, and wait a while for results to render.