Hello!
Here's a (late, again) report from our monthly sysadmin meeting.
# Roll call: who's there and emergencies
* anarcat
* hiro
* gaba
No emergencies.
# Roadmap review
Review and prioritize [the board][].
[the board]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
* CiviCRM setup discussion: gaba will look at the plan
* anarcat sent a formal proposal with the [jenkins retirement plan][] ([issue 40167][]); it will be proposed to tor-internal soon
* work on the outgoing-only SMTP server is resuming in ~Next
* Discourse situation: on hold for a few months, until hiro can take it back ([issue 40183][])
[issue 40183]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40183
[issue 40167]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40167
[jenkins retirement plan]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-10-jenkins...
# Documentation and communication
Are the current processes to document our work okay? Do we have communication problems? Let's clarify expectations on how to manage work and tickets.
## What is working
* anarcat's docs work great, but could use a TL;DR (:+1:)
* monthly meetings over voice calls
* jumping on a call when there is an issue or misunderstanding (:+1:)
## What can be improved
* IRC can be frustrating for communicating: jump on a voice call when necessary!
* the wiki is good for documentation, but not great for getting feedback, because we don't want to delete other people's content and things get lost. It's better to use issues with comments for proposals.
* it's hard to understand what is going on in some tickets because of the lack of updates. We can write more comments in the tickets.
* when triaging: if you assign a ticket to someone, that person needs to know. When moving a ticket to an active queue (~Next or ~Doing), make sure it is assigned.
# Triage
Is our current triage system working? How can others (AKA gaba) prioritize our work?
Note that ahf is also working on triage automation, more specifically through the [triage ops][] project.
[triage ops]: https://gitlab.torproject.org/ahf/triage-ops/
We might want to include the [broader TPA dashboard][] eventually, but this requires serious triage work first.
[broader TPA dashboard]: https://gitlab.torproject.org/groups/tpo/tpa/-/boards
Discussion postponed.
# On call
Which services/issues can we call TPA about when nobody is working?
Review and discuss the [current support policy][], which is basically "none, things may be down until we return"...
Discussion postponed.
[current support policy]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-2-support#...
# Other discussions
## Anonymous ticket system
Postponed.
# Next meeting
April 6th, 15:00 UTC, equivalent to: 08:00 US/Pacific, 12:00 America/Montevideo, 11:00 US/Eastern, 17:00 Europe/Paris.
# Metrics of the month
* hosts in Puppet: 85, LDAP: 88, Prometheus exporters: 139
* number of Apache servers monitored: 28, hits per second: 50
* number of Nginx servers: 2, hits per second: 2, hit ratio: 0.87
* number of self-hosted nameservers: 6, mail servers: 7
* pending upgrades: 4, reboots: 0
* average load: 0.93, memory available: 1.98 TiB/2.73 TiB, running processes: 627
* bytes sent: 267.74 MB/s, received: 160.59 MB/s
* [GitLab tickets][]: ? tickets including...
  * open: 0
  * icebox: 107
  * backlog: 15
  * next: 9
  * doing: 7
  * (closed: 2213)
[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
# Grafana dashboards of the month
The [Postfix dashboard][] was entirely rebuilt and now has accurate "acceptance ratios" per host. It was used to manage the latest newsletter mailings. We still don't have great ratios, but at least now we know.
[Postfix dashboard]: https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail
The [GitLab dashboard][] now has a "CI jobs" panel showing the number of queued and running jobs, which should help you figure out when your precious CI job will get through!
[GitLab dashboard]: https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1