Here's your monthly dose of sysadmin news!
# Roll call: who's there and emergencies
anarcat, gaba, kez, lavamind
# Dashboard review
We did our normal per-user check-in:
* https://gitlab.torproject.org/groups/tpo/-/boards?scope=all&utf8=%E2%9C%...
* https://gitlab.torproject.org/groups/tpo/-/boards?scope=all&utf8=%E2%9C%...
* https://gitlab.torproject.org/groups/tpo/-/boards?scope=all&utf8=%E2%9C%...
... and briefly reviewed the general dashboards:
* https://gitlab.torproject.org/tpo/tpa/team/-/boards/117
* https://gitlab.torproject.org/groups/tpo/web/-/boards
* https://gitlab.torproject.org/groups/tpo/tpa/-/boards
We need to rethink the web board triage, as mentioned in the last point of this meeting.
# TPA-RFC-42: 2023 roadmap
Gaba brought up a few items we need to plan for, and schedule:
* donate page rewrite (kez)
* sponsor9:
    * self-host discourse (Q1-Q2 < June 2023)
    * RT and cdr.link evaluation (Q1-Q2, gus): "improve our frontdesk tool by exploring the possibility of migrating to a better tool that can manage messaging apps with our users"
* download page changes (kez? currently blocked on nico)
* weblate transition (CI changes pending, lavamind following up)
* developer portal (dev.torproject.org), in Hugo, from ura.design ([tpo/web/dev#6][])
Those are tasks that either TPA will need to do themselves or assist other people in. Gaba also went through the work planned for 2023 in general to see what would affect TPA.
We then discussed anarcat's roadmap proposal ([TPA-RFC-42][]):
* do the bookworm upgrades, this includes:
    * puppet server 7
    * puppet agent 7
    * plan would be:
        * Q1-Q2: deploy new machines with bookworm
        * Q1-Q4: upgrade existing machines to bookworm
* email services migration (e.g. execute TPA-RFC-31, still need to decide the scope, proposal coming up)
* possibly retire schleuder (e.g. execute TPA-RFC-41, currently waiting for feedback from the community council)
* complete the cymru migration (e.g. execute TPA-RFC-40)
* retire gitolite/gitweb (e.g. execute TPA-RFC-36)
* retire SVN (e.g. execute TPA-RFC-11)
* monitoring system overhaul (TPA-RFC-33)
* deploy a Puppet CI
    * e.g. make the Puppet repo public, possibly by removing private content and just creating a "graft" to have a new repository without old history (as opposed to rewriting the entire history, because then we don't know if we have confidential stuff in the old history)
    * there are disagreements on whether or not we should make the repository public in the first place, as it's not exactly "state of the art" puppet code, which could be embarrassing
    * there's also a concern that we don't need CI as long as we don't have actual tests to run (but it's also kind of pointless to have CI without tests to run...), but for now we already have the objective of running linting checks on push ([tpo/tpa/team#31226][])
* plan for summer vacations
[tpo/web/dev#6]: https://gitlab.torproject.org/tpo/web/dev/-/issues/6
[TPA-RFC-42]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40924
[tpo/tpa/team#31226]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/31226
# Web team organisation
Postponed to next meeting. anarcat will join Gaba's next triage session with gus to see how that goes.
# Metrics of the month
* hosts in Puppet: 95, LDAP: 95, Prometheus exporters: 163
* number of Apache servers monitored: 29, hits per second: 715
* number of self-hosted nameservers: 6, mail servers: 10
* pending upgrades: 0, reboots: 4
* average load: 0.64, memory available: 4.61 TiB/5.74 TiB, running processes: 736
* disk free/total: 32.50 TiB/92.28 TiB
* bytes sent: 363.66 MB/s, received: 215.11 MB/s
* planned bullseye upgrades completion date: 2022-11-01
* [GitLab tickets][]: 175 tickets including...
    * open: 0
    * icebox: 144
    * backlog: 17
    * next: 4
    * doing: 7
    * needs review: 1
    * needs information: 2
    * (closed: 2934)
[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
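For readers who prefer ratios to raw figures, here is a minimal sketch of how the memory and disk numbers above can be turned into percentages. The `percent` helper is a hypothetical name used for illustration only and is not part of any TPA tooling; the inputs are copied by hand from the metrics list.

```python
def percent(part: float, total: float) -> float:
    """Return part/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


# Figures copied by hand from the metrics list above, in TiB.
memory_available = percent(4.61, 5.74)
disk_free = percent(32.50, 92.28)

print(f"memory available: {memory_available}%")
print(f"disk free: {disk_free}%")
```

With this month's figures, that works out to roughly 80% of memory available and roughly 35% of disk space free.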
Upgrade prediction graph lives at:
https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/upgrades/bullseye/
Now also available as the main Grafana dashboard. Head to https://grafana.torproject.org/, change the time period to 30 days, and wait a while for results to render.
# Number of the month: 12
Progress on bullseye upgrades mostly flat-lined at 12 machines since August. We actually have three *fewer* bullseye servers now, down to 83 from 86.