[tor-project] minutes from the sysadmin meeting

Antoine Beaupré anarcat at torproject.org
Wed Jan 20 19:02:06 UTC 2021


Hi!

It feels so strange to say that this year around, but... happy new year
everyone! Let's hope we can do better this time around. ;)

Here's your first sysadmin report for 2021; hopefully we'll keep you
steadily informed of our progress in the coming year. Right now we're
working on the roadmap and, even though we already asked you for
feedback in the user survey, there is still time to steer us in the
right direction. We have a meeting coming up where we're likely to set
the roadmap more in stone, so now is a good time to speak up if you
forgot to respond to the survey...

Now onto the minutes.

Agenda:

- Roll call: who's there and emergencies
- Dashboard review
- Roadmap 2021 proposal
    - 2020 retrospective
    - Services survey
    - Goals for 2021
- Other discussions
- Next meeting
- Metrics of the month

# Roll call: who's there and emergencies

present: hiro, gaba, anarcat

[GitLab backups are broken][]: the backups might need more disk space
than we have. Just bump disk space in the short term, and consider
changing the backup system in the long term.

[GitLab backups are broken]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40143

# Dashboard review

We [reviewed the dashboard][]: there is too much stuff in January, but
we'll review it again in February.

[reviewed the dashboard]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

# Roadmap 2021 proposal

We discussed the [roadmap project][] anarcat worked on. We reviewed
the 2020 retrospective, talked about the services survey, and
discussed goals for 2021.

[roadmap project]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2021

## 2020 retrospective

We reviewed and discussed the [2020 roadmap evaluation][] that anarcat
prepared:

[2020 roadmap evaluation]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2021#2020-roadmap-evaluation

 * **what worked?** we did the "need to have" even through the
   apocalypse, staff reduction and all the craziness of 2020! success!
 * **what was a challenge?**
   * monthly tracking was not practical, and hard to do in
     Trac. things are a lot easier with GitLab's dashboard.
   * it was hard to work through the pandemic.
 * **what can we change?**
   * do quarterly-based planning
   * estimates were off because so many things happened that we did
     not expect. reserve time for the unexpected, reduce expectations.
   * ticket triage is rotated now.

## Services survey

We discussed the [survey results analysis][] briefly, and how it is
used as a basis for the roadmap brainstorm. The two major services
people use are GitLab and email, and those will be the focus of the
roadmap for the coming year.

[survey results analysis]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2021#survey-results

## Goals for 2021

 * email services stabilisation ("submission server", "my email ends up
   in spam", CiviCRM bounce handling, etc.) - consider [outsourcing
   email services][]
 * GitLab migration continues (Jenkins, gitolite)
 * simplify / improve Puppet code base
 * stabilise services (e.g. GitLab, schleuder)

[outsourcing email services]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/submission#cost

Next steps for the roadmap:

 * try to make estimates
 * add need to have, nice to have
 * anarcat will work on a draft based on the brainstorm
 * we meet again in one week to discuss it

# Other discussions

Postponed: metrics services to maintain until we hire a new person

# Next meeting

Same time, next week.

# Metrics of the month

Fun fact: we crossed 2TiB of total available memory back in November
2020, almost double the figure from the previous report (in July), even
though the number of hosts in Puppet remained mostly constant (78 vs
72). This is due (among other things) to the new Cymru Ganeti cluster
that added a whopping 1.2TiB of memory to our infrastructure!

 * hosts in Puppet: 82, LDAP: 85, Prometheus exporters: 134
 * number of Apache servers monitored: 27, hits per second: 198
 * number of Nginx servers: 2, hits per second: 3, hit ratio: 0.86
 * number of self-hosted nameservers: 6, mail servers: 12
 * pending upgrades: 3, reboots: 0
 * average load: 0.29, memory available: 2.00 TiB/2.61 TiB, running
   processes: 512
 * bytes sent: 265.07 MB/s, received: 155.20 MB/s
 * [GitLab tickets][]: 113 tickets including...
   * open: 0
   * icebox: 91
   * backlog: 20
   * next: 12
   * doing: 10
   * (closed: 2165)

[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

Now also available as the main Grafana dashboard. Head to
<https://grafana.torproject.org/>, change the time period to 30 days,
and wait a while for results to render.

-- 
Antoine Beaupré
torproject.org system administration

