[tor-project] minutes from the sysadmin meeting

17 Mar 2021

Hello!

Here's a (late, again) report from our monthly sysadmin meeting.

# Roll call: who's there and emergencies

 * anarcat
 * hiro
 * gaba

No emergencies.

# Roadmap review

Review and prioritize [the board][].

[the board]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

 * CiviCRM setup discussion. gaba will look at the plan
 * anarcat sent a formal proposal to the tor-internal mailing list
   with [jenkins retirement plan][] ([issue 40167][]), will be
   proposed to tor-internal soon
 * SMTP out-only server work is resuming in ~Next
 * Discourse situation: wait for a few months until hiro can take it
   back ([issue 40183][])

[issue 40183]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40183
[issue 40167]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40167
[jenkins retirement plan]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-10-jenkins...

# Documentation and communication

Are the current processes to document our work okay? Do we have
communication problems? Let's clarify expectations on how to manage
work and tickets.

## What is working

 * anarcat's docs work great, but could use a TL;DR (:+1:)
 * monthly meetings in voice calls
 * jump on a call when there is an issue or misunderstanding (:+1:)

## What can be improved

 * IRC can be frustrating for communication: jump on a voice call
   when necessary!
 * wiki is good for documentation, but not great to get feedback,
   because we don't want to delete other people's stuff and things get
   lost. better to use issues with comments for proposals.
 * hard time understanding what is going on in some tickets, because
   of the lack of updates. We can write more comments in the tickets.
 * when triaging: if you assign a ticket to someone, then that person
   needs to know. when moving a ticket to an active queue (~Next or
   ~Doing), make sure the ticket is assigned to someone.

# Triage

Is our current triage system working? How can others (AKA gaba)
prioritize our work?

Note that ahf is also working on triage, more specifically on
automation, through the [triage ops][] project.

[triage ops]: https://gitlab.torproject.org/ahf/triage-ops/

We might want to include the [broader TPA dashboard][] eventually, but
this requires serious triage work first.

[broader TPA dashboard]: https://gitlab.torproject.org/groups/tpo/tpa/-/boards

Discussion postponed.

# On call

Which services/issues can we call TPA about when nobody is working?

Review and discuss the [current support policy][], which is basically
"none, things may be down until we return"...

Discussion postponed.

[current support policy]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-2-support#...

# Other discussions

## Anonymous ticket system

Postponed.

# Next meeting

April 6th, 15:00 UTC, equivalent to: 08:00 US/Pacific, 12:00
America/Montevideo, 11:00 US/Eastern, 17:00 Europe/Paris.
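
For anyone who wants to double-check the conversions above, here is a
minimal sketch using Python's standard `zoneinfo` module (Python 3.9+,
with the system timezone database available). The year 2021 is an
assumption inferred from the date of these minutes, and the
`local_times` helper is a hypothetical name used only for this
illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the year is 2021, inferred from the date of these minutes.
meeting_utc = datetime(2021, 4, 6, 15, 0, tzinfo=timezone.utc)

# Zone names exactly as listed in the minutes.
zones = ["US/Pacific", "America/Montevideo", "US/Eastern", "Europe/Paris"]


def local_times(when, zone_names):
    """Map each zone name to the local HH:MM for a timezone-aware datetime."""
    return {
        name: when.astimezone(ZoneInfo(name)).strftime("%H:%M")
        for name in zone_names
    }


if __name__ == "__main__":
    for name, hhmm in local_times(meeting_utc, zones).items():
        print(f"{hhmm} {name}")
```

Under those assumptions, the output should match the four local times
listed above.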

# Metrics of the month

 * hosts in Puppet: 85, LDAP: 88, Prometheus exporters: 139
 * number of Apache servers monitored: 28, hits per second: 50
 * number of Nginx servers: 2, hits per second: 2, hit ratio: 0.87
 * number of self-hosted nameservers: 6, mail servers: 7
 * pending upgrades: 4, reboots: 0
 * average load: 0.93, memory available: 1.98 TiB/2.73 TiB, running
   processes: 627
 * bytes sent: 267.74 MB/s, received: 160.59 MB/s
 * [GitLab tickets][]: ? tickets including...
   * open: 0
   * icebox: 107
   * backlog: 15
   * next: 9
   * doing: 7
   * (closed: 2213)

[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

# Grafana dashboards of the month

The [Postfix dashboard][] was entirely rebuilt and now has accurate
"acceptance ratios" per host. It was used to manage the latest
newsletter mailings. We still don't have great ratios, but at least
now we know.

[Postfix dashboard]: https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail

The [GitLab dashboard][] now has a "CI jobs" panel which shows the
number of queued and running jobs, which should help you figure out
when your precious CI job will get through!

[GitLab dashboard]: https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1

-- 
Antoine Beaupré
torproject.org system administration