[tor-project] minutes from the sysadmin meeting
anarcat at torproject.org
Thu Dec 3 19:25:25 UTC 2020
I forgot to add a fancy header like this last month, but I want to say
"hi!" to everyone and "welcome back to our monthly reports from the
sysadmin team"! :)
Hopefully everyone can manage to stay safe in this crazier-than-usual
- Roll call: who's there and emergencies
- Roadmap review
- Triage rotation
- Holiday planning
- TPA survey review
- Other discussions
  - New intern
- Next meeting
- Metrics of the month
# Roll call: who's there and emergencies
anarcat, hiro, gaba, no emergencies
The meeting took place on IRC because anarcat had too much background noise.
# Roadmap review
We did a lot of cleanup in the dashboard.
In general, the following items were prioritized:
* [GitLab CI]
* finish setting up the Cymru network, especially the [VPN]
* [tor browser build boxes]
* small tickets like the [git stuff] and triage (see below)
[git stuff]: https://gitlab.torproject.org/tpo/tpa/team/-/boards?&label_name=Git
[tor browser build boxes]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/34122
[GitLab CI]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40095
The following items were punted to the future:
* SVN retirement (to January)
* password management (specs in January?)
* Puppet role account and verifications
We briefly discussed Grafana authentication because of a request to
[create a new account on grafana2]. anarcat said the current model
of managing the htpasswd file in Puppet doesn't scale well because
we need to go through this process every time we grant access
(or reset a password), and identified three authentication options:
[create a new account on grafana2]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40102
1. htpasswd managed in Puppet (status quo)
2. Grafana users (disabling the htpasswd, basically)
3. LDAP authentication
The current authentication model was picked because we wanted to
automate user creation in Puppet, and because it's hard to create
users in Grafana from Puppet. When a new Grafana server is setup,
there's a small window during which an attacker could create an admin
account, which we were trying to counter. But maybe those concerns are
We also discussed password management, but that will be worked on in
January. We'll try to set a roadmap for 2021 in January, once the
results of the survey have come in.
# Triage rotation
Hiro brought up the idea of rotating the triage work instead of always
having the same person do it. Right now, anarcat looks at the board
at the beginning of every week and deals with tickets in the "Open"
column. Often, he takes the easy tickets, drops them in ~Next,
and just does them; other times, they end up in ~Backlog, get closed,
or at least get some sort of response.
We agreed to switch that responsibility every two weeks.
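To make the "every two weeks" rule concrete, here is a minimal Python
sketch of a biweekly rotation. The roster, the start date and the
`triage_owner` helper are all hypothetical: the minutes don't say who
is in the rotation or on which date the first shift starts.

```python
from datetime import date

# Hypothetical roster and start date: the minutes only say the triage
# duty switches every two weeks, not who rotates or when it begins.
ROSTER = ["anarcat", "hiro"]
ROTATION_START = date(2020, 12, 7)  # assumed first shift, a Monday
SHIFT_WEEKS = 2


def triage_owner(day: date) -> str:
    """Return who is on triage duty for the week containing `day`."""
    # Whole weeks elapsed since the rotation started (floor division
    # also works for dates before the start).
    weeks = (day - ROTATION_START).days // 7
    # Each shift lasts SHIFT_WEEKS weeks; cycle through the roster.
    shift = weeks // SHIFT_WEEKS
    return ROSTER[shift % len(ROSTER)]


if __name__ == "__main__":
    # Print the owner for the first few Mondays of the rotation.
    for month, day_of_month in [(12, 7), (12, 14), (12, 21), (12, 28)]:
        monday = date(2020, month, day_of_month)
        print(monday, triage_owner(monday))
```

Counting whole weeks from a fixed start date keeps the schedule
deterministic, so anyone can work out who is on duty without checking
a shared calendar; holidays would still need to be swapped by hand.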
# Holiday planning
anarcat is off from December 14th to the 26th, hiro from December 30th
to January 14th.
# TPA survey review
anarcat is [working on a survey] to get information from our users
to plan the 2021 roadmap.
[working on a survey]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40061
People like the survey in general, but the "services" questions were
just too long. It was suggested to remove services TPA has nothing to
do with (like websites or metrics stuff like check.tpo). But anarcat
pointed out that we need to know which of those services are
important: for example right now we "just know" that check.tpo is
important, but it would be nice to have hard data that confirms it.
Anarcat agreed to split the table by team so that it doesn't look
that long, and will submit the survey for review again by the end of
the week.
# Other discussions
## New intern
[MariaV] just started as an Outreachy intern to work on the Anonymous
Ticket System. She may join the `#tpo-admin` channel and the
gitlab/tooling meetings.
# Next meeting
Quick check-in on December 29th, same time.
# Metrics of the month
* hosts in Puppet: 79, LDAP: 82, Prometheus exporters: 133
* number of apache servers monitored: 28, hits per second: 205
* number of nginx servers: 2, hits per second: 3, hit ratio: 0.86
* number of self-hosted nameservers: 6, mail servers: 12
* pending upgrades: 1, reboots: 0
* average load: 0.34, memory available: 1.80 TiB/2.39 TiB, running
* bytes sent: 245.34 MB/s, received: 139.99 MB/s
* [GitLab tickets]: 129 issues including...
  * open: 0
  * icebox: 92
  * backlog: 20
  * next: 9
  * doing: 8
  * (closed: 2130)
[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards
The upgrade prediction graph has been retired because it kept
predicting that the upgrades would be finished in the past, which no
one seems to have noticed in the last report (including me).
Metrics are also available in the main Grafana dashboard. Head to
<https://grafana.torproject.org/>, change the time period to 30 days,
and wait a while for results to render.
torproject.org system administration