[tor-project] minutes from the sysadmin meeting
anarcat at torproject.org
Wed Jun 10 20:02:03 UTC 2020
# Roll call: who's there and emergencies
Present: anarcat, hiro, weasel.
Small emergency with GitLab.
We realized that the GitLab backups were not functioning properly
because GitLab omnibus runs its own database server, separate from the
one run by TPA. In the long term, we want to fix this, but in the
short term, we should make sure of the following about the backup:
1. that it works without filling up the disk ;) (probably just a matter of rotating the backups)
2. that it backs up *everything* (including secrets)
3. that it stores the backup files *offsite* (maybe using bacula)
4. that it is documented
The following actions were undertaken:
* make a new (rotating disk) volume to store backups, mount it
somewhere (weasel; done)
* tell bacula to ignore the rest of gitlab /var/opt/.nobackup in
puppet (hiro; done)
* make the (rotating) cronjob in puppet, including the secrets in
./gitlab-rails/etc (hiro, anarcat; done)
* document ALL THE THINGS (anarcat), specifically in a new page
somewhere under [tsa/howto/backup], along with more generic
gitlab documentation
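
As an illustration of the "rotating" part of the action items above, here is a minimal sketch of a rotation step that keeps only the newest few backup files in a directory. It is not the cronjob deployed through Puppet: the function name `rotate_backups`, the `*.tar` pattern, and the default retention count are all assumptions made for the example.

```python
from pathlib import Path
from typing import List


def rotate_backups(backup_dir: Path, keep: int = 3, pattern: str = "*.tar") -> List[Path]:
    """Delete all but the `keep` newest files matching `pattern`.

    Hypothetical helper for illustration only; returns the deleted paths.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Sort matching files newest first, by modification time.
    backups = sorted(
        (p for p in backup_dir.glob(pattern) if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    # Everything after the first `keep` entries is stale.
    stale = backups[keep:]
    for path in stale:
        path.unlink()
    return stale
```

Run after each backup, a step like this bounds disk usage to roughly `keep` times the size of one backup, which addresses the "without filling up the disk" requirement. Files that do not match the pattern are left alone.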
# Roadmap review
We proceeded with a review of the [May and June roadmap].
[May and June roadmap]: https://trac.torproject.org/projects/tor/wiki/org/teams/SysadminTeam#May
We note that this roadmap system will go away after the gitlab
migration, after which point we will experiment with various gitlab
tools (most notably the "Boards" feature) to organize work.
alex will ask hiro or weasel to put Trac offline; we keep filing
tickets in Trac until then.
weasel has taken on the kvm/ganeti migration:
hiro will try creating the next ganeti node to get experience with that
anarcat should work on documentation, for example:
* how to add a disk on a ganeti node (done)
* [LDAP / ud-ldap]
[LDAP / ud-ldap]: https://trac.torproject.org/projects/tor/ticket/34426
# Availability planning
We are thinking of setting up an alternating schedule where hiro would
be available Monday to Wednesday and anarcat from Wednesday to Friday,
but we're unsure this will be possible. We might just do it on a week
by week basis instead.
We also note that anarcat will become fully unavailable for two months
starting anywhere between now and mid-July, which deeply affects the
roadmap above. Mainly, anarcat will focus on documentation and avoid
# Other discussions
We discussed TPA-RFC-2, "support policy"
([tsa/policy/tpa-rfc-2-support]), during the meeting, because
someone asked if they could contact us over signal (the answer is
The policy seemed to be consistent with what people in the meeting
expected and it will be sent for approval to tor-internal shortly.
# Next meeting
TBD. The first Wednesday in July is a bank holiday in Canada so it's not a
# Metrics of the month
* hosts in Puppet: 74, LDAP: 77, Prometheus exporters: 128
* number of apache servers monitored: 29, hits per second: 163
* number of nginx servers: 2, hits per second: 2, hit ratio: 0.88
* number of self-hosted nameservers: 6, mail servers: 12
* pending upgrades: 35, reboots: 48
* average load: 0.55, memory available: 346.14 GiB/952.95 GiB, running processes: 428
* bytes sent: 207.17 MB/s, received: 111.78 MB/s
* planned buster upgrades completion date: 2020-08-18
Upgrade prediction graph still lives at
Now also available as the main Grafana dashboard. Head to
<https://grafana.torproject.org/>, change the time period to 30 days,
and wait a while for results to render.
torproject.org system administration