[tor-project] minutes from the sysadmin meeting
anarcat at torproject.org
Tue Nov 26 17:11:37 UTC 2019
Here are the minutes from the last sysadmin meeting.
# Roll call: who's there and emergencies
anarcat, gaba, and hiro present; weasel and linus couldn't make it;
no news from qbi.
# What has everyone been up to
* followup with cymru ([#29397])
* moved OONI.tpo out of TPO infrastructure (now hosted at Netlify)
and closed some related accounts ([#31718]), which involved
documenting how to retire a static component
* identified that we need to work on onboarding/offboarding
procedures ([#32519]) and especially "what happens to email when
people leave" ([#32558])
* new caching service tweaks, now at an 88% hit ratio; costs will
hopefully go down to 300$/month in November! see the [shiny graphs]
* worked more on Nginx status dashboards to ensure we have good
response latency and rates in the caching system
* reconfirmed mailing list problems as related to DMARC, can we fix
this now? ([#29770])
* wrote a Postfix mail log parser (in lnav) to diagnose email issues
in the mail server
* helped with the deployment of a ZNC bouncer for IRC users
([#32532]) along with fixes to the "mosh" configuration
* getting started on the [new email service project], reconfirmed
the "Goals" section with vegas
* lots of work on puppet cleanup and refactoring
* NMU'd upstream ganeti installer fix, proposed stable update
* build-arm-* box retirement and ipsec config cleanup
* fixed prometheus/ipsec reliability issues ([#31916], it was
[shiny graphs]: https://grafana.torproject.org/d/p21-cvJWk/cache-health
[new email service project]: https://help.torproject.org/tsa/howto/submission/
* Some work on donate.tpo with giant rabbit
* Updates and debug on dip.tp.o
* Security updates and reboots
* Work on the websites
* Git maintenance
* Decommissioning Getulum
* Started running the website meeting and coordinating dev portal for
Some coordination work around Nextcloud.
Nothing to report.
# What we're up to next
* varnish -> nginx conversion? ([#32462])
* review cipher suites? ([#32351])
* release our custom installer for public review? ([#31239])
* publish our puppet source code ([#29387])
* followup on SVN shutdown, only corp missing ([#17202])
* audit of the other installers for ping/ACL issue ([#31781])
* followup with email services improvements ([#30608])
* send root@ emails to RT ([#31242])
* continue prometheus module merges
* Clean up websites bugs
* needrestart automation ([#31957])
* CRM upgrades coordination for january? ([#32198])
* translation move ([#31784])
Will try to follow up with Nextcloud again.
Nothing to report.
# Winter holidays
Who's online when in December? Can we look at continuity during that
hiro will be online during the holidays. anarcat will be moderately
online until january, but will take a week offline some time early
january. to be clarified.
Need to clarify how much support we provide, see [#31243] for the
# Prometheus server resize
Can I double the size of the prometheus server to cover for extra disk
space? See [#31244] for the larger project.
Will raise the cost from 4.90EUR to 8.90EUR. Everyone is go on this,
anarcat updated the budget to reflect the new expense.
# Other discussions
Blog status? Anarcat got a quote back and will bring it up at the next
# Next meeting
Unclear. jan 6th is a holiday in europe ("the day of the kings"), so
we might postpone until january 13th. we are considering having
shorter, weekly meetings.
# Metrics of the month
* hosts in Puppet: 76, LDAP: 79, Prometheus exporters: 123
* number of apache servers monitored: 32, hits per second: 195
* number of nginx servers: 109, hits per second: 1, hit ratio: 0.88
* number of self-hosted nameservers: 5, mail servers: 10
* pending upgrades: 0, reboots: 0
* average load: 0.62, memory available: 334.59 GiB/957.91 GiB, running
* bytes sent: 176.80 MB/s, received: 118.35 MB/s
* planned buster upgrades completion date: 2020-05-01
Now also available as the main Grafana dashboard. Head to
<https://grafana.torproject.org/>, change the time period to 30 days,
and wait a while for results to render.
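For reference, a hit ratio like the 0.88 reported above is usually the fraction of requests answered from cache. The sketch below shows one way to compute it from nginx `$upstream_cache_status` values. It is an illustration only, not the query used by the dashboards: the function name is made up, and counting only `HIT` as a hit (rather than also `STALE` or `REVALIDATED`) is an assumption.

```python
from collections import Counter


def cache_hit_ratio(statuses):
    """Return the fraction of cacheable requests served from cache.

    `statuses` is an iterable of nginx $upstream_cache_status values
    such as "HIT", "MISS" or "EXPIRED". Empty values and "-" (requests
    that never went through the cache) are ignored.
    """
    counts = Counter(s for s in statuses if s and s != "-")
    total = sum(counts.values())
    if total == 0:
        # No cacheable requests seen: report 0.0 rather than dividing by zero.
        return 0.0
    # Assumption: only "HIT" counts as a cache hit.
    return counts["HIT"] / total


if __name__ == "__main__":
    # Hypothetical sample: 88 hits and 12 misses.
    sample = ["HIT"] * 88 + ["MISS"] * 12
    print(f"hit ratio: {cache_hit_ratio(sample):.2f}")
```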
The Nginx cache ratio stats are not (yet?) in the main
dashboard. Upgrade prediction graph still lives at
<https://help.torproject.org/tsa/howto/upgrades/> but the [prediction
script] has been rewritten and moved to GitLab.
[prediction script]: https://gitlab.com/anarcat/predict-os
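To give an idea of how a completion date can be predicted, here is a minimal sketch that fits a straight line through (date, hosts remaining) samples and extrapolates to zero. It is not the actual [prediction script]: the function name and the sample data are hypothetical, and a simple least-squares fit is an assumption about the approach.

```python
from datetime import date, timedelta


def predict_completion(samples):
    """Estimate when the number of hosts left to upgrade reaches zero.

    `samples` is a list of (date, remaining_hosts) tuples. Returns a
    date, or None if there is not enough data or no downward trend.
    """
    if len(samples) < 2:
        return None
    origin = samples[0][0]
    xs = [(d - origin).days for d, _ in samples]
    ys = [float(n) for _, n in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return None  # all samples taken on the same day
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    if slope >= 0:
        return None  # the count is not decreasing, nothing to predict
    intercept = mean_y - slope * mean_x
    days_to_zero = -intercept / slope
    return origin + timedelta(days=round(days_to_zero))


if __name__ == "__main__":
    # Hypothetical data: ten hosts upgraded every ten days.
    history = [
        (date(2019, 11, 1), 60),
        (date(2019, 11, 11), 50),
        (date(2019, 11, 21), 40),
    ]
    print(predict_completion(history))  # 2019-12-31
```

A straight-line fit is the simplest possible model. It will be off if upgrades happen in batches, so the result is better treated as a rough target than as a deadline.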
torproject.org system administration