[tor-project] minutes from the sysadmin meeting
anarcat at torproject.org
Mon Sep 9 20:34:45 UTC 2019
Here are the minutes from today's sysadmin meeting.
# Roll call: who's there and emergencies
Anarcat, Hiro, Linus, Peter, and Roger attending.
# What has everyone been up to
* catchup with Stockholm and tasks
* ipsec puppet module completion (should we publish it?)
* fixed civicrm tunneling issues, hopefully ([#30912])
* published blog post with updates from the previous email:
* struggled with administrative/accounting stuff
* contacted greenhost about DNS: they have anycast DNS with an API,
but not GeoDNS, what should we do?
* RT access granting and audit ([#31249], [#31248]), various LDAP
access tickets and cleaned up gettor group
* [backup documentation] ([#30880])
* tested bacula and postgresq restore procedures, specifically, you
might want to get familiar with those before a catastrophe
* cleaned up services inventory ([#31261]) all in
* worked on getting ganeti into puppet with weasel
[backup documentation]: https://help.torproject.org/tsa/howto/backup/
* on vacation the last week, it was awesome
* published a summary of the KNOB attack against Bluetooth (TL;DR:
don't trust your BT keyboards)
* ganeti merge almost completed
* first part of the hiera transition completed, yaaaaay!
* tested a puppet validation hook ([#31226]) you should install it
locally, but our codebase is maybe not ready to run this
* retired labs.tpo ([#24956])
* retired nova.tpo ([#29888]) and updated the host retirement docs,
especially the hairy procedure where we don't have remote console
to wipe disks
## hiro - Collecting all my snippets here https://dip.torproject.org/users/hiro/snippets
* catchup with Stockholm discussions and future tasks
* fixed some prometheus puppet-fu
* some website dev and maintenance
* some blog fixes and updates
* gitlab updates and migration planning
* gettor service admin via ansible
## weasel, for september, actually
* Finished doing ganeti stuff. We have at least one VM now, see next
* We have a loghost now, it's called loghost01. There is a
/var/log/hosts that has logs per host, and some /var/log/*all*
files that contain log lines from all the hosts. We don't do
backups of this host's /var/log because it's big and all the data
should be elsewhere anyway.
* started doing new onionoo infra, see [#31659].
* debian point releases
# What we're up to next
* figure out the next steps in hiera refactoring ([#30020])
* ops report card, see below ([#30881])
* LDAP sudo transition plan ([#6367])
* followup with snowflake + TPA? ([#31232])
* send root@ emails to RT, and start using it more for more things?
* followup with email services improvements ([#30608])
* continue prometheus module merges
* followup on SVN decomissionning ([#17202])
* on vacation first two weeks of August
* followup and planning for search.tp.o
* websites and gettor taks
* more prometheus and puppet
* review services documentation
* monitor anti-censorship services
* followup with gettor tasks
* followup with greenhost
* want to restructure how we do web content distribution:
* Right now, we rsync the static content to ~5-7 nodes that
directly offer http to users and/or serve as backends for fastly.
* The big number of rsync targets makes updating somewhat slow at
times (since we want to switch to the new version atomicly).
* I'd like to change that to ship all static content to 2, maybe 3,
* These machines would not be accessed directly by users but would
serve as backends for a) fastly, and b) our own varnish/haproxy
* split onionoo backends (that run the java stuff) from frontends
(that run haproxy/varnish). The backends might also want to run a
varnish. Also, retire the stunnel and start doing ipsec between
frontends and backends. (that's already started, cf. [#31659])
* start moving VMs to gnt-fsn
* help deciding things about a tor nextcloud instance
* help getting such a tor nextcloud instance up and running
* help migrating data from the nc instance at riseup into a tor
* help migrating data from storm into a tor instance
# Answering the 'ops report card'
anarcat introduced the project and gave a heads up that this might
mean more ticket and organizational changes. for example, we don't
define "what's an emergency" and "what's supported" clearly
enough. anarcat will use this process as a prioritization tool as
# Email next steps
Brought up "the plan" to Vegas: <https://trac.torproject.org/projects/tor/wiki/org/meetings/2019Stockholm/Notes/EmailNotEmail>
Response was: why don't we just give everyone LDAP accounts? Everyone
We're still uncomfortable with deploying the new email service but
that was agreed upon in Stockholm. We don't see a problem with
granting more people LDAP access, provided vegas or others can provide
support and onboarding.
# Do we want to run Nextcloud?
See also the discussion in <https://trac.torproject.org/projects/tor/ticket/31540>
A. Hosted on Tor Project infrastructure, operated by Tor Project.
B. Hosted on Tor Project infrastructure, operated by Riseup.
C. Hosted on Riseup infrastructure, operated by Riseup.
We're good with B or C for now. We can't give them root so B would
need to be running as UID != 0, but they prefer to handle the machine
themselves, so we'll go with C for now.
# Other discussions
weasel played with prom/grafana to diagnose onionoo stuff, and found
interesting things. Wonders if we can hookup varnish, anarcat will
we don't want to keep storm running if we switch to nextcloud, make a
# Next meeting
october 7th 1400UTC
# Metrics of the month
I figured I would bring back this tradition that Linus had going before
I started doing the reports, but that I omitted because of lack of time
and familiarity with the infrastructure. Now I'm a little more
comfortable so I made a script in the wiki which polls numbers from
various sources and makes a nice overview of what our infra looks
like. Access and transfer rates are over the last 30 days.
* hosts in Puppet: 76, LDAP: 79, Prometheus exporters: 121
* number of apache servers monitored: 32, hits per second: 168
* number of self-hosted nameservers: 5, mail servers: 10
* pending upgrades: 0, reboots: 0
* average load: 0.56, memory available: 357.18 GiB/934.53 GiB, running processes: 441
* bytes sent: 126.79 MB/s, received: 96.13 MB/s
Those metrics should be taken with a grain of salt: many of those might
not mean what you think they do, and some others might be gross
mischaracterizations as well. I hope to improve those reports as time
Feedback, is, of course, very welcome.
torproject.org system administration
More information about the tor-project