Here are the minutes from today's sysadmin meeting.
# Roll call: who's there and emergencies
Anarcat, Hiro, Linus, Peter, and Roger attending.
# What has everyone been up to
## anarcat
### July
* catchup with Stockholm and tasks
* ipsec puppet module completion (should we publish it?)
* fixed civicrm tunneling issues, hopefully ([#30912][])
* published blog post with updates from the previous email: https://anarc.at/blog/2019-07-30-pgp-flooding-attacks/
* struggled with administrative/accounting stuff
* contacted greenhost about DNS: they have anycast DNS with an API, but not GeoDNS, what should we do?
* RT access granting and audit ([#31249][], [#31248][]), various LDAP access tickets and cleaned up gettor group
* [backup documentation][] ([#30880][])
* tested bacula and postgresql restore procedures, specifically, you might want to get familiar with those before a catastrophe
* cleaned up services inventory ([#31261][]) all in https://trac.torproject.org/projects/tor/wiki/org/operations/services now
* worked on getting ganeti into puppet with weasel

[#31261]: https://bugs.torproject.org/31261
[#30880]: https://bugs.torproject.org/30880
[backup documentation]: https://help.torproject.org/tsa/howto/backup/
[#31248]: https://bugs.torproject.org/31248
[#31249]: https://bugs.torproject.org/31249
[#30912]: https://bugs.torproject.org/30912
### August
* on vacation the last week, it was awesome
* published a summary of the KNOB attack against Bluetooth (TL;DR: don't trust your BT keyboards) https://anarc.at/blog/2019-08-19-is-my-bluetooth-device-insecure/
* ganeti merge almost completed
* first part of the hiera transition completed, yaaaaay!
* tested a puppet validation hook ([#31226][]) you should install it locally, but our codebase is maybe not ready to run this server-side
* retired labs.tpo ([#24956][])
* retired nova.tpo ([#29888][]) and updated the host retirement docs, especially the hairy procedure where we don't have remote console to wipe disks

[#29888]: https://bugs.torproject.org/29888
[#24956]: https://bugs.torproject.org/24956
[#31226]: https://bugs.torproject.org/31226
## hiro
* Collecting all my snippets here https://dip.torproject.org/users/hiro/snippets
* catchup with Stockholm discussions and future tasks
* fixed some prometheus puppet-fu
* some website dev and maintenance
* some blog fixes and updates
* gitlab updates and migration planning
* gettor service admin via ansible
## weasel, for september, actually
* Finished doing ganeti stuff. We have at least one VM now, see next point
* We have a loghost now, it's called loghost01. There is a /var/log/hosts that has logs per host, and some /var/log/*all* files that contain log lines from all the hosts. We don't do backups of this host's /var/log because it's big and all the data should be elsewhere anyway.
* started doing new onionoo infra, see [#31659][].
* debian point releases
[#31659]: https://bugs.torproject.org/31659
# What we're up to next
## anarcat
* figure out the next steps in hiera refactoring ([#30020][])
* ops report card, see below ([#30881][])
* LDAP sudo transition plan ([#6367][])
* followup with snowflake + TPA? ([#31232][])
* send root@ emails to RT, and start using it more for more things? ([#31242][])
* followup with email services improvements ([#30608][])
* continue prometheus module merges
* followup on SVN decommissioning ([#17202][])

[#17202]: https://bugs.torproject.org/17202
[#30608]: https://bugs.torproject.org/30608
[#31242]: https://bugs.torproject.org/31242
[#31232]: https://bugs.torproject.org/31232
[#6367]: https://bugs.torproject.org/6367
[#30881]: https://bugs.torproject.org/30881
[#30020]: https://bugs.torproject.org/30020
## hiro
* on vacation first two weeks of August
* followup and planning for search.tp.o
* websites and gettor tasks
* more prometheus and puppet
* review services documentation
* monitor anti-censorship services
* followup with gettor tasks
* followup with greenhost
## weasel
* want to restructure how we do web content distribution:
  * Right now, we rsync the static content to ~5-7 nodes that directly offer http to users and/or serve as backends for fastly.
  * The big number of rsync targets makes updating somewhat slow at times (since we want to switch to the new version atomically).
  * I'd like to change that to ship all static content to 2, maybe 3, hosts.
  * These machines would not be accessed directly by users but would serve as backends for a) fastly, and b) our own varnish/haproxy frontends.
* split onionoo backends (that run the java stuff) from frontends (that run haproxy/varnish). The backends might also want to run a varnish. Also, retire the stunnel and start doing ipsec between frontends and backends. (that's already started, cf. [#31659][])
* start moving VMs to gnt-fsn
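
As an illustration of the "switch to the new version atomically" requirement above, here is a minimal sketch of one common way to do it: each release lives in its own directory and the web server follows a symlink that is swapped with a single rename. The directory layout, the `current` link name and the `switch_current` helper are assumptions made up for this example, not a description of how the static mirror system works today.

```python
import os
import tempfile


def switch_current(root: str, new_version: str, link_name: str = "current") -> None:
    """Atomically repoint root/link_name at the directory root/new_version."""
    link_path = os.path.join(root, link_name)
    tmp_link = link_path + ".tmp"
    # Clean up a leftover temporary link from an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Relative target, so the whole tree can be rsynced elsewhere unchanged.
    os.symlink(new_version, tmp_link)
    # On POSIX this is a single rename(2): readers see the old or the new
    # link, never a missing one.
    os.replace(tmp_link, link_path)


def demo() -> str:
    """Create two fake releases, switch twice, and read through the link."""
    with tempfile.TemporaryDirectory() as root:
        for version in ("v1", "v2"):
            os.mkdir(os.path.join(root, version))
            with open(os.path.join(root, version, "index.html"), "w") as fh:
                fh.write(version)
        switch_current(root, "v1")
        switch_current(root, "v2")
        with open(os.path.join(root, "current", "index.html")) as fh:
            return fh.read()
```

With fewer rsync targets, the slow part (copying the new release next to the old one) happens on 2 or 3 hosts only, and the switch itself stays a cheap rename on each of them.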
## ln5
* help deciding things about a tor nextcloud instance
* help getting such a tor nextcloud instance up and running
* help migrating data from the nc instance at riseup into a tor instance
* help migrating data from storm into a tor instance
# Answering the 'ops report card'
See https://trac.torproject.org/projects/tor/ticket/30881
anarcat introduced the project and gave a heads-up that this might mean more ticket and organizational changes. For example, we don't define "what's an emergency" and "what's supported" clearly enough. anarcat will use this process as a prioritization tool as well.
# Email next steps
Brought up "the plan" to Vegas: https://trac.torproject.org/projects/tor/wiki/org/meetings/2019Stockholm/Notes/EmailNotEmail
Response was: why don't we just give everyone LDAP accounts? Everyone has PGP...
We're still uncomfortable with deploying the new email service but that was agreed upon in Stockholm. We don't see a problem with granting more people LDAP access, provided vegas or others can provide support and onboarding.
# Do we want to run Nextcloud?
See also the discussion in https://trac.torproject.org/projects/tor/ticket/31540
The alternatives:
A. Hosted on Tor Project infrastructure, operated by Tor Project.
B. Hosted on Tor Project infrastructure, operated by Riseup.
C. Hosted on Riseup infrastructure, operated by Riseup.
We're good with B or C for now. We can't give them root so B would need to be running as UID != 0, but they prefer to handle the machine themselves, so we'll go with C for now.
# Other discussions
weasel played with prom/grafana to diagnose onionoo stuff, and found interesting things. He wonders if we can hook up varnish as well; anarcat will investigate.
We don't want to keep storm running if we switch to nextcloud, so we need to make a plan.
# Next meeting
October 7th, 14:00 UTC
# Metrics of the month
I figured I would bring back this tradition that Linus had going before I started doing the reports, but that I omitted because of lack of time and familiarity with the infrastructure. Now I'm a little more comfortable so I made a script in the wiki which polls numbers from various sources and makes a nice overview of what our infra looks like. Access and transfer rates are over the last 30 days.
* hosts in Puppet: 76, LDAP: 79, Prometheus exporters: 121
* number of apache servers monitored: 32, hits per second: 168
* number of self-hosted nameservers: 5, mail servers: 10
* pending upgrades: 0, reboots: 0
* average load: 0.56, memory available: 357.18 GiB/934.53 GiB, running processes: 441
* bytes sent: 126.79 MB/s, received: 96.13 MB/s
Those metrics should be taken with a grain of salt: many of those might not mean what you think they do, and some others might be gross mischaracterizations as well. I hope to improve those reports as time goes on.
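
For the curious, here is a rough sketch of how one such number could be polled, assuming Prometheus is one of the sources. This is not the wiki script itself: the helper names, the server URL and the example expression are assumptions for illustration. Only the `/api/v1/query` endpoint and its JSON response layout come from the Prometheus HTTP API.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, expr: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def first_value(payload: dict) -> float:
    """Extract the first sample from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    result = payload["data"]["result"]
    if not result:
        raise ValueError("query returned no samples")
    # Each sample is a [timestamp, "value"] pair; the value is a string.
    return float(result[0]["value"][1])


def poll(base_url: str, expr: str) -> float:
    """Run one instant query against a Prometheus server (needs network)."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=10) as resp:
        return first_value(json.load(resp))
```

Something like `poll("http://localhost:9090", "count(up)")` would then count scrape targets on a hypothetical local server. Whether that matches the "Prometheus exporters" figure above depends on how the real script counts them.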
Feedback is, of course, very welcome.