minutes from the sysadmin meeting

Hello! Here are the minutes from this month's sysadmin meeting. # Roll call: who's there and emergencies anarcat, hiro, qbi present, ln5 and weasel couldn't make it but still sent updates. # What has everyone been up to ## anarcat * blog service damage control (#32090) * new caching service (#32239) * try to kick cymru back into life (#29397) * jabber service shutdown (#31700) * prometheus/ipsec reliability issues (#31916) * bumped prometheus retention to 5m/365d, bumped back to 1m/365d after i realized it broke the graphs (#31244) * LDAP sudo transition (#6367) * finished director replacement (#31786) * archived public SVN (#15948) * shutdown SVN internal (#15949) * fix "ping on new VMs" bug on ganeti hosts (#31781) * review Fastly contracts and contacts * became a blog maintainer (#23007) * clarified hardware donation policy in FAQ (#32044) * tracking major upgrades progress (fancy graphs!), visible at https://help.torproject.org/tsa/howto/upgrades/ - current est: april 2020 * joined a call with giant rabbit about finances, security and cost, hiro also talked with them about upgrading their CiviCRM, some downtimes to be announced soon-ish * massive (~20%) trac ticket cleanup in the "trac" component * worked sysadmin onboarding process docs (ticket #29395) * drafted a template for service documentation in https://help.torproject.org/tsa/howto/template/ * daily grind: email aliases, pgp key updates, full disks, security upgrades, reboots, performance problems ## hiro * website maintenance and eoy campaign * retire getulum * make a new machine for gettor * crm stuff with giant rabbit * some security updates and service documentation. Testing out ansible for scripts. Happy with the current setup used for gettor with everything else in puppet. * some gettor updates and maintenance * started creating the dev website * survey update * nagios gettor status check * dip updates and maintenance ## weasel * moving onionoo forward to new VMs (#31659 and linked) * moved more things off metal we want to get rid of * includes preparing a new IRC host (#32281); the old one is not yet gone ## qbi * created tor-moderators@ * updated some machines (apt uprade) ## linus * followed up with nextcloud launch # What we're up to next ## anarcat New: * caching server launch and followup, missing stats (#32239) Continued/stalled: * followup on SVN shutdown, only corp missing (#17202) * upstreaming ganeti installer fix and audit of the others (#31781) * followup with email services improvements (#30608) * followup on SVN decomissionning (#17202) * send root@ emails to RT (#31242) * continue prometheus module merges ## hiro * Lektor package upgrade * More website maintenance * nagios bridgedb status check * investigating occasional websites build failures * move translations / majus out of moly * finish prometheus tasks w/ anticensorship-team * why is gitlab giving an error when creating a MR from a forked repository? ## ln5 * nextcloud migration ## qbi * Upgrade some hosts (<5) to buster # Other discussions No planned discussion. # Next meeting qbi can't on dec 2nd and we missed two people this time, so it make sense to do it a week earlier... november 25th 1500UTC, which is 1600CET and 1000EST # Metrics of the month Access and transfer rates are an average over the last 30 days. * hosts in Puppet: 75, LDAP: 79, Prometheus exporters: 120 * number of apache servers monitored: 32, hits per second: 203 * number of self-hosted nameservers: 5, mail servers: 10 * pending upgrades: 5, reboots: 0 * average load: 0.94, memory available: 303.76 GiB/946.18 GiB, running processes: 387 * bytes sent: 200.05 MB/s, received: 132.90 MB/s Now also available as the main Grafana dashboard. Head to <https://grafana.torproject.org/>, change the time period to 30 days, and wait a while for results to render. -- Antoine Beaupré torproject.org system administration
participants (1)
-
Antoine Beaupré