[tor-bugs] #34185 [Internal Services/Tor Sysadmin Team]: ganeti clusters don't like automatic upgrades
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon May 11 20:40:59 UTC 2020
#34185: ganeti clusters don't like automatic upgrades
-------------------------------------------------+-------------------------
Reporter: anarcat | Owner: hiro
Type: defect | Status: assigned
Priority: High | Milestone:
Component: Internal Services/Tor Sysadmin Team | Version:
Severity: Major | Resolution:
Keywords: tpa-roadmap-may | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Comment (by anarcat):
This is the mail I sent on Sunday:
> There was a ~8h ganeti outage until about now. It seems the buster point
> release broke things in our automated upgrade procedure. I didn't have
> time to diagnose the issue (I was running out) and figured it was more
> urgent to restore the service.
>
> I rebooted all gnt-fsn nodes by hand (without migrating). Some instances
> returned with a state of "ERROR_down", so I manually started them (with
> gnt-instance start). Everything now seems to be back up.
>
> I haven't looked at Nagios in detail, but everything is mostly
> "yellow" now so I'll assume we're good.
>
> It would be great if someone could look at the logs and see what
> happened. I suspect the openvswitch fix didn't work, or maybe there are
> other servers we need to block from needrestart's automation (or maybe
> even unattended-upgrades).
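
For reference, the manual recovery step described in the mail (finding instances left in "ERROR_down" and starting them with gnt-instance start) could be sketched roughly as below. This is only an illustration, not the procedure that was actually run: the helper names are made up, the hostnames in the usage note are placeholders, and it assumes the output of `gnt-instance list --no-headers --separator=: -o name,status` is one `name:status` pair per line. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def parse_down_instances(listing: str, sep: str = ":") -> List[str]:
    """Return the names of instances whose status column is ERROR_down.

    `listing` is assumed to hold one "name<sep>status" pair per line.
    """
    down = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, status = line.partition(sep)
        if status.strip() == "ERROR_down":
            down.append(name.strip())
    return down


def start_down_instances(dry_run: bool = True) -> List[str]:
    """List instances on the cluster and start those in ERROR_down.

    Hypothetical helper: must run on the Ganeti master node. With
    dry_run=True it only prints the commands it would run.
    """
    listing = subprocess.run(
        ["gnt-instance", "list", "--no-headers", "--separator=:",
         "-o", "name,status"],
        check=True, capture_output=True, text=True,
    ).stdout
    names = parse_down_instances(listing)
    for name in names:
        cmd = ["gnt-instance", "start", name]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return names
```

Calling `parse_down_instances("a.example.org:running\nb.example.org:ERROR_down\n")` returns only `b.example.org`. Instances in other states (for example ADMIN_down, which an operator stopped on purpose) are left alone.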
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/34185#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online