[tor-bugs] #32801 [Internal Services/Tor Sysadmin Team]: major outage: kvm4 down, affected: eugeni (mail, lists), alberti (ldap), pauli (puppet), rouyi (jenkins), etc

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Dec 19 01:16:04 UTC 2019


#32801: major outage: kvm4 down, affected: eugeni (mail, lists), alberti (ldap),
pauli (puppet), rouyi (jenkins), etc
-------------------------------------------------+------------------------
 Reporter:  anarcat                              |          Owner:  hiro
     Type:  defect                               |         Status:  closed
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:  fixed
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+------------------------
Changes (by anarcat):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 > According to what was reported from Hetzner the server didn't show any
 screen output and didn't respond to keystrokes. It had to be restarted
 manually.

 thanks for the update! that's really not reassuring at all though...

 >  We have a few old kvm machines that have issues occasionally and when
 they do we consider the services that do go offline. I think we might
 start considering that we have to migrate these machines sooner rather
 than later.

 I agree strongly with that sentiment. I opened a ticket about
 decommissioning kvm4 altogether, and i think we should systematically
 look at all those older machines and migrate critical stuff off them as
 soon as possible. maybe we could keep some of those boxes as build farms
 until they die: they are cheap, and generally just work.

 but machines like eugeni belong in a high availability cluster. i don't
 want to leave for a vacation with another crisis like this ever again. :p

 so I opened ticket #32802 to follow up on kvm4's state.

 in the meantime, i force-scheduled backups of alberti, eugeni and pauli
 when the machine came back earlier, just to get a fresh copy in.

 considering that the machine now seems stable, i think this ticket can be
 closed. let's follow up on the longer-term project in that other ticket.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32801#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

