[tor-bugs] #31916 [Internal Services/Tor Sysadmin Team]: reliability issues with hetzner-nbg1-01

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Oct 2 19:01:10 UTC 2019


#31916: reliability issues with hetzner-nbg1-01
-------------------------------------------------+-------------------------
 Reporter:  anarcat                              |          Owner:  anarcat
     Type:  defect                               |         Status:  assigned
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Blocker                              |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by anarcat):

 hetzner responded by asking for error messages, so I sent them more logs
 from nagios:

 {{{
  we see various errors on the nagios monitoring server
  (hetzner-hel1-01.torproject.org) when it checks that host. here's an
  example from yesterday, of pings failing for about 15 minutes:

  [2019-10-01 16:35:44] SERVICE ALERT: hetzner-nbg1-01;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
  [2019-10-01 16:36:04] HOST ALERT: hetzner-nbg1-01;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
  [2019-10-01 16:36:34] SERVICE ALERT: hetzner-nbg1-01;process - apache2 - master;CRITICAL;HARD;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 50 seconds.
  [2019-10-01 16:36:44] SERVICE ALERT: hetzner-nbg1-01;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%
  [2019-10-01 16:37:24] HOST ALERT: hetzner-nbg1-01;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
  [2019-10-01 16:38:44] HOST ALERT: hetzner-nbg1-01;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
  [2019-10-01 16:40:04] HOST ALERT: hetzner-nbg1-01;DOWN;SOFT;4;PING CRITICAL - Packet loss = 100%
  [2019-10-01 16:40:14] HOST ALERT: hetzner-nbg1-01;UP;SOFT;5;PING OK - Packet loss = 0%, RTA = 26.97 ms
  [2019-10-01 16:41:34] SERVICE ALERT: hetzner-nbg1-01;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 23.79 ms
  [2019-10-01 16:50:44] SERVICE ALERT: hetzner-nbg1-01;process - apache2 - master;OK;HARD;1;PROCS OK: 1 process with UID = 0 (root), args '/usr/sbin/apache2'

  I could run a cross-ping between the two servers in a screen session to
  try and diagnose this better for you, but from what i can tell, the
  packets just get dropped on the floor somewhere.
 }}}
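
 for reference, here is a rough sketch of how alert lines like the ones
 quoted above could be turned into a downtime figure. it is not part of
 the nagios setup: the function names parse_alert and host_downtime are
 made up for illustration, and the field layout is only inferred from the
 log excerpt (host;state;type;attempt;message for HOST ALERT, with an
 extra service field for SERVICE ALERT).

```python
from datetime import datetime

# Three lines taken from the excerpt above (long lines rejoined).
SAMPLE = [
    "[2019-10-01 16:36:04] HOST ALERT: hetzner-nbg1-01;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%",
    "[2019-10-01 16:36:34] SERVICE ALERT: hetzner-nbg1-01;process - apache2 - master;CRITICAL;HARD;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 50 seconds.",
    "[2019-10-01 16:40:14] HOST ALERT: hetzner-nbg1-01;UP;SOFT;5;PING OK - Packet loss = 0%, RTA = 26.97 ms",
]


def parse_alert(line):
    """Parse one alert line into a dict, or return None if it does not match.

    Hypothetical helper: the field order is an assumption based on the excerpt.
    """
    line = line.strip()
    if not line.startswith("["):
        return None
    stamp, sep, rest = line[1:].partition("] ")
    kind, sep2, payload = rest.partition(" ALERT: ")
    if not sep or not sep2 or kind not in ("HOST", "SERVICE"):
        return None
    if kind == "HOST":
        fields = payload.split(";", 4)
        if len(fields) != 5:
            return None
        host, state, state_type, attempt, message = fields
        service = None
    else:
        fields = payload.split(";", 5)
        if len(fields) != 6:
            return None
        host, service, state, state_type, attempt, message = fields
    return {
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
        "kind": kind,
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "message": message,
    }


def host_downtime(lines, host):
    """Return the time from the first HOST DOWN alert to the next HOST UP alert.

    Returns None if no complete DOWN -> UP window is found for that host.
    """
    down_since = None
    for line in lines:
        alert = parse_alert(line)
        if alert is None or alert["kind"] != "HOST" or alert["host"] != host:
            continue
        if alert["state"] == "DOWN" and down_since is None:
            down_since = alert["time"]
        elif alert["state"] == "UP" and down_since is not None:
            return alert["time"] - down_since
    return None
```

 calling host_downtime(SAMPLE, "hetzner-nbg1-01") gives the length of the
 host-level DOWN window only. in the excerpt, the apache2 NRPE check does
 not report OK again until 16:50:44, so service-level recovery takes
 longer than the host-level one.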

 i've started a cross-ping between the nagios and prometheus servers to see
 if this can confirm the packet loss issue.
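
 the cross-ping itself is nothing fancy. a minimal sketch of the idea,
 assuming a Linux host with the iputils ping (the -c and -W flags, and the
 "N% packet loss" summary line), could look like the following. the names
 parse_loss, probe and log_line are hypothetical, and this is not the
 exact command running in the screen session.

```python
import re
import subprocess
from datetime import datetime, timezone

# Matches the summary line printed by iputils ping, e.g. "100% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(output):
    """Return the packet loss percentage found in ping output, or None."""
    match = LOSS_RE.search(output)
    return float(match.group(1)) if match else None


def probe(host, count=1, timeout=1):
    """Run ping once against host and return the loss percentage, or None.

    Assumes iputils ping: -c is the packet count, -W the reply timeout in
    seconds.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout), host],
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)


def log_line(host, loss, now=None):
    """Format one timestamped (UTC) result line for later correlation."""
    now = now or datetime.now(timezone.utc)
    status = "unknown" if loss is None else f"{loss:g}% loss"
    return f"[{now:%Y-%m-%d %H:%M:%S}] {host}: {status}"
```

 run in a loop on both ends (for example print(log_line(host, probe(host)))
 followed by a short sleep), this gives timestamps that can be lined up
 against the nagios alerts to see whether loss happens in one direction or
 both.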

 this could correlate with ipsec problems as well.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/31916#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

