[tor-bugs] #32762 [Internal Services/Tor Sysadmin Team]: major networking issues on moly, affects: majus, fallax, web-cymru-01, build-x86-05, build-x86-06 (was: majus cannot connect to internet with git or the transifex client (although it pings ok))

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Dec 16 17:23:45 UTC 2019


#32762: major networking issues on moly, affects: majus, fallax, web-cymru-01,
build-x86-05, build-x86-06
-------------------------------------------------+-------------------------
 Reporter:  emmapeel                             |          Owner:  anarcat
     Type:  defect                               |         Status:
                                                 |  accepted
 Priority:  Very High                            |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Major                                |     Resolution:
 Keywords:  translation, l10n, majus, moly       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------
Changes (by anarcat):

 * priority:  High => Very High
 * severity:  Normal => Major


Comment:

 this is a larger issue than just moly. I've filed the following ticket
 with cymru (upstream):

 > Hi!
 >
 > Since around 2019-12-15 23:55:01 UTC, we have started seeing some weird
 > networking issues with moly.torproject.org. I can reach the machine okay
 > and it pings properly, but some TCP connections do not work
 > correctly. For example, this works:
 >
 > root@moly:~# curl -I https://www.google.com/
 > HTTP/2 200
 > date: Mon, 16 Dec 2019 16:43:16 GMT
 > [...]
 >
 > But this hangs:
 >
 > root@moly:~# curl -v -I https://github.com/
 > *   Trying 140.82.113.3...
 > * TCP_NODELAY set
 > * Connected to github.com (140.82.113.3) port 443 (#0)
 > * ALPN, offering h2
 > * ALPN, offering http/1.1
 > * Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
 > * successfully set certificate verify locations:
 > *   CAfile: /etc/ssl/certs/ca-certificates.crt
 >   CApath: /etc/ssl/certs
 > * TLSv1.2 (OUT), TLS header, Certificate Status (22):
 > * TLSv1.2 (OUT), TLS handshake, Client hello (1):
 >
 > We similarly have trouble running rsync to our other servers, running
 > security upgrades with "apt update", or cloning git repositories from
 > github.
 >
 > I'm puzzled by this - I can't quite figure out this discrepancy.
 >
 > This was first reported as a bug against our translation server:
 >
 > https://trac.torproject.org/projects/tor/ticket/32762
 >
 > .. but it also affects a DNS server (fallax), a build box and a web
 > mirror. It would be great if you could look into this promptly because
 > it's a bit of a show stopper for us.
 >
 > Thanks!
 >
 > a.
 >
 > --
 > Antoine Beaupré
 > torproject.org system administration
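
 The discrepancy in the curl output above (TCP connects, then the TLS
 handshake stalls after the Client Hello) can be checked from a script.
 This is only a minimal sketch, assuming Python 3 is available on the
 affected host; the `probe` helper and its result keys are hypothetical
 names, not part of any existing tooling. It reports how far a connection
 gets, it does not identify the cause.

```python
import socket
import ssl


def probe(host, port=443, timeout=5.0):
    """Report how far a TLS connection to host:port gets before failing.

    Hypothetical helper: separates the TCP connect step from the TLS
    handshake step, each bounded by the same timeout (in seconds).
    """
    result = {"tcp": False, "tls": False, "version": None, "error": None}

    # Step 1: plain TCP connect (DNS lookup plus three-way handshake).
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        result["error"] = f"tcp: {exc!r}"
        return result
    result["tcp"] = True

    # Step 2: TLS handshake over the established connection. A stall
    # here surfaces as a timeout instead of hanging forever.
    context = ssl.create_default_context()
    try:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            result["tls"] = True
            result["version"] = tls.version()
    except OSError as exc:  # includes ssl.SSLError and timeouts
        result["error"] = f"tls: {exc!r}"
        sock.close()
    return result
```

 Running `probe("www.google.com")` and `probe("github.com")` on moly would
 then show, per host, whether the failure is at the TCP step or at the TLS
 step.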

 I'll try to do more network diagnostics after lunch, in the hope that this
 can be resolved in the shorter term. In the meantime, we have started a
 mitigation strategy that involves restoring majus from backups.

 {{{
 12:03:19 <+anarcat> this is the last backup that ran:
 12:03:22 <+anarcat> +---------+-------+----------+---------------+---------------------+-------------------------------------------------------+
 12:03:22 <+anarcat> | jobid   | level | jobfiles | jobbytes      | starttime           | volumename                                            |
 12:03:22 <+anarcat> +---------+-------+----------+---------------+---------------------+-------------------------------------------------------+
 12:03:26 <+anarcat> | 118,333 | I     |    1,510 |    42,379,972 | 2019-12-15 10:27:15 | torproject-majus.torproject.org-inc.2019-12-15_10:27  |
 12:03:26 <+anarcat> +---------+-------+----------+---------------+---------------------+-------------------------------------------------------+
 }}}

 We've been meaning to move majus to the ganeti cluster (#31784 as part of
 #29974), exactly for this kind of scenario. Thankfully, we migrated the
 director and getulum already, so this problem is not as bad as it would
 have been.

 But it will still take some time (days?) to restore the service, if Cymru
 doesn't figure it out in time.

 Sorry about this trouble everyone! Hopefully we'll be able to get back on
 track soon... I'm just happy it's happening this week, instead of during
 the holidays. :p

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32762#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

