[tor-project] PSA: GitLab email notifications outage

Antoine Beaupré anarcat at torproject.org
Thu Mar 23 21:04:07 UTC 2023


Hi,

Today, starting from at least 07:04:22 UTC, mail notifications sent
from GitLab have been either delayed or dropped (unclear, probably the
latter).

This means that if you rely on GitLab notifications to order your work,
you will very likely need to log in to GitLab and look at your issues. A
good way to catch up is to look at the latest notifications in your
"To Do" list at:

https://gitlab.torproject.org/dashboard/todos

I am aware that many of you have a humongous and completely useless
to-do list of death. I am sorry.

As of 20:50 UTC, email delivery has resumed and should be back to
normal until further notice.

### Technical details

Busy people or people less interested in technical details can skip the
remainder of this email.

It's unclear what happened. We're tracking the issue in this ticket:

https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/139

It looks like the regression was caused by the GitLab 15.9.3 to 15.10.0
upgrade because that upgrade completed at 06:35:53, half an hour before
the first email got lost.
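
For the record, the timing arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes the upgrade timestamp is also in UTC and on the same day (2023-03-23), which the text above does not state explicitly, and the variable names are mine.

```python
from datetime import datetime, timezone

# Assumption: all three timestamps are UTC on 2023-03-23. Only the first
# lost email and the recovery time are explicitly given as UTC.
upgrade_done = datetime(2023, 3, 23, 6, 35, 53, tzinfo=timezone.utc)
first_lost_mail = datetime(2023, 3, 23, 7, 4, 22, tzinfo=timezone.utc)
delivery_resumed = datetime(2023, 3, 23, 20, 50, 0, tzinfo=timezone.utc)

# Time between the end of the upgrade and the first known lost email.
gap = first_lost_mail - upgrade_done
# Length of the window during which notifications were affected.
outage = delivery_resumed - first_lost_mail

print(gap)     # 0:28:29
print(outage)  # 13:45:38
```

Under those assumptions, the first loss comes a bit under half an hour after the upgrade, and the affected window is a little under fourteen hours.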

It's also unclear if GitLab queued up those emails and is sending them
now, but I suspect it just dropped them. I couldn't find the right log
file in the thousands (literally) of log files GitLab keeps, so it's
really hard to tell.

*Why* this regression happened is simply beyond me. Together, kez and I
poured the best of both of our brains into figuring out why, suddenly, GitLab
decided to not only use STARTTLS to connect to the local SMTP server
(which it was specifically told not to do) but *also* validate the
certificate (which it was *also* told not to do). We currently use a
bespoke CA for local SMTP servers and, naturally, that certificate
doesn't verify. And obviously setting the correct CA in GitLab's
settings doesn't work either, because why would anything work at this
point.

(Besides, it's unclear how anyone should issue a valid certificate for
`localhost` in the first place... ANYWAY.)
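
To see what a strict client sees when it talks to the local SMTP server, something like the following Python sketch can help. It is not GitLab's mail code path, just a hypothetical diagnostic helper (`probe_smtp` is a name I made up): it checks whether the server advertises STARTTLS and, if so, whether the certificate verifies for the hostname used to connect. The host, port, and CA file path are assumptions to adapt.

```python
import smtplib
import ssl


def probe_smtp(host="localhost", port=25, cafile=None, timeout=10):
    """Connect like a strict client: try STARTTLS and verify the certificate.

    Hypothetical diagnostic helper, not GitLab's actual mail code path.
    """
    result = {"starttls_offered": False, "cert_verifies": None, "error": None}
    # local_hostname is set explicitly to avoid a DNS lookup for our own name.
    smtp = smtplib.SMTP(host, port, local_hostname="localhost", timeout=timeout)
    try:
        smtp.ehlo()
        result["starttls_offered"] = smtp.has_extn("starttls")
        if not result["starttls_offered"]:
            return result
        # With cafile=None the system trust store is used; pass the path to
        # a private CA bundle to trust only that CA instead.
        context = ssl.create_default_context(cafile=cafile)
        try:
            smtp.starttls(context=context)
            result["cert_verifies"] = True
        except ssl.SSLCertVerificationError as exc:
            # Raised for an untrusted issuer and for a hostname mismatch.
            result["cert_verifies"] = False
            result["error"] = str(exc)
        except (ssl.SSLError, smtplib.SMTPException) as exc:
            # TLS failed for some other reason; verification is inconclusive.
            result["error"] = str(exc)
    finally:
        smtp.close()
    return result


# Example call (the CA path is a placeholder, not a real location):
# print(probe_smtp("localhost", 25, cafile="/path/to/local-ca.pem"))
```

If `cert_verifies` comes back `False` even with the right CA file, the `error` string usually says whether the problem is the issuer or the hostname, which is at least a starting point.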

I filed this issue upstream:

https://gitlab.com/gitlab-org/gitlab/-/issues/399241

I am unsure how this issue is going to go, or how long this fix is going
to last; it's all quite obscure.

(This is why, by the way, we rarely try to patch GitLab. The code base
is byzantine at best, they ship their own Rails, Ruby, PostgreSQL,
Prometheus, Grafana (which is itself a special clusterfuck of deps),
Chef (!), and I won't bore you with the rest of the list: it's a total
mess, and it takes hours just to get your bearings to get anything done
at all. In this specific case, we completely gave up on patching what
should be a simple Rails app.)

So anyway. Fixed I guess?

A.

-- 
Antoine Beaupré
torproject.org system administration