[FIRING:1] Snowflake probetest failures

Total firing alerts: 1
Total resolved alerts: 0

## Firing Alerts

-----
Time: 2022-04-05 17:06:00.286335746 +0000 UTC
Summary: There is a problem testing the type of NAT of our proxies
Description: There are only 1.8 times proxies with known NAT type than with unknown
-----

## Resolved Alerts
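The alert description compares two proxy counts: proxies with a known NAT type against proxies whose NAT type is unknown. Below is a minimal Go sketch of that comparison. It assumes the known types are "restricted" and "unrestricted". The 2.0 threshold is a hypothetical value: the thread only shows that the alert fired at a ratio of 1.8, not what the configured limit is. The counts in `main` are made up solely to reproduce that 1.8 ratio.

```go
package main

import (
	"fmt"
	"math"
)

// minKnownRatio is an ASSUMED threshold; the real alert rule is not shown here.
const minKnownRatio = 2.0

// natRatio returns how many times more proxies have a known NAT type
// (restricted or unrestricted, assumed) than an unknown one.
func natRatio(restricted, unrestricted, unknown int) float64 {
	if unknown == 0 {
		return math.Inf(1) // no unknown proxies: ratio is unbounded, never alert
	}
	return float64(restricted+unrestricted) / float64(unknown)
}

// shouldFire mirrors the alert condition: too few known-NAT proxies
// relative to unknown ones.
func shouldFire(ratio float64) bool {
	return ratio < minKnownRatio
}

func main() {
	// Hypothetical counts chosen only to reproduce the 1.8 ratio in the alert.
	r := natRatio(1200, 600, 1000)
	fmt.Printf("known/unknown ratio: %.1f, firing: %v\n", r, shouldFire(r))
	// prints "known/unknown ratio: 1.8, firing: true"
}
```

With this framing, an "unknown" count that climbs while the other types fall drives the ratio down quickly, which matches what meskio describes in the reply below.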

It looks like the 'unknown' NAT proxies are going up pretty fast while the other NAT types go down. I got into the server, but I don't see it being overloaded or any other obvious problem. I restarted probetest; let's see if it gets back to normal.

Quoting alertmanager@hetzner-nbg1-02.torproject.org (2022-04-05 19:06:44)
> Total firing alerts: 1
> Total resolved alerts: 0
>
> ## Firing Alerts
>
> -----
> Time: 2022-04-05 17:06:00.286335746 +0000 UTC
> Summary: There is a problem testing the type of NAT of our proxies
> Description: There are only 1.8 times proxies with known NAT type than with unknown
> -----
>
> ## Resolved Alerts
-- meskio | https://meskio.net/ -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- My contact info: https://meskio.net/crypto.txt -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Nos vamos a Croatan.

Quoting meskio (2022-04-05 20:11:37)
> It looks like the 'unknown' NAT proxies are going up pretty fast while the other NAT types go down. I got into the server, but I don't see it being overloaded or any other obvious problem. I restarted probetest; let's see if it gets back to normal.
The restart seems to have solved the problem :)

On Wed, Apr 06, 2022 at 08:23:41AM +0000, meskio wrote:
> The restart seems to have solved the problem :)
Ok great, but that means the "nat prober can't keep up" issue isn't a scalability bottleneck; it's a bug in the nat prober, where it sometimes decides to stop probing for reasons we don't understand? If so, then better load balancing (https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/75) won't necessarily solve the issue.

--Roger
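Roger's distinction between a scalability bottleneck and a prober that stops working can be made concrete. Below is a minimal Go sketch that classifies one observation window. It assumes two inputs per window: how many NAT probes completed, and how much the backlog of proxies awaiting a probe changed. These are hypothetical metrics, not something probetest is known to export. The helper `diagnose` and the sample numbers are illustrative only.

```go
package main

import "fmt"

// diagnose is a hypothetical helper, not part of probetest. It classifies one
// observation window using two assumed inputs:
//   completed:    NAT probes that finished during the window
//   backlogDelta: change in the number of proxies still waiting for a probe
func diagnose(completed, backlogDelta int) string {
	switch {
	case completed == 0 && backlogDelta > 0:
		// Work is piling up and nothing finishes: the prober has stopped.
		// Extra capacity would not help here; a restart would.
		return "stalled"
	case backlogDelta > 0:
		// Probes still finish, just not fast enough: a capacity problem
		// that better load balancing could address.
		return "overloaded"
	default:
		// Backlog stable or shrinking is treated as healthy.
		return "ok"
	}
}

func main() {
	fmt.Println(diagnose(0, 40))   // stalled
	fmt.Println(diagnose(120, 40)) // overloaded
	fmt.Println(diagnose(120, -5)) // ok
}
```

Under these assumptions, a restart that clears the problem points to the "stalled" case, which is why load balancing alone would not be expected to fix it.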

Quoting Roger Dingledine (2022-04-06 10:33:39)
> Ok great, but that means the "nat prober can't keep up" issue isn't a scalability bottleneck; it's a bug in the nat prober, where it sometimes decides to stop probing for reasons we don't understand?
> If so, then better load balancing (https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/75) won't necessarily solve the issue.
Exactly, I'm not sure load balancing will solve the problem. We do have an issue for this specific problem, but we haven't found what is causing it: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
participants (3)
- alertmanager@hetzner-nbg1-02.torproject.org
- meskio
- Roger Dingledine