Sorry to hear it wasn't much help. Even though the additions I suggested didn't help, they certainly couldn't have caused any harm and can't be responsible for the drops in traffic.
As for the torutils scripts, I'm sure toralf would be better able to investigate that, but I have a feeling your particular setup might not have worked with the script. May I ask what your setup is? Are you running your relays on separate VMs on the main system, or are you using a different setup, such as having all IP addresses on the same OS and using OutboundBindAddress, routing, etc. to separate them? If I know more, I might be able to make a script specific to your setup.
On 12/3/2022 2:07 PM, Christopher Sheats wrote:
Hello,
Thank you for this information. After 24 hours of testing, these configurations brought Tor to a halt.
I started with the sysctl modifications. After a few hours with just those, inet_csk_bind_conflict utilization was still at ~75%, with no improvement. I then installed Torutils for both IPv4 and IPv6. After only a couple of hours, Tor dropped below 15 Mbps across both servers (40 relays). 16 hours later, it had dropped below 2 Mbps.
I've removed all of these new settings and restarted.
--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/
On Dec 2, 2022, at 7:30 AM, Chris tor@wcbsecurity.com wrote:
Hi,
As I'm sure you've already gathered, your system is maxing out trying to deal with all the connection requests. When inet_csk_get_port is called and the port is found to be occupied, inet_csk_bind_conflict is called to resolve the conflict. Under normal circumstances you shouldn't see it in perf top at all, much less at 79%. There are two ways to deal with it, and each method should be complemented by the other. One way is to increase the number of available ports and reduce the wait time, which you have already tried to some extent. I would add the following:
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_max_tw_buckets = 1200
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 8192
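As a rough sketch (my own illustration, not part of any of the tools mentioned in this thread), a small Python script like the one below could compare the values above against what the kernel currently reports under /proc/sys before anything is changed. The helper names are hypothetical, and the simple dot-to-slash mapping assumes none of the keys contains an interface name with a dot in it.

```python
from pathlib import Path

# Values suggested in this message; adjust to taste before applying anything.
SUGGESTED = {
    "net.ipv4.tcp_fin_timeout": "20",
    "net.ipv4.tcp_max_tw_buckets": "1200",
    "net.ipv4.tcp_keepalive_time": "1200",
    "net.ipv4.tcp_syncookies": "1",
    "net.ipv4.tcp_max_syn_backlog": "8192",
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


def differences(wanted, root="/proc/sys"):
    """Return {key: (current, wanted)} for every key whose value differs."""
    diffs = {}
    for key, value in wanted.items():
        current = read_sysctl(key, root)
        if current != value:
            diffs[key] = (current, value)
    return diffs


if __name__ == "__main__":
    for key, (current, wanted) in differences(SUGGESTED).items():
        print(f"{key}: current={current} suggested={wanted}")
```

This only reads values. Recording the current ones first makes it easier to roll back if the new settings turn out to hurt.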
The complementary method to the port and timeout tuning is to lower the number of connection requests by filtering out the frivolous ones with a few iptables rules.
I'm assuming the increased load you're experiencing is due to the current DDoS attacks. I'm not sure whether you're using anything to mitigate that, but you should consider it.
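To check whether lingering or half-open connections are really what is eating the ports, a minimal Python sketch like this could tally sockets by TCP state from /proc/net/tcp (and /proc/net/tcp6 for IPv6). The state codes are the ones from the kernel's tcp_states.h; the function name is just for illustration, and on a very busy relay reading these files can itself be slow.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as handle:
                total += count_states(handle.read())
        except OSError:
            continue  # file missing or unreadable on this system
    for state, count in total.most_common():
        print(f"{state:12} {count}")
```

A very large TIME_WAIT or SYN_RECV count relative to ESTABLISHED would be consistent with the port pressure described above, though it does not prove the cause on its own.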
You may find something useful at the following links:
[1](https://github.com/Enkidu-6/tor-ddos)
[2](https://github.com/toralf/torutils)
[background](https://gitlab.torproject.org/tpo/community/support/-/issues/40093)
Cheers.
On 12/1/2022 3:35 PM, Christopher Sheats wrote:
Hello tor-relays,
We are currently using Ubuntu Server for our exit relays. Occasionally, exit throughput will drop from ~4 Gbps down to ~200 Mbps, and the only observable data point that we have is a significant increase in inet_csk_bind_conflict, as seen via 'perf top', where it will hit 85% [kernel] utilization.
A while back we thought we had solved this with two /etc/sysctl.conf settings:
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
However, we are still experiencing this problem.
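For a sense of scale, here is a small Python sketch of how many local ports a given ip_local_port_range covers. The function name is hypothetical, and the "common default" of 32768 60999 is an assumption about recent kernels rather than something measured on these servers.

```python
def ephemeral_port_count(range_text):
    """Return how many ports an inclusive 'low high' range string covers."""
    low, high = (int(part) for part in range_text.split())
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {range_text!r}")
    return high - low + 1


# Assumed common default on recent Linux kernels; check your own system.
DEFAULT_RANGE = "32768 60999"
# The widened range from /etc/sysctl.conf mentioned in this message.
WIDENED_RANGE = "1024 65535"

if __name__ == "__main__":
    for label, text in (("common default", DEFAULT_RANGE), ("widened", WIDENED_RANGE)):
        print(f"{label}: {ephemeral_port_count(text)} ports")
```

The same function accepts the contents of /proc/sys/net/ipv4/ip_local_port_range directly, since that file holds the two numbers separated by whitespace. Even the widened range is a hard ceiling of 64512 ports, so it can only delay exhaustion under enough connection pressure.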
Both of our (currently two) relay servers suffer from the same problem, at the same time. They are AMD EPYC 7402P bare-metal servers, each with 96 GB RAM and 20 exit relays. This issue persists after upgrading Tor to 0.4.7.11.
Screenshots of perf top are shared here: https://digitalcourage.social/@EmeraldOnion/109440197076214023
Does anyone have experience troubleshooting and/or fixing this problem?
Cheers,
--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/
tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays