server1:~$ ss -s
Total: 454644
TCP:   465840 (estab 368011, closed 36634, orphaned 7619, timewait 11466)

Transport Total     IP        IPv6
RAW  0         0         0        
UDP  48        48        0        
TCP  429206    413815    15391    
INET  429254    413863    15391    
FRAG  0         0         0      

81% inet_csk_bind_conflict

server2:~$ ss -s
Total: 460089
TCP:   477026 (estab 367786, closed 42817, orphaned 7456, timewait 17239)

Transport Total     IP        IPv6
RAW  0         0         0        
UDP  71        71        0        
TCP  434209    418235    15974    
INET  434280    418306    15974    
FRAG  1         1         0  

80% inet_csk_bind_conflict

(Total combined throughput at the time of measurement was ~650 Mbps symmetrical, per transit provider metrics. Throughput this low is common when inet_csk_bind_conflict is this high.)
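
For anyone comparing their own numbers, here is a minimal Python sketch that turns the `TCP:` line of `ss -s` into named counters. The function name `parse_ss_tcp_summary` is made up for illustration, and the per-relay figure assumes connections are spread evenly over the 20 relays per server mentioned in the original post, which may not hold in practice.

```python
import re


def parse_ss_tcp_summary(line: str) -> dict:
    """Parse the 'TCP:' line printed by `ss -s` into integer counters."""
    match = re.match(r"TCP:\s+(\d+)\s+\((.*)\)", line.strip())
    if match is None:
        raise ValueError(f"not an ss -s TCP summary line: {line!r}")
    counters = {"total": int(match.group(1))}
    # The parenthesised part looks like "estab 368011, closed 36634, ..."
    for field in match.group(2).split(","):
        name, value = field.split()
        counters[name] = int(value)
    return counters


# Line copied from the server1 output above.
server1 = parse_ss_tcp_summary(
    "TCP:   465840 (estab 368011, closed 36634, orphaned 7619, timewait 11466)"
)
relays_per_server = 20  # from the original post; an even spread is an assumption
print("established:", server1["estab"])
print("average per relay:", server1["estab"] // relays_per_server)
```

On the server1 figures this works out to roughly 18,400 established connections per relay process on average.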

Re OutboundBindAddress - yes, for both v4 and v6

Re kernel version - 5.15.0-56-generic (jammy). Foundation for Applied Privacy recommended that we try the nightly repo, which apparently includes the IP_BIND_ADDRESS_NO_PORT change. However, that merge request mentions a workaround of modifying net.ipv4.ip_local_port_range, which we have already done.

--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/




On Dec 3, 2022, at 3:02 AM, Anders Trier Olesen <anders.trier.olesen@gmail.com> wrote:

Hi Christopher

How many open connections do you have? (`ss -s`)
Do you happen to use OutboundBindAddress in your torrc?

What I think we need is for the Tor developers to include this PR in a release: https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/579
Once that has happened, I think the problem should go away, as long as you run a recent enough Linux kernel that supports IP_BIND_ADDRESS_NO_PORT (since Linux 4.2).
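
To illustrate what the option changes, here is a rough Python sketch of binding an outbound socket to a source address with IP_BIND_ADDRESS_NO_PORT set. This is not Tor's code and not what the merge request contains; the helper name `bind_source_address` is hypothetical, and the fallback value 24 is the Linux constant from <linux/in.h>, used only when the Python build does not export the name. Per ip(7), the option tells the kernel not to reserve an ephemeral port at bind() time when the port is 0, so the port is chosen at connect() time instead.

```python
import socket
import sys

# Linux value from <linux/in.h>; some Python builds do not export the name.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def bind_source_address(source_ip: str):
    """Bind a TCP socket to source_ip with port 0.

    Returns (sock, deferred), where deferred is True if the kernel accepted
    IP_BIND_ADDRESS_NO_PORT and will pick the source port at connect() time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    deferred = False
    if sys.platform.startswith("linux"):
        try:
            sock.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
            deferred = True
        except OSError:
            # Kernel older than 4.2 or option unsupported: plain bind() behaviour.
            deferred = False
    sock.bind((source_ip, 0))
    return sock, deferred


# Loopback is used here only so the sketch runs anywhere; a relay would
# pass its OutboundBindAddress instead.
sock, deferred = bind_source_address("127.0.0.1")
print("port selection deferred to connect():", deferred)
sock.close()
```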

- Anders




On Fri, Dec 2, 2022 at 09:24, Christopher Sheats <yawnbox@emeraldonion.org> wrote:
Hello tor-relays,

We currently run our exit relays on Ubuntu Server. Occasionally, exit throughput drops from ~4 Gbps to ~200 Mbps, and the only observable data point we have is a significant increase in inet_csk_bind_conflict, as seen via 'perf top', where it hits 85% [kernel] utilization.

A while back we thought we had solved this with two /etc/sysctl.conf settings:
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1

However we are still experiencing this problem.
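
As a rough illustration of why widening the range only helps so much, the sketch below counts how many ephemeral ports a given net.ipv4.ip_local_port_range value provides. The helper name `ephemeral_port_count` is hypothetical. The assumption behind the comparison is that, without IP_BIND_ADDRESS_NO_PORT, bind() with port 0 reserves a port from this range for each outbound socket bound to a source address, so the range size caps the connections per source address.

```python
from pathlib import Path


def ephemeral_port_count(range_text: str) -> int:
    """Count ports in a 'low high' string as used by ip_local_port_range."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1


print(ephemeral_port_count("1024 65535"))   # range from the sysctl.conf above: 64512
print(ephemeral_port_count("32768 60999"))  # usual Linux default: 28232

# On a Linux host, the live value can be read straight from procfs.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")
if PORT_RANGE_FILE.exists():
    print("this host:", ephemeral_port_count(PORT_RANGE_FILE.read_text()))
```

Even the widened range gives 64,512 ports, which is small next to several hundred thousand established connections per server.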

Both of our (currently two) relay servers suffer from the same problem at the same time. They are AMD Epyc 7402P bare-metal servers, each with 96 GB RAM and 20 exit relays. The issue persists after upgrading to Tor 0.4.7.11.

Screenshots of perf top are shared here: https://digitalcourage.social/@EmeraldOnion/109440197076214023

Does anyone have experience troubleshooting and/or fixing this problem?

Cheers,

--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390




_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays