Hello tor-relays,
We are currently using Ubuntu Server for our exit relays. Occasionally, exit throughput will drop from ~4Gbps down to ~200Mbps, and the only observable data point we have is a significant increase in inet_csk_bind_conflict, as seen via 'perf top', where it hits 85% [kernel] utilization.
A while back we thought we had solved this with two /etc/sysctl.conf settings:
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
However we are still experiencing this problem.
Both of our (currently two) relay servers suffer from the same problem, at the same time. They are AMD EPYC 7402P bare-metal servers, each with 96GB RAM and 20 exit relays. This issue persists after upgrading to Tor 0.4.7.11.
Screenshots of perf top are shared here: https://digitalcourage.social/@EmeraldOnion/109440197076214023
Does anyone have experience troubleshooting and/or fixing this problem?
Cheers,
--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/
Hi,
As I'm sure you've already gathered, your system is maxing out trying to deal with all the connection requests. When inet_csk_get_port is called and the port is found to be occupied, inet_csk_bind_conflict is called to resolve the conflict. Under normal circumstances you shouldn't see it in perf top at all, much less at 79%. There are two ways to deal with it, and each method should be complemented by the other. One is to increase the number of available ports and reduce the wait time, which you have partly done already. I would add the following:
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_max_tw_buckets = 1200
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 8192
The complementary method to the above is to lower the number of connection requests by filtering the frivolous ones out of the equation with a few iptables rules.
I'm assuming the increased load you're experiencing is due to the ongoing DDoS attacks. I'm not sure whether you're using anything to mitigate that, but you should consider it.
You may find something useful at the following links:
[1](https://github.com/Enkidu-6/tor-ddos)
[2](https://github.com/toralf/torutils)
[background](https://gitlab.torproject.org/tpo/community/support/-/issues/40093)
Cheers.
tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
Hello,
Thank you for this information. After 24 hours of testing, these configurations brought Tor to a halt.
I started with just the sysctl modifications. After a few hours with only those, there was no improvement in the ~75% inet_csk_bind_conflict utilization. I then installed torutils for both IPv4 and IPv6. Within a couple of hours, Tor dropped below 15 Mbps across both servers (40 relays); 16 hours later it dropped below 2 Mbps.
I've removed all of these new settings and restarted.
--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/
Sorry to hear it wasn't much help. Even though the additions I suggested didn't help, they certainly couldn't cause any harm and can't be responsible for the drops in traffic.
As for the torutils scripts, I'm sure toralf would be able to investigate that better, but I have a feeling you have a particular setup that might not work with the script. May I ask what your setup is? Are you running your relays in separate VMs on the main system, or do you have all IP addresses on the same OS and use OutboundBindAddress, routing, etc. to separate them? If I know more, I might be able to make a script specific to your setup.
May I ask what your setup is? Are you running your relays in separate VMs on the main system, or do you have all IP addresses on the same OS and use OutboundBindAddress, routing, etc. to separate them? If I know more, I might be able to make a script specific to your setup.
Thank you. Yes, of course.
Ubuntu Server 22.04 runs on bare metal. ansible-relayor manages 20 exit relays on each system. Netplan has each IP individually listed (subdivided as a /25 per server from within a dedicated /24, and similarly for the v6 addresses). I believe an available IP is randomly picked by ansible-relayor and used statically in each torrc file.
Here is an example torrc:
# ansible-relayor generated torrc configuration file
# Note: manual changes will be OVERWRITTEN on the next ansible-playbook run
OfflineMasterKey 1
RunAsDaemon 0
Log notice syslog
OutboundBindAddress 23.129.64.130
SocksPort 0
User _tor-23.129.64.130_443
DataDirectory /var/lib/tor-instances/23.129.64.130_443
ORPort 23.129.64.130:443
ORPort [2620:18c:0:192::130]:443
OutboundBindAddress [2620:18c:0:192::130]
DirPort 23.129.64.130:80
Address 23.129.64.130
SyslogIdentityTag 23.129.64.130_443
ControlSocket /var/run/tor-instances/23.129.64.130_443/control GroupWritable RelaxDirModeCheck
Nickname ageis
ContactInfo url:emeraldonion.org proof:uri-rsa ciissversion:2 tech@emeraldonion.org
Sandbox 1
NoExec 1
# we are an exit relay!
ExitRelay 1
IPv6Exit 1
DirPort [2620:18c:0:192::130]:80 NoAdvertise
DirPortFrontPage /etc/tor/instances/tor-exit-notice.html
ExitPolicy reject 23.129.64.128/25:*,reject6 [2620:18c:0:192::]/64:*,accept *:*,accept6 *:*
MyFamily <snip>
# end of torrc
--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/
Excellent. Thank you.
Yes, a blanket iptables rule is not going to work well in this setup, as it pools connections to all IP addresses together. If we accept 4 connections to port 443, a blanket iptables rule accepts 4 connections to all IP addresses combined and drops everything else, which of course brings your server to a halt.
In another thread on this mailing list there was the same situation, and I put a script together yesterday that you're welcome to try. I'm not sure if they've tried it yet or what the result has been, but the script applies the rules to two IP addresses at a time and leaves the rest alone. So you can apply it to two addresses on your server, assess the result, and then either expand to the rest or stop altogether.
The script makes a backup of your existing iptables rules; all you have to do is restore it and everything goes back to how it was, without having to reboot. It also specifically uses the mangle table and PREROUTING, so it won't interfere with your existing rules. That should reduce the number of used ports as well. Flushing the mangle table will also get rid of these rules and return you to how things were before.
You can get it here:
https://raw.githubusercontent.com/Enkidu-6/tor-ddos/dev/multiple/multi-addr....
Simply choose two of your IP addresses and the ORPort for each and run the script.
If it does what you expect it to do, just change the IP addresses and run the script again until all your addresses are covered. Please save the iptables backup somewhere else, as the second time you run the script the original backup will be overwritten.
If one of your IP addresses has two ORPorts, the above script won't work and you should use the script below:
https://raw.githubusercontent.com/Enkidu-6/tor-ddos/dev/multiple/two-or.sh
Best of luck and I hope this helps.
On Freitag, 2. Dezember 2022 16:30:48 CET Chris wrote:
> As I'm sure you've already gathered, your system is maxing out trying to deal with all the connection requests. When inet_csk_get_port is called and the port is found to be occupied, inet_csk_bind_conflict is called to resolve the conflict. Under normal circumstances you shouldn't see it in perf top at all, much less at 79%. There are two ways to deal with it, and each method should be complemented by the other. One is to increase the number of available ports and reduce the wait time, which you have partly done already. I would add the following:

I use old dual Intel Xeon E5-2680v2 CPUs, 256 GB RAM, and the Tor IPs/traffic routed over a dual 10G NIC (40 exit relays). My values differ:

> net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_fin_timeout = 4

> net.ipv4.tcp_max_tw_buckets = 1200
net.ipv4.tcp_max_tw_buckets = 2000000

> net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_keepalive_time = 60

> net.ipv4.tcp_max_syn_backlog = 8192
net.core.netdev_max_backlog = 262144
https://github.com/boldsuck/tor-relay-configs/blob/main/etc/sysctl.d/local.c...
Hi Christopher
How many open connections do you have? (`ss -s`) Do you happen to use OutboundBindAddress in your torrc?
What I think we need is for the Tor developers to include this PR in a release: https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/579 Once that has happened, I think the problem should go away, as long as you run a recent enough Linux kernel that supports IP_BIND_ADDRESS_NO_PORT (since Linux 4.2).
- Anders
server1:~$ ss -s
Total: 454644
TCP: 465840 (estab 368011, closed 36634, orphaned 7619, timewait 11466)

Transport Total     IP        IPv6
RAW       0         0         0
UDP       48        48        0
TCP       429206    413815    15391
INET      429254    413863    15391
FRAG      0         0         0
81% inet_csk_bind_conflict
server2:~$ ss -s
Total: 460089
TCP: 477026 (estab 367786, closed 42817, orphaned 7456, timewait 17239)

Transport Total     IP        IPv6
RAW       0         0         0
UDP       71        71        0
TCP       434209    418235    15974
INET      434280    418306    15974
FRAG      1         1         0
80% inet_csk_bind_conflict
(Total combined throughput at the time of measurement was ~650 Mbps symmetrical per transit-provider metrics; this low throughput is common when inet_csk_bind_conflict is this high.)
Re OutboundBindAddress - yes, for both v4 and v6
Re kernel version - 5.15.0-56-generic (jammy). Foundation for Applied Privacy recommended that we try the nightly repo, which apparently includes the IP_BIND_ADDRESS_NO_PORT change. However, that merge request mentions a workaround of modifying net.ipv4.ip_local_port_range, which we have already performed.
--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/
On Dec 3, 2022, at 3:02 AM, Anders Trier Olesen anders.trier.olesen@gmail.com wrote:
Hi Christopher
How many open connections do you have? (`ss -s`) Do you happen to use OutboundBindAddress in your torrc?
What I think we need is for the Tor developers to include this PR in a release: https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/579 Once that has happened, I think the problem should go away, as long as you run a recent enough Linux kernel that supports IP_BIND_ADDRESS_NO_PORT (since Linux 4.2).
- Anders
fre. 2. dec. 2022 kl. 09.24 skrev Christopher Sheats <yawnbox@emeraldonion.org mailto:yawnbox@emeraldonion.org>:
Hello tor-relays,
We are using Ubuntu server currently for our exit relays. Occasionally, exit throughput will drop from ~4Gbps down to ~200Mbps and the only observable data point that we have is a significant increase in inet_csk_bind_conflict, as seen via 'perf top', where it will hit 85% [kernel] utilization.
A while back we thought we solved with with two /etc/sysctl.conf settings: net.ipv4.ip_local_port_range = 1024 65535 net.ipv4.tcp_tw_reuse = 1
However we are still experiencing this problem.
Both of our (currently, two) relay servers suffer from the same problem, at the same time. They are AMD Epyc 7402P bare-metal servers each with 96GB RAM, each has 20 exit relays on them. This issue persists after upgrading to 0.4.7.11.
Screenshots of perf top are shared here: https://digitalcourage.social/@EmeraldOnion/109440197076214023
Does anyone have experience troubleshooting and/or fixing this problem?
Cheers,
-- Christopher Sheats (yawnbox) Executive Director Emerald Onion Signal: +1 206.739.3390 Website: https://emeraldonion.org/ Mastodon: https://digitalcourage.social/@EmeraldOnion/
tor-relays mailing list tor-relays@lists.torproject.org mailto:tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
On 2022/12/01 20:35, Christopher Sheats wrote:
Does anyone have experience troubleshooting and/or fixing this problem?
Like I wrote in [1], I think it would be interesting to hear if the patch from pseudonymisaTor in ticket #26646[2] would be of any help in the given situation. The patch allows an exit operator to specify a range of IP addresses for binding purposes for outbound connections. I would think this could split the load wasted on trying to resolve port conflicts in the kernel amongst the set of IP's you have available for outbound connections.
All the best, Alex.
[1]: https://mastodon.social/@ahf/109382411984106226 [2]: https://gitlab.torproject.org/tpo/core/tor/-/issues/26646#note_2795959
Hi again
I took another look at this problem, and now I'm even more convinced that what we really need is IP_BIND_ADDRESS_NO_PORT. Here's why.
If torrc OutboundBindAddress is configured, tor calls bind(2) on every outgoing connection: https://gitlab.torproject.org/tpo/core/tor/-/blob/tor-0.4.7.12/src/core/main... with sockaddr_in.sin_port set to 0 on #L2438.
The kernel doesn't know that we'll not be using this socket for listen(2), so it attempts to find an unused local two-tuple (according to [1]; actually a three-tuple: <protocol, source ip, source port>):
The bind syscall is handled by inet_bind: https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/af_inet.c#L438 which calls __inet_bind that in turn calls sk->sk_prot->get_port on #L531 (notice the if on #L529).
get_port is implemented by inet_csk_get_port in inet_connection_sock.c: https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/inet_connection_so... On #L375, we call inet_csk_find_open_port (defined on #L190) to find a free port.
inet_csk_find_open_port gets the local port range on #L206 (i.e. net.ipv4.ip_local_port_range), selects a random starting point (#L222), and loops through all the ports until it finds one that is free (#L230). For every port candidate, if it is already in use (#L240) it calls inet_csk_bind_conflict (#L241), which is defined on #L133. As far as I understand, it is inet_csk_bind_conflict's job to determine whether it is safe to bind to the port anyway (e.g. the existing connection could be in TCP_TIME_WAIT with SO_REUSEPORT set on the socket). This is where your server spends so much time. Increasing net.ipv4.ip_local_port_range doesn't solve the problem, but it makes it more likely to find a port that is free.
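The eager port reservation that feeds this search is visible from userspace. A minimal Python sketch (an illustration of the bind(2) semantics described above, not tor's code):

```python
import errno
import socket

# bind() with port 0 and without IP_BIND_ADDRESS_NO_PORT goes through
# inet_csk_get_port: the kernel picks and reserves an ephemeral port
# immediately, before any connect().
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("127.0.0.1", 0))
port = s1.getsockname()[1]   # already a concrete, reserved port

# A second bind() to the same <ip, port> hits the conflict path
# (inet_csk_bind_conflict) and is refused, since neither socket has
# SO_REUSEADDR/SO_REUSEPORT set.
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s2.bind(("127.0.0.1", port))
    conflicted = False
except OSError as e:
    conflicted = (e.errno == errno.EADDRINUSE)
```

With tens of thousands of such eagerly-bound sockets, almost every candidate port the kernel tries is occupied, which is exactly the inet_csk_bind_conflict time shown in the perf output.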
Let's trace back to the "if" in __inet_bind on #L529: https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/af_inet.c#L529 Since we call bind with sockaddr_in.sin_port set to 0, snum is 0, and we can avoid the whole call chain by setting inet->bind_address_no_port to 1, i.e. this patch: https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/579/diffs?commit... That should allow the kernel to reuse already-in-use src ports as long as the TCP 4-tuple is unique.
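For illustration, the effect of that one-line change is observable from Python on Linux. This is a sketch, not tor's actual code; the fallback constant 24 is the value of IP_BIND_ADDRESS_NO_PORT in <linux/in.h>, for Python versions whose socket module doesn't export it:

```python
import socket

# Linux-only option; fall back to the <linux/in.h> value 24 if needed.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)

# A stand-in destination, playing the role of some remote OR.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(8)

c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The gist of the patch: set the option before the bind() that
# OutboundBindAddress triggers, so __inet_bind skips inet_csk_get_port.
c.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
c.bind(("127.0.0.1", 0))                  # sin_port == 0, as tor does
port_after_bind = c.getsockname()[1]      # still 0: no port reserved yet
c.connect(srv.getsockname())
port_after_connect = c.getsockname()[1]   # assigned at connect(2) time
```

The bind() still pins the source address, but port selection is deferred to connect(2), where the kernel can pick a source port per destination 4-tuple instead of reserving one globally.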
Please include it in the next tor release! :)
[1]: https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-s...
- Anders
On Fri, Dec 9, 2022 at 10:47 AM Alexander Færøy ahf@torproject.org wrote:
On 2022/12/01 20:35, Christopher Sheats wrote:
Does anyone have experience troubleshooting and/or fixing this problem?
Like I wrote in [1], I think it would be interesting to hear if the patch from pseudonymisaTor in ticket #26646[2] would be of any help in the given situation. The patch allows an exit operator to specify a range of IP addresses for binding purposes for outbound connections. I would think this could split the load wasted on trying to resolve port conflicts in the kernel amongst the set of IP's you have available for outbound connections.
All the best, Alex.
-- Alexander Færøy
On Fri, Dec 09, 2022 at 09:47:07AM +0000, Alexander Færøy wrote:
On 2022/12/01 20:35, Christopher Sheats wrote:
Does anyone have experience troubleshooting and/or fixing this problem?
Like I wrote in [1], I think it would be interesting to hear if the patch from pseudonymisaTor in ticket #26646[2] would be of any help in the given situation. The patch allows an exit operator to specify a range of IP addresses for binding purposes for outbound connections. I would think this could split the load wasted on trying to resolve port conflicts in the kernel amongst the set of IP's you have available for outbound connections.
This sounds similar to a problem we faced with the main Snowflake bridge. After usage passed a certain threshold, we started getting constant EADDRNOTAVAIL, not on the outgoing connections to middle nodes, but on the many localhost TCP connections used by the pluggable transports model.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Long story short, the only mitigation that worked for us was to bind sockets to an address (with port number unspecified, and with IP_BIND_ADDRESS_NO_PORT *unset*) before connecting them, and use different 127.0.0.0/8 addresses or ranges of addresses in different segments of the communication chain.
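In sketch form (a hypothetical helper, not the actual Snowflake code), the mitigation looks like this on Linux, where any 127.0.0.0/8 address is bindable by default:

```python
import socket

def connect_from(src_ip, dst):
    """Bind-before-connect. IP_BIND_ADDRESS_NO_PORT is deliberately NOT
    set, so the kernel reserves a concrete source port at bind() time."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((src_ip, 0))   # port 0: kernel assigns one now
    s.connect(dst)
    return s

# Stand-in for one hop of the chain (e.g. an ExtORPort listener).
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(16)
dst = srv.getsockname()

# Different segments of the chain use different 127.0.0.0/8 source
# addresses, so each draws ephemeral ports from its own <src ip, port>
# space instead of competing for one shared pool.
a = connect_from("127.0.0.2", dst)
b = connect_from("127.0.0.3", dst)
src_a, src_b = a.getsockname()[0], b.getsockname()[0]
```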
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... https://gitlab.torproject.org/dcf/extor-static-cookie/-/commit/a5c7a038a71ae...
IP_BIND_ADDRESS_NO_PORT was mentioned in another part of the thread (https://lists.torproject.org/pipermail/tor-relays/2022-December/020895.html). For us, this bind option *did not help* and in fact we had to apply a workaround for Haproxy, which has IP_BIND_ADDRESS_NO_PORT hardcoded. *Why* that should be the case is a mystery to me, as is why it is true that bind-before-connect avoids EADDRNOTAVAIL even when the address manually bound to is the very same address the kernel would have automatically assigned. I even spent some time reading the Linux 5.10 source code trying to make sense of it. In the source code I found, or at least think I found, code paths for the behavior I observed; but the behavior seems to go against how bind and IP_BIND_ADDRESS_NO_PORT are documented to work.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Although my understanding of what Linux is doing is very imperfect, my understanding is that both of these questions have the same answer: port number assignment in `connect` when called on a socket not yet bound to a port works differently than in `bind` when called with a port number of 0. In case (1), the socket is not bound to a port because you haven't even called `bind`. In case (2), the socket is not bound to a port because haproxy sets the `IP_BIND_ADDRESS_NO_PORT` sockopt before calling `bind`. When you call `bind` *without* `IP_BIND_ADDRESS_NO_PORT`, it causes the port number to be bound before calling `connect`, which avoids the code path in `connect` that results in `EADDRNOTAVAIL`.
I am confused by these results, which are contrary to my understanding of what `IP_BIND_ADDRESS_NO_PORT` is supposed to do, which is precisely to avoid the problem of source address port exhaustion by deferring the port number assignment until the time of `connect`, when additional information about the destination address is available. But it's demonstrable that binding to a source port before calling `connect` avoids `EADDRNOTAVAIL` errors in our use cases, whatever the cause may be.
Hi David
IP_BIND_ADDRESS_NO_PORT did not fix your somewhat similar problem in your Haproxy setup, because all the connections are to the same dst tuple <ip, port> (i.e. 127.0.0.1:ExtORPort). The connect() system call is looking for a unique 5-tuple <protocol, srcip, srcport, dstip, dstport>. In the Haproxy setup, the only free variable is srcport <tcp, 127.0.0.1, srcport, 127.0.0.1, ExtORPort>, so toggling IP_BIND_ADDRESS_NO_PORT makes no difference.
The following should help (unless I've found a bug in Linux):
1. Let tor listen on a bunch of different ExtORPorts
2. Let tor listen on a bunch of IPs for the ExtORPort (so we have #ExtORPort * #ExtOrPortListenIPs unique combinations)
3. Connect from different src IPs (what you already implemented)
4. sysctl -w net.ipv4.ip_local_port_range="1024 65535"
For 1 and 2 to make a difference, if you do a 3 (i.e bind before connect), you need IP_BIND_ADDRESS_NO_PORT enabled on the socket.
Tor relays already connect to many different dstip:dstport pairs, so enabling IP_BIND_ADDRESS_NO_PORT should solve our problem.
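The 4-tuple point can be demonstrated directly. The sketch below forces two client sockets onto the same <src ip, src port> (using SO_REUSEADDR/SO_REUSEPORT rather than IP_BIND_ADDRESS_NO_PORT, since the kernel's random port choice can't be steered from userspace) and connects them to two different destinations; both succeed because the connection 4-tuples differ:

```python
import socket

def listener():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(4)
    return s

srv1, srv2 = listener(), listener()   # two distinct dstip:dstport pairs

def client(dst, src_port):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    c.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    c.bind(("127.0.0.1", src_port))
    c.connect(dst)
    return c

c1 = client(srv1.getsockname(), 0)    # kernel picks a source port
sport = c1.getsockname()[1]
# Same <src ip, src port>, different destination: the 4-tuple is still
# unique, so this bind()+connect() succeeds as well.
c2 = client(srv2.getsockname(), sport)
```

A relay talking to thousands of distinct ORs is the many-destination case, which is why deferring port assignment to connect() multiplies the usable source-port space.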
I rest my case ;)
Best regards Anders Trier Olesen
On Sat, Dec 10, 2022 at 5:41 AM David Fifield david@bamsoftware.com wrote:
On Fri, Dec 09, 2022 at 09:47:07AM +0000, Alexander Færøy wrote:
On 2022/12/01 20:35, Christopher Sheats wrote:
Does anyone have experience troubleshooting and/or fixing this problem?
Like I wrote in [1], I think it would be interesting to hear if the patch from pseudonymisaTor in ticket #26646[2] would be of any help in the given situation. The patch allows an exit operator to specify a range of IP addresses for binding purposes for outbound connections. I would think this could split the load wasted on trying to resolve port conflicts in the kernel amongst the set of IP's you have available for outbound connections.
This sounds similar to a problem we faced with the main Snowflake bridge. After usage passed a certain threshold, we started getting constant EADDRNOTAVAIL, not on the outgoing connections to middle nodes, but on the many localhost TCP connections used by the pluggable transports model.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Long story short, the only mitigation that worked for us was to bind sockets to an address (with port number unspecified, and with IP_BIND_ADDRESS_NO_PORT *unset*) before connecting them, and use different 127.0.0.0/8 addresses or ranges of addresses in different segments of the communication chain.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
https://gitlab.torproject.org/dcf/extor-static-cookie/-/commit/a5c7a038a71ae...
IP_BIND_ADDRESS_NO_PORT was mentioned in another part of the thread ( https://lists.torproject.org/pipermail/tor-relays/2022-December/020895.html ). For us, this bind option *did not help* and in fact we had to apply a workaround for Haproxy, which has IP_BIND_ADDRESS_NO_PORT hardcoded. *Why* that should be the case is a mystery to me, as is why it is true that bind-before-connect avoids EADDRNOTAVAIL even when the address manually bound to is the very same address the kernel would have automatically assigned. I even spent some time reading the Linux 5.10 source code trying to make sense of it. In the source code I found, or at least think I found, code paths for the behavior I observed; but the behavior seems to go against how bind and IP_BIND_ADDRESS_NO_PORT are documented to work.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Although my understanding of what Linux is doing is very imperfect, my understanding is that both of these questions have the same answer: port number assignment in `connect` when called on a socket not yet bound to a port works differently than in `bind` when called with a port number of 0. In case (1), the socket is not bound to a port because you haven't even called `bind`. In case (2), the socket is not bound to a port because haproxy sets the `IP_BIND_ADDRESS_NO_PORT` sockopt before calling `bind`. When you call `bind` *without* `IP_BIND_ADDRESS_NO_PORT`, it causes the port number to be bound before calling `connect`, which avoids the code path in `connect` that results in `EADDRNOTAVAIL`.
I am confused by these results, which are contrary to my understanding of what `IP_BIND_ADDRESS_NO_PORT` is supposed to do, which is precisely to avoid the problem of source address port exhaustion by deferring the port number assignment until the time of `connect`, when additional information about the destination address is available. But it's demonstrable that binding to a source port before calling `connect` avoids `EADDRNOTAVAIL` errors in our use cases, whatever the cause may be.
Also see this patch, which introduces net.ipv4.ip_autobind_reuse: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
Enabling net.ipv4.ip_autobind_reuse allows the kernel to bind SO_REUSEADDR enabled sockets (which I think they are in tor) to the same <addr, port> only when all ephemeral ports are exhausted. (So it should fix the "resource exhausted" bugs, but we'll still spend way too much time in the kernel looking for free ports, before giving up and checking if net.ipv4.ip_autobind_reuse is toggled)
It is only safe to use when you know that you'll not have tons of connections to the same dstip:dstport (so not safe to use in the haproxy setup), which is why it "should only be set by experts", and they suggest using IP_BIND_ADDRESS_NO_PORT instead:
ip_autobind_reuse - BOOLEAN
By default, bind() does not select the ports automatically even if the new socket and all sockets bound to the port have SO_REUSEADDR. ip_autobind_reuse allows bind() to reuse the port and this is useful when you use bind()+connect(), but may break some applications. The preferred solution is to use IP_BIND_ADDRESS_NO_PORT and this option (i.e. ip_autobind_reuse) should only be set by experts.
Default: 0
I've enabled `sysctl -w net.ipv4.ip_autobind_reuse=1` on the dotsrc exits for now, while we wait for IP_BIND_ADDRESS_NO_PORT.
- Anders
On Sat, Dec 10, 2022 at 9:59 AM Anders Trier Olesen < anders.trier.olesen@gmail.com> wrote:
Hi David
IP_BIND_ADDRESS_NO_PORT did not fix your somewhat similar problem in your Haproxy setup, because all the connections are to the same dst tuple <ip, port> (i.e 127.0.0.1:ExtORPort). The connect() system call is looking for a unique 5-tuple <protocol, srcip, srcport, dstip, dstport>. In the Haproxy setup, the only free variable is srcport <tcp, 127.0.0.1, srcport, 127.0.0.1, ExtORPort>, so toggling IP_BIND_ADDRESS_NO_PORT makes no difference.
The following should help (unless I've found a bug in Linux):
1. Let tor listen on a bunch of different ExtORPorts
2. Let tor listen on a bunch of IPs for the ExtORPort (so we have #ExtORPort * #ExtOrPortListenIPs unique combinations)
3. Connect from different src IPs (what you already implemented)
4. sysctl -w net.ipv4.ip_local_port_range="1024 65535"
For 1 and 2 to make a difference, if you do a 3 (i.e bind before connect), you need IP_BIND_ADDRESS_NO_PORT enabled on the socket.
Tor relays already connect to many different dstip:dstport pairs, so enabling IP_BIND_ADDRESS_NO_PORT should solve our problem.
I rest my case ;)
Best regards Anders Trier Olesen
On Sat, Dec 10, 2022 at 5:41 AM David Fifield david@bamsoftware.com wrote:
On Fri, Dec 09, 2022 at 09:47:07AM +0000, Alexander Færøy wrote:
On 2022/12/01 20:35, Christopher Sheats wrote:
Does anyone have experience troubleshooting and/or fixing this
problem?
Like I wrote in [1], I think it would be interesting to hear if the patch from pseudonymisaTor in ticket #26646[2] would be of any help in the given situation. The patch allows an exit operator to specify a range of IP addresses for binding purposes for outbound connections. I would think this could split the load wasted on trying to resolve port conflicts in the kernel amongst the set of IP's you have available for outbound connections.
This sounds similar to a problem we faced with the main Snowflake bridge. After usage passed a certain threshold, we started getting constant EADDRNOTAVAIL, not on the outgoing connections to middle nodes, but on the many localhost TCP connections used by the pluggable transports model.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Long story short, the only mitigation that worked for us was to bind sockets to an address (with port number unspecified, and with IP_BIND_ADDRESS_NO_PORT *unset*) before connecting them, and use different 127.0.0.0/8 addresses or ranges of addresses in different segments of the communication chain.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
https://gitlab.torproject.org/dcf/extor-static-cookie/-/commit/a5c7a038a71ae...
IP_BIND_ADDRESS_NO_PORT was mentioned in another part of the thread ( https://lists.torproject.org/pipermail/tor-relays/2022-December/020895.html ). For us, this bind option *did not help* and in fact we had to apply a workaround for Haproxy, which has IP_BIND_ADDRESS_NO_PORT hardcoded. *Why* that should be the case is a mystery to me, as is why it is true that bind-before-connect avoids EADDRNOTAVAIL even when the address manually bound to is the very same address the kernel would have automatically assigned. I even spent some time reading the Linux 5.10 source code trying to make sense of it. In the source code I found, or at least think I found, code paths for the behavior I observed; but the behavior seems to go against how bind and IP_BIND_ADDRESS_NO_PORT are documented to work.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Although my understanding of what Linux is doing is very imperfect, my understanding is that both of these questions have the same answer: port number assignment in `connect` when called on a socket not yet bound to a port works differently than in `bind` when called with a port number of 0. In case (1), the socket is not bound to a port because you haven't even called `bind`. In case (2), the socket is not bound to a port because haproxy sets the `IP_BIND_ADDRESS_NO_PORT` sockopt before calling `bind`. When you call `bind` *without* `IP_BIND_ADDRESS_NO_PORT`, it causes the port number to be bound before calling `connect`, which avoids the code path in `connect` that results in `EADDRNOTAVAIL`.
I am confused by these results, which are contrary to my understanding of what `IP_BIND_ADDRESS_NO_PORT` is supposed to do, which is precisely to avoid the problem of source address port exhaustion by deferring the port number assignment until the time of `connect`, when additional information about the destination address is available. But it's demonstrable that binding to a source port before calling `connect` avoids `EADDRNOTAVAIL` errors in our use cases, whatever the cause may be.
On Sat, Dec 10, 2022 at 09:59:14AM +0100, Anders Trier Olesen wrote:
IP_BIND_ADDRESS_NO_PORT did not fix your somewhat similar problem in your Haproxy setup, because all the connections are to the same dst tuple <ip, port> (i.e 127.0.0.1:ExtORPort). The connect() system call is looking for a unique 5-tuple <protocol, srcip, srcport, dstip, dstport>. In the Haproxy setup, the only free variable is srcport <tcp, 127.0.0.1, srcport, 127.0.0.1, ExtORPort>, so toggling IP_BIND_ADDRESS_NO_PORT makes no difference.
No—that is what I thought too, at first, but experimentally it is not the case. Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and *doing nothing else* is sufficient to resolve the problem. Haproxy ends up binding to the same address it would have bound to with IP_BIND_ADDRESS_NO_PORT, and there are the same number of 5-tuples to the same endpoints, but the EADDRNOTAVAIL errors stop. It is counterintuitive and unexpected, which is why I took the trouble to write it up.
As I wrote at #40201, there are divergent code paths for connect in the kernel when the port is already bound versus when it is not bound. It's not as simple as filling in blanks in a 5-tuple in otherwise identical code paths.
Anyway, it is not true that all connections go to the same (IP, port). (There would be no need to use a load balancer if that were the case.) At the time, we were running 12 tor processes with 12 different ExtORPorts (each ExtORPort on a different IP address, even: 127.0.3.1, 127.0.3.2, etc.). We started to have EADDRNOTAVAIL problems at around 3000 connections per ExtORPort, which is far too few to have exhausted the 5-tuple space. Please check the discussion at #40201 again, because I documented this detail there.
I urge you to run an experiment yourself, if these observations are not what you expect. I was surprised, as well.
I urge you to run an experiment yourself, if these observations are not what you expect. I was surprised, as well.
Very interesting. I'll run some tests.
We do agree that IP_BIND_ADDRESS_NO_PORT should fix the OP's problem, right? With it enabled, there's no path to inet_csk_bind_conflict, which is where the OP's CPU spends too much time.
- Anders
On Sat, Dec 10, 2022 at 4:23 PM David Fifield david@bamsoftware.com wrote:
On Sat, Dec 10, 2022 at 09:59:14AM +0100, Anders Trier Olesen wrote:
IP_BIND_ADDRESS_NO_PORT did not fix your somewhat similar problem in your Haproxy setup, because all the connections are to the same dst tuple <ip, port> (i.e. 127.0.0.1:ExtORPort). The connect() system call is looking for a unique 5-tuple <protocol, srcip, srcport, dstip, dstport>. In the Haproxy setup, the only free variable is srcport <tcp, 127.0.0.1, srcport, 127.0.0.1, ExtORPort>, so toggling IP_BIND_ADDRESS_NO_PORT makes no difference.
No—that is what I thought too, at first, but experimentally it is not the case. Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and *doing nothing else* is sufficient to resolve the problem. Haproxy ends up binding to the same address it would have bound to with IP_BIND_ADDRESS_NO_PORT, and there are the same number of 5-tuples to the same endpoints, but the EADDRNOTAVAIL errors stop. It is counterintuitive and unexpected, which is why I took the trouble to write it up.
As I wrote at #40201, there are divergent code paths for connect in the kernel when the port is already bound versus when it is not bound. It's not as simple as filling in blanks in a 5-tuple in otherwise identical code paths.
Anyway, it is not true that all connections go to the same (IP, port). (There would be no need to use a load balancer if that were the case.) At the time, we were running 12 tor processes with 12 different ExtORPorts (each ExtORPort on a different IP address, even: 127.0.3.1, 127.0.3.2, etc.). We started to have EADDRNOTAVAIL problems at around 3000 connections per ExtORPort, which is far too few to have exhausted the 5-tuple space. Please check the discussion at #40201 again, because I documented this detail there.
I urge you to run an experiment yourself, if these observations are not what you expect. I was surprised, as well.
I wrote some tests[1] which showed behaviour I did not expect. IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind without it enabled turns out to be even worse than I thought. This is what I think is happening: a successful bind() on a socket without IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit port configured, makes the assigned (or supplied) port unavailable for new connect()s (on different sockets), no matter the destination. I.e. if you exhaust the entire net.ipv4.ip_local_port_range with bind() (no matter what IP you bind to!), connect() will stop working, no matter what IP you attempt to connect to. You can work around this by manually doing a bind() (with or without an explicit port, but without IP_BIND_ADDRESS_NO_PORT) on the socket before connect().
$ uname -a
Linux laptop 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# sysctl -w net.ipv4.ip_local_port_range="40000 40100"
$ cd server && cargo run &
Version used: https://github.com/AndersTrier/IP_BIND_ADDRESS_NO_PORT_tests/blob/e74b09f680...
$ ../connect.py
Raised RLIMIT_NOFILE softlimit from 1024 to 200000
Select test (1-6): 2
#### Test 2 ####
Error on bind: [Errno 98] Address already in use
Made 101 connections. Expected to be around 101.
Select test (1-6): 1
#### Test 1 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 0 connections. Expected to be around 101.
Select test (1-6): 3
#### Test 3 ####
Error on bind: [Errno 98] Address already in use
Made 200 connections. Expected to be around 202.
What blows my mind is that after running test2, you cannot connect to anything without manually doing a bind() beforehand (as shown by test1 and test3 above)! This also means that after running test2, software like ssh stops working:
$ ssh -v mirrors.dotsrc.org
[...]
debug1: connect to address 130.225.254.116 port 22: Cannot assign requested address
When using IP_BIND_ADDRESS_NO_PORT, we don't have this problem (tests 1, 5 and 6 can be run in any order):
$ ./connect.py
Raised RLIMIT_NOFILE softlimit from 1024 to 200000
Select test (1-6): 5
#### Test 5 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 90 connections. Expected to be around 101.
Select test (1-6): 6
#### Test 6 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 180 connections. Expected to be around 202.
Select test (1-6): 1
#### Test 1 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 90 connections. Expected to be around 101.
Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and *doing nothing else* is sufficient to resolve the problem.
Maybe there are other processes on the same host which call bind() without IP_BIND_ADDRESS_NO_PORT and block the ports? E.g. OutboundBindAddress or similar in torrc?
[1] https://github.com/AndersTrier/IP_BIND_ADDRESS_NO_PORT_tests
On Sat, Dec 10, 2022 at 7:15 PM Anders Trier Olesen < anders.trier.olesen@gmail.com> wrote:
I urge you to run an experiment yourself, if these observations are not what you expect. I was surprised, as well.
Very interesting. I'll run some tests.
We do agree that IP_BIND_ADDRESS_NO_PORT should fix the OP's problem, right? With it enabled, there's no path to inet_csk_bind_conflict, which is where the OP's CPU spends too much time.
- Anders
On Sat, Dec 10, 2022 at 4:23 PM David Fifield david@bamsoftware.com wrote:
On Sat, Dec 10, 2022 at 09:59:14AM +0100, Anders Trier Olesen wrote:
IP_BIND_ADDRESS_NO_PORT did not fix your somewhat similar problem in your Haproxy setup, because all the connections are to the same dst tuple <ip, port> (i.e. 127.0.0.1:ExtORPort). The connect() system call is looking for a unique 5-tuple <protocol, srcip, srcport, dstip, dstport>. In the Haproxy setup, the only free variable is srcport <tcp, 127.0.0.1, srcport, 127.0.0.1, ExtORPort>, so toggling IP_BIND_ADDRESS_NO_PORT makes no difference.
No—that is what I thought too, at first, but experimentally it is not the case. Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and *doing nothing else* is sufficient to resolve the problem. Haproxy ends up binding to the same address it would have bound to with IP_BIND_ADDRESS_NO_PORT, and there are the same number of 5-tuples to the same endpoints, but the EADDRNOTAVAIL errors stop. It is counterintuitive and unexpected, which is why I took the trouble to write it up.
As I wrote at #40201, there are divergent code paths for connect in the kernel when the port is already bound versus when it is not bound. It's not as simple as filling in blanks in a 5-tuple in otherwise identical code paths.
Anyway, it is not true that all connections go to the same (IP, port). (There would be no need to use a load balancer if that were the case.) At the time, we were running 12 tor processes with 12 different ExtORPorts (each ExtORPort on a different IP address, even: 127.0.3.1, 127.0.3.2, etc.). We started to have EADDRNOTAVAIL problems at around 3000 connections per ExtORPort, which is far too few to have exhausted the 5-tuple space. Please check the discussion at #40201 again, because I documented this detail there.
I urge you to run an experiment yourself, if these observations are not what you expect. I was surprised, as well.
On Mon, Dec 12, 2022 at 12:39:50AM +0100, Anders Trier Olesen wrote:
I wrote some tests[1] which showed behaviour I did not expect. IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind without it enabled turns out to be even worse than I thought. This is what I think is happening: a successful bind() on a socket without IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit port configured, makes the assigned (or supplied) port unavailable for new connect()s (on different sockets), no matter the destination. I.e. if you exhaust the entire net.ipv4.ip_local_port_range with bind() (no matter what IP you bind to!), connect() will stop working, no matter what IP you attempt to connect to. You can work around this by manually doing a bind() (with or without an explicit port, but without IP_BIND_ADDRESS_NO_PORT) on the socket before connect().
What blows my mind is that after running test2, you cannot connect to anything without manually doing a bind() beforehand (as shown by test1 and test3 above)! This also means that after running test2, software like ssh stops working:
When using IP_BIND_ADDRESS_NO_PORT, we don't have this problem (1 5 6 can be run in any order):
Thank you for preparing that experiment. It's really valuable, and it looks a lot like what I was seeing on the Snowflake bridge: calls to connect would fail with EADDRNOTAVAIL unless first bound concretely to a port number. IP_BIND_ADDRESS_NO_PORT causes bind not to set a concrete port number, so in that respect it's the same as calling connect without calling bind first.
It is surprising, isn't it? It certainly feels like calling connect without first binding to an address should have the same effect as manually binding to an address and then calling connect, especially if the address you bind to is the same as the kernel would have chosen automatically. It seems like it might be a bug, but I'm not qualified to judge that.
If I am interpreting your results correctly, it means that either of the two extremes is safe: either everything that needs to bind to a source address should call bind with IP_BIND_ADDRESS_NO_PORT, or else everything (whether it needs a specific source address or not) should call bind *without* IP_BIND_ADDRESS_NO_PORT. (The latter situation is what we've arrived at on the Snowflake bridge.) The middle ground, where some connections use IP_BIND_ADDRESS_NO_PORT and some do not, is what causes trouble, because connections that do not use IP_BIND_ADDRESS_NO_PORT somehow "poison" the ephemeral port pool for connections that do use IP_BIND_ADDRESS_NO_PORT (and for connections that do not bind at all). It would explain why causing HAProxy not to use IP_BIND_ADDRESS_NO_PORT resolved errors in my case.
Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and *doing nothing else* is sufficient to resolve the problem.
Maybe there are other processes on the same host which call bind() without IP_BIND_ADDRESS_NO_PORT and block the ports? E.g. OutboundBindAddress or similar in torrc?
OutboundBindAddress is a likely culprit. We did end up setting OutboundBindAddress on the bridge during the period of intense performance debugging at the end of September.
One thing doesn't quite add up, though. The earliest EADDRNOTAVAIL log messages started at 2022-09-28 10:57:26: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... Whereas according to the change history of /etc on the bridge, OutboundBindAddress was first set some time between 2022-09-29 21:38:37 and 2022-09-29 22:37:06, over 30 hours later. I would be tempted to say this is a case of what you initially suspected, simple tuple exhaustion between two static IP addresses, if not for the fact that pre-binding an address resolved the problem in that case as well ("I get EADDRNOTAVAIL sometimes even with netcat, making a connection to the haproxy port—but not if I specify a source address in netcat"). But I only ran that netcat test after OutboundBindAddress had been set, so there may have been many factors being conflated.
Anyway, thank your for the insight. I apologize if I was inconsiderate in my prior reply.
It is surprising, isn't it? It certainly feels like calling connect without first binding to an address should have the same effect as manually binding to an address and then calling connect, especially if the address you bind to is the same as the kernel would have chosen automatically. It seems like it might be a bug, but I'm not qualified to judge that.
Yes, I'm starting to think so too. And it's strange that Cloudflare doesn't mention stumbling on this problem in their blog post on running out of ephemeral ports. [1] If I find the time, I'll make an attempt at understanding exactly what is going on in the kernel.
If I am interpreting your results correctly, it means that either of the two extremes is safe
Yes. That is what I think too.
Anyway, thank you for the insight. I apologize if I was inconsiderate in my prior reply.
Likewise!
Best regards Anders Trier Olesen
[1] https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-s...
On Mon, Dec 12, 2022 at 4:16 PM David Fifield david@bamsoftware.com wrote:
On Mon, Dec 12, 2022 at 12:39:50AM +0100, Anders Trier Olesen wrote:
I wrote some tests[1] which showed behaviour I did not expect. IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind without it enabled turns out to be even worse than I thought. This is what I think is happening: A successful bind() on a socket without IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit port configured, makes the assigned (or supplied) port unavailable for new connect()s (on different sockets), no matter the destination. I.e. if you exhaust the entire net.ipv4.ip_local_port_range with bind() (no matter what IP you bind to!), connect() will stop working - no matter what IP you attempt to connect to. You can work around this by manually doing a bind() (with or without an explicit port, but without IP_BIND_ADDRESS_NO_PORT) on the socket before connect().
What blows my mind is that after running test2, you cannot connect to anything without manually doing a bind() beforehand (as shown by test1 and test3 above)! This also means that after running test2, software like ssh stops working:
When using IP_BIND_ADDRESS_NO_PORT, we don't have this problem (1 5 6 can be run in any order):
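The deferred-port behaviour that makes IP_BIND_ADDRESS_NO_PORT work can be observed from userspace. This is not one of the tests from [1], just a minimal Python sketch (Linux-only; the fallback value 24 for the option is its constant from <linux/in.h>):

```python
import socket

# Linux-only; socket.IP_BIND_ADDRESS_NO_PORT may be missing on older
# Pythons, so fall back to its value from <linux/in.h>.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)

# Something local to connect to.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

# Plain bind(): the kernel assigns and reserves an ephemeral port right
# away, before any connect() - this is the kind of bind bucket that later
# blocks connect()s, as described above.
plain = socket.socket()
plain.bind(("127.0.0.1", 0))
plain_port = plain.getsockname()[1]        # already non-zero

# With IP_BIND_ADDRESS_NO_PORT, bind() pins the source IP only; the port
# stays 0 until connect() picks one via the ephemeral-port search.
deferred = socket.socket()
deferred.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
deferred.bind(("127.0.0.1", 0))
port_before = deferred.getsockname()[1]    # 0: no port assigned yet
deferred.connect(listener.getsockname())
port_after = deferred.getsockname()[1]     # now a real ephemeral port

print(plain_port, port_before, port_after)
```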
Thank you for preparing that experiment. It's really valuable, and it looks a lot like what I was seeing on the Snowflake bridge: calls to connect would fail with EADDRNOTAVAIL unless first bound concretely to a port number. IP_BIND_ADDRESS_NO_PORT causes bind not to set a concrete port number, so in that respect it's the same as calling connect without calling bind first.
It is surprising, isn't it? It certainly feels like calling connect without first binding to an address should have the same effect as manually binding to an address and then calling connect, especially if the address you bind to is the same as the kernel would have chosen automatically. It seems like it might be a bug, but I'm not qualified to judge that.
If I am interpreting your results correctly, it means that either of the two extremes is safe: either everything that needs to bind to a source address should call bind with IP_BIND_ADDRESS_NO_PORT, or else everything (whether it needs a specific source address or not) should call bind *without* IP_BIND_ADDRESS_NO_PORT. (The latter situation is what we've arrived at on the Snowflake bridge.) The middle ground, where some connections use IP_BIND_ADDRESS_NO_PORT and some do not, is what causes trouble, because connections that do not use IP_BIND_ADDRESS_NO_PORT somehow "poison" the ephemeral port pool for connections that do use IP_BIND_ADDRESS_NO_PORT (and for connections that do not bind at all). It would explain why causing HAProxy not to use IP_BIND_ADDRESS_NO_PORT resolved errors in my case.
Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and *doing nothing else* is sufficient to resolve the problem.
Maybe there are other processes on the same host which call bind() without IP_BIND_ADDRESS_NO_PORT and block the ports? E.g. OutboundBindAddress or similar in torrc?
OutboundBindAddress is a likely culprit. We did end up setting OutboundBindAddress on the bridge during the period of intense performance debugging at the end of September.
One thing doesn't quite add up, though. The earliest EADDRNOTAVAIL log messages started at 2022-09-28 10:57:26:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... Whereas according to the change history of /etc on the bridge, OutboundBindAddress was first set some time between 2022-09-29 21:38:37 and 2022-09-29 22:37:06, over 30 hours later. I would be tempted to say this is a case of what you initially suspected, simple tuple exhaustion between two static IP addresses, if not for the fact that pre-binding an address resolved the problem in that case as well ("I get EADDRNOTAVAIL sometimes even with netcat, making a connection to the haproxy port—but not if I specify a source address in netcat"). But I only ran that netcat test after OutboundBindAddress had been set, so there may have been many factors being conflated.
Anyway, thank you for the insight. I apologize if I was inconsiderate in my prior reply.
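For reference, the pre-binding workaround described above (what `nc -s` does) amounts to claiming the local port at bind() time instead of during connect(). A hypothetical Python sketch, with a local listener standing in for the haproxy port in the anecdote:

```python
import socket

# Local listener standing in for the haproxy port in the anecdote above.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

# Pre-bind the source address before connect(), as `nc -s 127.0.0.1 ...`
# does. Port 0 means the kernel picks one, but it is assigned here, at
# bind() time (the inet_csk_get_port path), not during connect().
client = socket.socket()
client.bind(("127.0.0.1", 0))
src_port = client.getsockname()[1]
client.connect(listener.getsockname())

# The established connection uses the port reserved at bind() time.
print(client.getsockname(), src_port)
```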
I am happy to report that we have upgraded all our relays to Tor 0.4.8.0-alpha-dev, and for the past 8 days since the upgrade the bind conflict has ceased. No firewall rules are being used. No sysctl settings helped.
-- Christopher Sheats (yawnbox) Executive Director Emerald Onion Signal: +1 206.739.3390 Website: https://emeraldonion.org/ Mastodon: https://digitalcourage.social/@EmeraldOnion/
On Mon, Dec 12, 2022 at 10:18:53PM +0100, Anders Trier Olesen wrote:
It is surprising, isn't it? It certainly feels like calling connect without first binding to an address should have the same effect as manually binding to an address and then calling connect, especially if the address you bind to is the same as the kernel would have chosen automatically. It seems like it might be a bug, but I'm not qualified to judge that.
Yes, I'm starting to think so too. And strange that Cloudflare doesn't mention stumbling upon this problem in their blog post on running out of ephemeral ports [1]. If I find the time, I'll make an attempt at understanding exactly what is going on in the kernel.
[1] https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-s...
Cloudflare has another blog post today that gets into this topic.
https://blog.cloudflare.com/the-quantum-state-of-a-tcp-port/
It investigates the difference in behavior between inet_csk_bind_conflict and __inet_hash_connect that I commented on at https://forum.torproject.net/t/tor-relays-inet-csk-bind-conflict/5757/13 and https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla.... Setting the IP_BIND_ADDRESS_NO_PORT option leads to __inet_hash_connect; not setting it leads to inet_csk_bind_conflict.
The author attributes the difference in behavior to the fastreuse field in the bind hash bucket:
The bucket might already exist or we might have to create it first. But once it exists, its fastreuse field is in one of three possible states: -1, 0, or +1.
…
…inet_csk_get_port() skips conflict check for fastreuse == 1 buckets. …__inet_hash_connect() skips buckets with fastreuse != -1.
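The fastreuse == 1 case is observable from userspace: when every socket in a bind bucket has SO_REUSEADDR set and none of them is listening, inet_csk_get_port() lets another SO_REUSEADDR socket bind the same port without the conflict check. A small sketch of that (my reading of the post above; Linux behaviour):

```python
import socket

# First socket: SO_REUSEADDR set, bound but not listening. Per the post
# above, this should leave the bind bucket's fastreuse field at 1.
a = socket.socket()
a.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
a.bind(("127.0.0.1", 0))
port = a.getsockname()[1]

# Second SO_REUSEADDR socket: inet_csk_get_port() skips the conflict
# check for fastreuse == 1 buckets, so binding the same port succeeds.
b = socket.socket()
b.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
b.bind(("127.0.0.1", port))                # no EADDRINUSE

# A socket without SO_REUSEADDR gets the full conflict check and fails.
c = socket.socket()
try:
    c.bind(("127.0.0.1", port))
    conflicted = False
except OSError:                            # EADDRINUSE
    conflicted = True

print(b.getsockname()[1] == port, conflicted)
```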