I wrote some tests[1] which showed behaviour I did not expect. IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind without it enabled turns out to be even worse than I thought.
This is what I think is happening: a successful bind() on a socket without IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit port, makes the assigned (or supplied) port unavailable for new connect()s on other sockets, no matter the destination. I.e., if you exhaust the entire net.ipv4.ip_local_port_range with bind() (no matter which IP you bind to!), connect() will stop working, no matter which IP you attempt to connect to. You can work around this by manually doing a bind() (with or without an explicit port, but without IP_BIND_ADDRESS_NO_PORT) on the socket before connect().
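A minimal Python sketch of the socket set-ups being compared (the helper names are hypothetical, and this is not the code in connect.py). It assumes Linux 4.2 or later, where IP_BIND_ADDRESS_NO_PORT exists; if the socket module does not export the constant, it falls back to the value 24 from linux/in.h. The difference is already visible in getsockname(): after a plain bind() to port 0 the socket holds a port from the range, while with IP_BIND_ADDRESS_NO_PORT it reports port 0 until connect() picks one.

```python
import socket

# Value from include/uapi/linux/in.h (assumption: Linux). Some Python
# versions do not export the constant, hence the fallback.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def make_socket(src_ip=None, no_port=False):
    """Return a TCP socket prepared in one of three ways:

    src_ip is None             -> nothing bound; connect() picks address and port
    src_ip set, no_port False  -> bind((src_ip, 0)) reserves a port right away
    src_ip set, no_port True   -> bind() only fixes the address; the port is
                                  chosen later, at connect() time
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if src_ip is not None:
        if no_port:
            s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
        s.bind((src_ip, 0))
    return s


def connect_to(dst, src_ip=None, no_port=False):
    """Prepare a socket as above and connect it to dst = (ip, port)."""
    s = make_socket(src_ip, no_port)
    try:
        s.connect(dst)
    except OSError:
        s.close()
        raise
    return s


def port_after_bind(no_port):
    """Local port reported by getsockname() immediately after bind()."""
    with make_socket("127.0.0.1", no_port=no_port) as s:
        return s.getsockname()[1]


if __name__ == "__main__":
    print("port after plain bind():           ", port_after_bind(False))  # non-zero
    print("port after IP_BIND_ADDRESS_NO_PORT:", port_after_bind(True))   # 0
```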

$ uname -a
Linux laptop 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# sysctl -w net.ipv4.ip_local_port_range="40000 40100"
$ cd server && cargo run &
Version used: https://github.com/AndersTrier/IP_BIND_ADDRESS_NO_PORT_tests/blob/e74b09f680bb01a0078fe7e043e786c111103647/connect.py
$ ../connect.py
Raised RLIMIT_NOFILE softlimit from 1024 to 200000
Select test (1-6): 2
#### Test 2 ####
Error on bind: [Errno 98] Address already in use
Made 101 connections. Expected to be around 101.
Select test (1-6): 1
#### Test 1 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 0 connections. Expected to be around 101.
Select test (1-6): 3
#### Test 3 ####
Error on bind: [Errno 98] Address already in use
Made 200 connections. Expected to be around 202.
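The "Expected to be around" figures follow from the sysctl above: 40000 to 40100 is 101 ports, counting both ends. The factor of two in test 3 (and test 6) presumably means the test spreads its connections over two destinations, each with its own 101 source ports; that reading is an assumption, since the test code is not quoted here. A small Python sketch of the arithmetic, with hypothetical helper names:

```python
def port_range_size(sysctl_value):
    """Number of ports in net.ipv4.ip_local_port_range (both ends inclusive).

    Accepts the text form used by sysctl or /proc, e.g. "40000 40100"
    or "40000\t40100\n".
    """
    lo, hi = (int(x) for x in sysctl_value.split())
    return hi - lo + 1


def expected_connections(sysctl_value, destinations=1):
    """Upper bound on connections from one source IP when only the
    source port varies for each distinct destination."""
    return port_range_size(sysctl_value) * destinations


if __name__ == "__main__":
    rng = "40000 40100"
    print(expected_connections(rng))     # 101
    print(expected_connections(rng, 2))  # 202
```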

What blows my mind is that after running test2, you cannot connect to anything without manually doing a bind() beforehand (as shown by test1 and test3 above)! This also means that after running test2, software like ssh stops working:
$ ssh -v mirrors.dotsrc.org
[...]
debug1: connect to address 130.225.254.116 port 22: Cannot assign requested address
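The following toy model (hypothetical Python; all names are made up for illustration) reproduces this sequence: after a test 2 style run exhausts the range, a plain connect() to an unrelated destination fails, while bind() to another source address followed by connect() still succeeds. It simply encodes the hypothesis from the start of this mail as a rule, namely that ports reserved by bind() are skipped by connect()'s own port selection regardless of address or destination. It is heavily simplified and not a transcription of the kernel's logic (the connect() side of which lives in __inet_hash_connect() in net/ipv4/inet_hashtables.c).

```python
class ToyPortTable:
    """Hypothetical, heavily simplified model of IPv4 TCP source-port allocation.

    Modelling assumption: a port first claimed by bind() (without
    IP_BIND_ADDRESS_NO_PORT) is skipped by connect()'s automatic port
    selection for every destination, while a connect() that picks its own
    port only needs the resulting 5-tuple to be unused.
    """

    def __init__(self, lo, hi):
        self.ports = range(lo, hi + 1)  # ip_local_port_range, inclusive
        self.bind_owned = set()         # ports reserved by an explicit bind()
        self.addrs = set()              # (src_ip, src_port) pairs in use
        self.tuples = set()             # (src_ip, src_port, dst_ip, dst_port)

    def bind(self, src_ip):
        """bind((src_ip, 0)) without IP_BIND_ADDRESS_NO_PORT.

        Returns the port, or None where the real call would fail with EADDRINUSE.
        """
        for port in self.ports:
            if (src_ip, port) not in self.addrs:
                self.addrs.add((src_ip, port))
                self.bind_owned.add(port)
                return port
        return None

    def connect(self, src_ip, dst, port=None):
        """connect() to dst = (ip, port); port is an already-bound source port.

        port=None models both a plain connect() and a bind() done with
        IP_BIND_ADDRESS_NO_PORT: the source port is chosen here.
        Returns the source port, or None where the real call would fail
        with EADDRNOTAVAIL.
        """
        if port is not None:
            key = (src_ip, port) + tuple(dst)
            if key in self.tuples:
                return None
            self.tuples.add(key)
            return port
        for candidate in self.ports:
            if candidate in self.bind_owned:
                continue  # the modelled effect: bind()-owned ports are skipped
            key = (src_ip, candidate) + tuple(dst)
            if key not in self.tuples:
                self.tuples.add(key)
                self.addrs.add((src_ip, candidate))
                return candidate
        return None


def run_test2_then_test1():
    """Exhaust the range with bind()+connect(), then try a plain connect()."""
    t = ToyPortTable(40000, 40100)
    made = 0
    while (p := t.bind("127.0.0.1")) is not None:
        t.connect("127.0.0.1", ("127.0.0.1", 9000), port=p)
        made += 1
    # Any destination (here a documentation address on port 22) now fails:
    plain = t.connect("127.0.0.1", ("192.0.2.10", 22))
    # ...but binding to another source IP first still works:
    workaround = t.connect("127.0.0.2", ("192.0.2.10", 22), port=t.bind("127.0.0.2"))
    return made, plain, workaround


def run_no_port(destinations):
    """IP_BIND_ADDRESS_NO_PORT style: only 5-tuples are consumed."""
    t = ToyPortTable(40000, 40100)
    made = 0
    for dst in destinations:
        while t.connect("127.0.0.1", dst) is not None:
            made += 1
    return made


if __name__ == "__main__":
    print(run_test2_then_test1())                                   # (101, None, 40000)
    print(run_no_port([("127.0.0.1", 9000), ("127.0.0.2", 9000)]))  # 202
```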

When using IP_BIND_ADDRESS_NO_PORT, we don't have this problem (tests 1, 5 and 6 can be run in any order):
$ ./connect.py
Raised RLIMIT_NOFILE softlimit from 1024 to 200000
Select test (1-6): 5
#### Test 5 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 90 connections. Expected to be around 101.
Select test (1-6): 6
#### Test 6 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 180 connections. Expected to be around 202.
Select test (1-6): 1
#### Test 1 ####
Error on connect: [Errno 99] Cannot assign requested address
Made 90 connections. Expected to be around 101.

> Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and
> *doing nothing else* is sufficient to resolve the problem.
Maybe there are other processes on the same host that call bind() without IP_BIND_ADDRESS_NO_PORT and block the ports? E.g., OutboundBindAddress or similar in torrc?
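One way to look for such processes is to check which ephemeral ports show up in /proc/net/tcp. A rough Python sketch follows (hypothetical helper; IPv4 only, since IPv6 sockets are listed in /proc/net/tcp6). Two caveats: as far as I know, a socket that has been bound but is neither listening nor connected does not appear in /proc/net/tcp at all, and the file does not say whether a port was reserved by bind() or picked by connect(), so this only narrows things down. The inode column can then be matched against the socket:[inode] links in /proc/<pid>/fd to find the owning process.

```python
def local_ports_in_range(proc_net_tcp, lo, hi):
    """Return the set of local ports within [lo, hi] in /proc/net/tcp text."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[1] is the local address as "HEXIP:HEXPORT", e.g. "0100007F:9C40"
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if lo <= port <= hi:
            ports.add(port)
    return ports


def main():
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        lo, hi = map(int, f.read().split())
    with open("/proc/net/tcp") as f:
        used = local_ports_in_range(f.read(), lo, hi)
    print(f"{len(used)} of {hi - lo + 1} ports in {lo}-{hi} appear in /proc/net/tcp")


if __name__ == "__main__":
    main()
```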

[1] https://github.com/AndersTrier/IP_BIND_ADDRESS_NO_PORT_tests

On Sat, Dec 10, 2022 at 7:15 PM Anders Trier Olesen <anders.trier.olesen@gmail.com> wrote:
> I urge you to run an experiment yourself, if these observations are not
> what you expect. I was surprised, as well.
Very interesting. I'll run some tests.

We do agree that IP_BIND_ADDRESS_NO_PORT should fix OP's problem, right? With it enabled, there's no path to inet_csk_bind_conflict, which is where OP's CPU spends too much time.

- Anders

On Sat, Dec 10, 2022 at 4:23 PM David Fifield <david@bamsoftware.com> wrote:
On Sat, Dec 10, 2022 at 09:59:14AM +0100, Anders Trier Olesen wrote:
> IP_BIND_ADDRESS_NO_PORT did not fix your somewhat similar problem in your
> Haproxy setup, because all the connections are to the same dst tuple <ip, port>
> (i.e. 127.0.0.1:ExtORPort).
> The connect() system call is looking for a unique 5-tuple <protocol, srcip,
> srcport, dstip, dstport>. In the Haproxy setup, the only free variable is
> srcport <tcp, 127.0.0.1, srcport, 127.0.0.1, ExtORPort>, so toggling
> IP_BIND_ADDRESS_NO_PORT makes no difference.

No—that is what I thought too, at first, but experimentally it is not
the case. Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and
*doing nothing else* is sufficient to resolve the problem. Haproxy ends
up binding to the same address it would have bound to with
IP_BIND_ADDRESS_NO_PORT, and there are the same number of 5-tuples to
the same endpoints, but the EADDRNOTAVAIL errors stop. It is
counterintuitive and unexpected, which is why I took the trouble to write
it up.

As I wrote at #40201, there are divergent code paths for connect in the
kernel when the port is already bound versus when it is not bound. It's
not as simple as filling in blanks in a 5-tuple in otherwise identical
code paths.

Anyway, it is not true that all connections go to the same (IP, port).
(There would be no need to use a load balancer if that were the case.)
At the time, we were running 12 tor processes with 12 different
ExtORPorts (each ExtORPort on a different IP address, even: 127.0.3.1,
127.0.3.2, etc.). We started to have EADDRNOTAVAIL problems at around
3000 connections per ExtORPort, which is far too few to have exhausted
the 5-tuple space. Please check the discussion at #40201 again, because
I documented this detail there.
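To put a number on "far too few": assuming Linux's default ip_local_port_range of 32768 to 60999 and a single Haproxy source address (neither setting is stated in this thread), each ExtORPort destination on its own allows 28232 distinct 5-tuples, so EADDRNOTAVAIL at about 3000 connections means only around a tenth of that space was in use. A short Python sketch of the arithmetic (hypothetical helper name):

```python
def five_tuple_capacity(port_lo, port_hi, src_ips=1, destinations=1):
    """Connections possible if only 5-tuple uniqueness limited them:
    one per (source IP, source port, destination IP:port) combination."""
    return (port_hi - port_lo + 1) * src_ips * destinations


# Assumption: Linux's default net.ipv4.ip_local_port_range and a single
# Haproxy source address; the relay's real settings are not given here.
LO, HI = 32768, 60999

per_extorport = five_tuple_capacity(LO, HI)
print(per_extorport)                                 # 28232
print(f"{3000 / per_extorport:.1%}")                 # 10.6%
print(five_tuple_capacity(LO, HI, destinations=12))  # 338784 across 12 ExtORPorts
```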

I urge you to run an experiment yourself, if these observations are not
what you expect. I was surprised, as well.
_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays