On Wed, Sep 28, 2022 at 11:31:05AM +0200, Linus Nordberg wrote:
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 14:40:48 -0600:
On Tue, Sep 27, 2022 at 08:22:21PM +0200, Linus Nordberg wrote:
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 08:54:53 -0600:
I checked the number of sockets connected to the haproxy frontend port, thinking that we may be running out of localhost 4-tuples. It's still in bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
Would more IP addresses and DNS round robin work?
By more IP addresses you mean more localhost IP addresses, I guess?
My confusion was strong at that time yesterday. I mixed up 4-tuples on our (only) externally reachable address with 4-tuples on localhost addresses. Please ignore and thanks for clarifying.
Getting rid of extor should lower the need for localhost 4-tuples, shouldn't it?
No, not really. The problem is not the total number of 127.0.0.1 four-tuples in use (there are ≈2^32 of those); the problem arises when one end has a fixed port number. The bottleneck in this case is the link between snowflake-server and haproxy (see diagram): https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guid...
haproxy binds to 127.0.0.1:10000 and snowflake-server connects to haproxy from 127.0.0.1 and an ephemeral port, so three of the four elements of the four-tuple are fixed, permitting only ≈2^16 different tuples:
(127.0.0.1, X, 127.0.0.1, 10000)
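To put a number on that, here is a rough back-of-the-envelope sketch in Python. It assumes that X can only come from net.ipv4.ip_local_port_range, that the range is inclusive at both ends, and that nothing else on the host is competing for those ports. The figures are the ones from the ss and sysctl output quoted earlier in this thread; the function name is just for illustration.

```python
def ephemeral_port_count(low, high):
    """Number of ports in the inclusive range [low, high]."""
    return high - low + 1

# Values taken from the sysctl and ss output quoted earlier in the thread.
PORT_RANGE = (15000, 64000)
IN_USE = 27314

# With three of the four tuple elements fixed, the number of usable
# four-tuples equals the number of available ephemeral source ports.
capacity = ephemeral_port_count(*PORT_RANGE)

print(f"usable four-tuples: {capacity}")
print(f"in use: {IN_USE} ({IN_USE / capacity:.1%})")
```

Under those assumptions the link tops out at 49001 simultaneous connections, somewhat below the ≈2^16 upper bound, and a bit more than half of that was in use at the time of the measurement.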
The whole pluggable transports interface is built around this model of localhost TCP sockets; I think it did not anticipate scale like this. snowflake-server gets the address 127.0.0.1:10000 from an environment variable; see in /etc/systemd/system/snowflake-server.service:
Environment=TOR_PT_EXTENDED_SERVER_PORT=127.0.0.1:10000
When snowflake-server does pt.DialOr, it's the above address that it makes a TCP connection to. https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
snowflake-server *thinks* it is talking to an upstream tor process's ExtORPort at that address, when actually the connection is intermediated by haproxy (because a single tor process can only handle a limited amount of traffic) and extor-static-cookie (because each tor instance uses a different random authentication key).
haproxy, of course, can listen on multiple ports on its frontend, but TOR_PT_EXTENDED_SERVER_PORT is specified to contain only a single address: https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt?id=ec77ae643f3e47...
That said, none of the above prevents us from hacking around the pluggable transports model where it is constraining. We can free up four-tuple space by varying any of the four elements in the example above; or by using something other than TCP sockets for one or more localhost links. For example, we could hack pt.DialOr to use a random source address in the 127.0.0.0/8 range; that would give us an additional factor of 2^24 between snowflake-server and haproxy. Or we could replace that link with a Unix domain socket. It would just require an alternative means of passing the socket address into snowflake-server, because TOR_PT_EXTENDED_SERVER_PORT cannot represent such an address, and a different version of the pt.DialOr function that does not have the assumption of TCP baked in. https://pkg.go.dev/git.torproject.org/pluggable-transports/goptlib.git#DialO... https://gitweb.torproject.org/pluggable-transports/goptlib.git/tree/pt.go?h=...
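As an illustration of the random-source-address idea only, here is a minimal Python sketch. It is not goptlib code and not a patch to pt.DialOr; the function names are hypothetical. It assumes a Linux host, where the whole 127.0.0.0/8 block is routed to the loopback interface, so binding to any address in that range works without extra configuration.

```python
import ipaddress
import random
import socket

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")

def random_loopback_source(rng=random):
    """Return a random host address in 127.0.0.0/8 as a string."""
    # Skip offset 0 (127.0.0.0) and the last offset (127.255.255.255).
    offset = rng.randrange(1, LOOPBACK_NET.num_addresses - 1)
    return str(LOOPBACK_NET.network_address + offset)

def dial_from_random_loopback(dest_host, dest_port, timeout=10.0):
    """Open a TCP connection to (dest_host, dest_port) from a random
    127.0.0.0/8 source address (hypothetical helper, for illustration)."""
    source = random_loopback_source()
    # Source port 0 lets the kernel pick the ephemeral port as usual.
    return socket.create_connection(
        (dest_host, dest_port),
        timeout=timeout,
        source_address=(source, 0),
    )
```

A caller would use dial_from_random_loopback("127.0.0.1", 10000) in place of a plain connect. The destination stays fixed, but the first element of the four-tuple now varies as well, which is where the extra factor of roughly 2^24 comes from.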
Removing extor-static-cookie from the chain would not have an effect on the need for four-tuples, since each of them uses a distinct port number and only has 1/12 of the connections of the bottleneck link.