David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 08:54:53 -0600:
On Mon, Sep 26, 2022 at 10:39:42AM +0200, Linus Nordberg via anti-censorship-team wrote:
It seems likely that we're hitting a limit of some sort, and the next thing is to figure out whether it's a soft limit that we can influence through system configuration or a hardware resource limit.
tor has a default bandwidth limit, but we should be nowhere close to it, especially distributed across 12 instances:
BandwidthRate N bytes|KBytes|MBytes|GBytes|TBytes|KBits|MBits|GBits|TBits
A token bucket limits the average incoming bandwidth usage on this node to the specified number of bytes per second, and the average outgoing bandwidth usage to that same value. If you want to run a relay in the public network, this needs to be at the very least 75 KBytes for a relay (that is, 600 kbits) or 50 KBytes for a bridge (400 kbits) — but of course, more is better; we recommend at least 250 KBytes (2 mbits) if possible. (Default: 1 GByte)
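For completeness, if we ever wanted an explicit per-instance cap (or wanted to make the limit visible in the config rather than relying on the default), it would just be a couple of lines in each instance's torrc; the values here are illustrative, not a recommendation:

    BandwidthRate 1 GBytes
    BandwidthBurst 1 GBytes

With nothing set, each of the 12 instances gets the 1 GByte/s default quoted above, so the aggregate ceiling is far beyond what the 10G NIC could carry anyway.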
I do not see any rate limit enabled in /etc/haproxy/haproxy.cfg.
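For the record, an explicit limit there would normally show up as one of a handful of directives in the frontend (or global) section, which makes it easy to grep for; directive names are from the haproxy documentation, the numbers are made up:

    # hard cap on concurrent connections to the frontend
    maxconn 100000
    # cap on new sessions per second
    rate-limit sessions 1000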
I checked the number of sockets connected to the haproxy frontend port, thinking that we may be running out of localhost 4-tuples. It's still in bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
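With a single source address and a single destination (127.0.0.1:10000), the usable 4-tuples are bounded by the ephemeral port range, 64000 - 15000 = 49000 here, so 27314 is already past the halfway mark. The cheapest knob, if we get close, is widening the range; something like this (untested, and the lower bound needs to stay above any ports we actually listen on):

    sysctl -w net.ipv4.ip_local_port_range="1025 64000"
    # persist across reboots
    echo 'net.ipv4.ip_local_port_range = 1025 64000' > /etc/sysctl.d/99-local-ports.conf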
Would more IP addresses and DNS round robin work?
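More addresses would multiply the 4-tuple space, yes, and for the localhost leg we don't even need DNS: everything in 127.0.0.0/8 is already local, so haproxy could simply bind the same frontend on several loopback addresses and the connecting side (snowflake-server, if I'm reading our setup right) could spread its connections across them. A sketch of the haproxy half, with section name and addresses made up:

    frontend tor-instances
        bind 127.0.0.1:10000
        bind 127.0.0.2:10000
        bind 127.0.0.3:10000

Each distinct destination address gets its own ~49000 ephemeral source ports. The part that needs thought is teaching the client side to rotate over the addresses; round-robin DNS on a name it resolves per connection would be one way, a small code change another.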
According to https://stackoverflow.com/a/3923785, some other parameters that may be important are
# sysctl net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60
# cat /proc/sys/net/netfilter/nf_conntrack_max
262144
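If TIME_WAIT sockets on the localhost hop ever become the limiting factor, the usual adjustments would be along these lines (untested here; tcp_tw_reuse only helps for outgoing connections, which I believe is the direction these localhost connections are made in):

    sysctl -w net.ipv4.tcp_fin_timeout=30
    sysctl -w net.ipv4.tcp_tw_reuse=1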
Yes, we'd better keep an eye on the conntrack count and either raise the max or get rid of the connection tracking somehow. I've seen warnings from netdata about the count rising above 85%.
# cat /proc/sys/net/netfilter/nf_conntrack_{count,max}
181053
262144
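Raising the max is a one-liner (the hash table should scale with it); the nicer fix is probably to stop tracking the loopback connections at all, since that's where most of our flows live, assuming the firewall doesn't rely on connection state for loopback traffic. A sketch, assuming we're on iptables rather than nftables:

    # quick relief: double the table, and grow the hash table with it
    sysctl -w net.netfilter.nf_conntrack_max=524288
    echo 65536 > /sys/module/nf_conntrack/parameters/hashsize
    # better: exempt loopback traffic from conntrack entirely
    iptables -t raw -A PREROUTING -i lo -j NOTRACK
    iptables -t raw -A OUTPUT -o lo -j NOTRACK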
One thing I would like to do soon is to hook up the other NIC and put sshd and wireguard on that while keeping snowflake traffic on the current 10G. That way we could start playing with ethtool to instruct the NIC to do some fancy stuff suggested by anarcat (see below).
I've created https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... to track this.
# sysctl net.core.netdev_max_backlog
net.core.netdev_max_backlog = 1000
Ethernet txqueuelen (1000)
net.core.netdev_max_backlog is the "maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them." https://www.kernel.org/doc/html/latest/admin-guide/sysctl/net.html#netdev-ma... If we were having trouble with backlog buffer sizes, I would expect to see lots of dropped packets, and I don't:
# ethtool -S eno1 | grep dropped
     rx_dropped: 0
     tx_dropped: 0
Yes, the lack of drops makes me think we should look elsewhere.
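One more counter worth a glance before we rule the backlog out completely: drops from netdev_max_backlog overflow don't appear in ethtool -S (that's the NIC's own view); they land in the second column of /proc/net/softnet_stat, one row per CPU, in hex. With GNU awk:

    # field 2 is packets dropped because the per-CPU backlog queue was full
    awk '{ printf "cpu%d dropped %d\n", NR-1, strtonum("0x" $2) }' /proc/net/softnet_stat

If those are all zero too, then the backlog really is off the hook.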
It may be something inside snowflake-server, for example some central scheduling algorithm that cannot run any faster. (Though if that were the case, I'd expect to see one CPU core pegged at 100%, which I don't see.) I suggest doing another round of profiling now that we have taken care of the more obvious hotspots in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
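A system-wide profile would also tell us how the time splits between snowflake-server itself and the kernel network stack, which the Go profiler alone won't show. Assuming perf is installed and the binary still has its symbols, something like:

    perf record -F 99 -g -p "$(pidof snowflake-server)" -- sleep 60
    perf report --sort=dso,symbol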
After an interesting chat with anarcat, I think we are CPU bound, in particular by handling so many interrupts from the NIC and by the sheer number of context switches. I have two suggestions for how to move forward with this.
First, let's patch tor to get rid of the extor processes, as suggested by David earlier when discussing RAM pressure. This should bring down context switches.
Second, once we've got #40186 sorted, do what's suggested in [1] to bring the interrupt frequency down. This should take some load off the CPUs.
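Concretely, the interrupt rate limiting in [1] maps onto ethtool's coalescing settings once the NIC is free to experiment with. Exact parameter support depends on the i40e driver version, so check the current values first; the numbers below are only a starting point:

    # show current interrupt coalescing settings
    ethtool -c eno1
    # either let the driver adapt the interrupt rate to the load ...
    ethtool -C eno1 adaptive-rx on adaptive-tx on
    # ... or pin it down explicitly (microseconds between interrupts;
    # 125 us is roughly 8000 interrupts per second per queue)
    ethtool -C eno1 adaptive-rx off adaptive-tx off rx-usecs 125 tx-usecs 125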
[1] https://www.kernel.org/doc/html/v4.20/networking/i40e.html#interrupt-rate-li...