Linus and I have been putting in a lot of time on the Snowflake bridge over the past week or so to improve performance. The increase in users after the blocking of Tor in Russia last September (the one that led us to the multi-tor architecture) was large, but this recent increase is many times larger. We've cleared out the major bottlenecks, and as of two days ago the bridge is finally meeting the additional demand. But it's close: during the busiest times of day, CPU and RAM are nearly 100% consumed, and the need for horizontal scaling still exists.
There have been a lot of changes, both major and minor. You can find listings at:
https://gitlab.torproject.org/tpo/network-health/metrics/timeline/-/blob/f55...
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
The most important optimizations have been:
- Reduce allocations in WebSocket reads and writes.
- Use more than one KCP state machine.
- Conserve ephemeral ports in localhost communication.
- Disable nftables connection tracking.
I attached a graph of bandwidth on the bridge. You can see that between 24 Sep and 02 Oct the daily peaks were unnaturally flattened. The daily shutdowns in Iran caused a paradoxical *increase* in bandwidth, probably because they relieved congestion in the system for the fewer remaining users. In the past two days the shape of the graph looks more natural, and a shutdown decreases bandwidth as you would expect. At its peak, bandwidth is above 4 Gbps. The daily lows are now higher than the highest highs of two weeks ago.
## Reduce allocations in WebSocket reads and writes
The code for reading and writing encapsulated packets on WebSocket connections was doing unnecessary memory allocations: some implicit, like the 32 KB buffer created internally by io.Copy, and some explicit, like the intentional packet copies made in the QueuePacketConn type. Reducing allocations makes the garbage collector run less often, and there's also a small benefit from fewer buffer copies.
https://gitlab.torproject.org/dcf/snowflake/-/commit/42ea5ff60ce9b3a0dff305b... https://gitlab.torproject.org/dcf/snowflake/-/commit/4d4fad30c429bba9062d1b6... https://gitlab.torproject.org/dcf/snowflake/-/commit/57c9aa3477513daf4334763...
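As an illustration of the io.Copy point, here is a minimal sketch (the names bufPool and copyLoop are made up for the example, not taken from the commits) of reusing a pooled buffer instead of letting io.Copy allocate a fresh 32 KB one on every call:

```go
package main

import (
	"io"
	"sync"
)

// bufPool holds reusable 32 KB buffers, so each copy loop borrows one
// instead of allocating its own.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 32*1024) },
}

// copyLoop behaves like io.Copy, but hands a pooled buffer to
// io.CopyBuffer so repeated calls do not each allocate 32 KB.
func copyLoop(dst io.Writer, src io.Reader) (int64, error) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf)
	return io.CopyBuffer(dst, src, buf)
}
```

io.CopyBuffer uses the buffer it is given rather than allocating one (and skips the buffer entirely if the source implements io.WriterTo or the destination implements io.ReaderFrom).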
## Use more than one KCP state machine
Though most parts of snowflake-server are multi-threaded and can scale across many CPUs, the central KCP packet scheduler was limited to one CPU. Because we have a session identity (the client ID) separate from any KCP-specific identity, it's not hard to partition client packets across separate KCP instances by a hash of the client ID. Expanding the number of KCP instances from 1 to 2 was enough to relieve this bottleneck.
https://gitlab.torproject.org/dcf/snowflake/-/commit/17dc8cad8299eae76c40197...
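The partitioning step can be sketched as follows (kcpIndex and numKCP are hypothetical names, not the commit's): hash the client ID and use the result to pick a KCP instance, so that all of a session's packets stay on the same state machine.

```go
package main

import "hash/fnv"

// numKCP is the number of independent KCP state machines; going from
// 1 to 2 was enough to relieve the single-CPU scheduler bottleneck.
const numKCP = 2

// kcpIndex picks which KCP instance handles a client's packets. Hashing
// the client ID keeps a given session pinned to one instance, which the
// KCP state machine requires.
func kcpIndex(clientID []byte) int {
	h := fnv.New32a()
	h.Write(clientID)
	return int(h.Sum32() % numKCP)
}
```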
## Conserve ephemeral ports in localhost communication
The pluggable transports model relies heavily on localhost TCP sockets. The number of users had increased enough that it was sometimes exhausting the range of port numbers usable for distinct localhost 4-tuples. The kernel's errno for this situation is EADDRNOTAVAIL when you try to connect; it manifests variously in different programs as "cannot assign requested address" or "no free ports," generally leading to a terminated connection. We mitigated the problem by having different programs use different localhost source IP addresses (e.g. 127.0.1.0/24, 127.0.2.0/24, etc.), which multiplies the space of distinct 4-tuples. In a couple of cases this was done in a hacky way, by hardcoding a source address range in the source code, which will need to be revisited.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... https://gitlab.torproject.org/dcf/snowflake/-/commit/d8183ff9680ac9f92e888a5... https://gitlab.torproject.org/dcf/extor-static-cookie/-/commit/0d9078d6aad87...
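As a sketch of the mitigation (dialFromLoopback and the specific addresses are invented for the example, not what the commits do verbatim): dialing with an explicit loopback source address makes the source IP part of the 4-tuple vary, so each program gets its own range of ephemeral ports.

```go
package main

import "net"

// dialFromLoopback connects to a localhost service from a loopback
// source address other than 127.0.0.1. On Linux all of 127.0.0.0/8 is
// routed to the loopback interface, so each distinct source IP brings
// its own pool of ephemeral ports to the 4-tuple space.
func dialFromLoopback(srcIP, dst string) (net.Conn, error) {
	d := net.Dialer{
		// Port 0 lets the kernel choose an ephemeral source port.
		LocalAddr: &net.TCPAddr{IP: net.ParseIP(srcIP)},
	}
	return d.Dial("tcp", dst)
}
```

A caller that would previously have dialed from 127.0.0.1 might instead use, say, dialFromLoopback("127.0.2.1", "127.0.0.1:9001").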
## Disable nftables connection tracking
This one we took care of before it became a problem: we found it necessary to disable connection tracking in the firewall. The number of tracked connections was getting close to the limit, and past the limit, packets simply get dropped.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
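For reference, disabling tracking in nftables is done with notrack rules in chains hooked at raw priority; a sketch of what such a ruleset might look like (the table name and port are placeholders, not the commit's exact rules):

```
table inet no_conntrack {
	chain prerouting {
		type filter hook prerouting priority raw; policy accept;
		# Skip conntrack for inbound packets to the bridge's public
		# TCP port (443 is a placeholder).
		tcp dport 443 notrack
	}
	chain output {
		type filter hook output priority raw; policy accept;
		# Skip conntrack for the corresponding outbound replies, so
		# neither direction occupies the conntrack table.
		tcp sport 443 notrack
	}
}
```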