Linus Nordberg and I have been working together to run the main Snowflake bridge since April 2022. We are preparing a short paper (4 pages) for the FOCI workshop (https://foci.community/) on the special procedures required to operate a bridge that gets the large volume of traffic that a Snowflake bridge does. This is a draft of the submission:
https://www.bamsoftware.com/papers/pt-bridge-hiperf/pt-bridge-hiperf.2023030...
I am posting the draft here to see if anyone has feedback or comments. If you do, please send them before our submission deadline of 2023-03-15.
The main ideas were worked out in a thread on this very list: https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-si...
The core problem is that a single tor process is limited to a single CPU core. The main fix is to run multiple tor processes and synchronize their identity and onion keys externally. This has been essential for scaling the Snowflake bridge, but the same trick may be useful (minus the pluggable transport part) for other relays that process a much larger amount of traffic than average, such as large exit relays. (Though it's not clear that one exit with 2× the capacity is really any different from 2 exits in the same family.)
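To make the multiple-process idea concrete, here is a rough Python sketch. It is not the procedure from the paper. It assumes a hypothetical layout in which each instance has its own data directory (instance0, instance1, ...) under one base directory, and it treats instance0's keys subdirectory as the source of truth to copy into the others. The function names, directory names, and port numbers are made up for illustration. It does not deal with onion key rotation or with balancing incoming connections across the instances.

```python
import shutil
from pathlib import Path


def render_torrc(index: int, base_dir: Path) -> str:
    """Return a minimal torrc for one instance.

    The directory naming and the port numbering (9001 + index) are
    assumptions made for this sketch, not values from the paper.
    """
    data_dir = base_dir / f"instance{index}"
    lines = [
        f"DataDirectory {data_dir}",
        "SocksPort 0",
        f"ORPort 127.0.0.1:{9001 + index}",
    ]
    return "\n".join(lines) + "\n"


def sync_keys(base_dir: Path, count: int) -> None:
    """Copy instance0's keys directory into instances 1..count-1.

    After this, every instance starts from the same key files, so they
    can all present the same relay identity.
    """
    source = base_dir / "instance0" / "keys"
    for index in range(1, count):
        target = base_dir / f"instance{index}" / "keys"
        # dirs_exist_ok lets us overwrite keys from a previous sync
        # (requires Python 3.8 or later).
        shutil.copytree(source, target, dirs_exist_ok=True)
```

One would let instance0 generate its keys first, then call something like sync_keys(Path("/var/lib/tor-instances"), 4) (a hypothetical path) before starting the other instances, each with its own rendered torrc.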
This is also somewhat related to the current discussion about increasing the limit on the number of relays per IP address, which is a concession to the limited scalability of one tor process:
https://bugs.torproject.org/tpo/core/tor/40744
While this is great in general, it also adds significant complexity for Tor operators since Tor does not scale well on multi-core CPUs. In order to effectively use a modern server with a large number of threads, you would need to run many Tor relays.