Linus Nordberg and I wrote a short paper that was presented at FOCI 2023. The topic is how to use all the available CPU capacity of a server running a Tor relay.
This is how the Snowflake bridges are set up. It might also be useful for anyone running a relay that is bottlenecked on the CPU. If you have ever run multiple relays on one IP address for better scaling (for example, if you are one of the relay operators affected by the recent AuthDirMaxServersPerAddr change), you might want to experiment with this setup. The difference is that all the instances of Tor have the same relay fingerprint, so they operate like one big relay instead of many small relays.
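Concretely, the key-sharing part looks something like this (a minimal sketch with invented paths and an invented instance count, not our exact configuration): each tor instance gets its own DataDirectory and its own ports, but the long-term identity keys are copied from one instance to all the others, so every instance computes the same fingerprint.

    # Sketch: make instances 2-4 share the identity of instance 1.
    # secret_id_key (RSA) and ed25519_master_id_secret_key are the
    # long-term identity keys that determine the relay fingerprint.
    for i in 2 3 4; do
        mkdir -p /var/lib/tor-instances/$i/keys
        cp /var/lib/tor-instances/1/keys/secret_id_key \
           /var/lib/tor-instances/1/keys/ed25519_master_id_secret_key \
           /var/lib/tor-instances/$i/keys/
    done

The copy is not a one-time operation: the medium-term ed25519 signing material expires after a time, which is why the paper speaks of "externally synchronized" identity keys rather than merely identical ones.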
https://www.bamsoftware.com/papers/pt-bridge-hiperf/
The pluggable transports model in Tor separates the concerns of anonymity and circumvention by running circumvention code in a separate process, which exchanges information with the main Tor process over local interprocess communication. This model leads to problems with scaling, especially for transports, like meek and Snowflake, whose blocking resistance does not rely on there being numerous, independently administered bridges, but which rather forward all traffic to one or a few centralized bridges. We identify what bottlenecks arise as a bridge scales from 500 to 10,000 simultaneous users, and then from 10,000 to 50,000, and show ways of overcoming them, based on our experience running a Snowflake bridge. The key idea is running multiple Tor processes in parallel on the bridge host, with externally synchronized identity keys.
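To make the "multiple Tor processes in parallel" idea concrete: incoming pluggable transport traffic has to be fanned out across the tor instances somehow, for example with a TCP load balancer in front of their Extended ORPorts. Here is a minimal HAProxy sketch (ports and server names invented for the example; a real deployment also has to deal with Extended ORPort authentication, which this omits):

    # haproxy.cfg sketch: round-robin incoming connections from the
    # pluggable transport server across four local tor instances.
    defaults
        mode tcp
        timeout connect 5s
        timeout client  1h
        timeout server  1h

    frontend pt-in
        bind 127.0.0.1:10000
        default_backend tor-instances

    backend tor-instances
        balance roundrobin
        server tor1 127.0.0.1:10001
        server tor2 127.0.0.1:10002
        server tor3 127.0.0.1:10003
        server tor4 127.0.0.1:10004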