On Wed, Jan 17, 2024 at 01:50:51PM +0100, Linus Nordberg wrote:
--8<---------------cut here---------------start------------->8---
Hey all.

In trying to make a 2024 budget for the [Snowflake Operations][] project operating snowflake-01.tpn I need a better understanding of how we direct traffic to the running bridges. Both what potential challenges there are to do it and what the policy for it looks like.

The background is that snowflake-01 is close to going full due to CPU consumption. I haven't spotted any flat lines yet but have seen momentary CPU utilisation of 98% a couple of times.
Here are two of the questions I'm looking for an answer to.
- If we get another server, similar to snowflake-01 wrt performance,
will it be useful to the network? Ie will it offload snowflake-01?
- If we do **not** get another server and snowflake-01 goes full, will
users have a bad network experience as a result of this? Can traffic be moved to snowflake-02?
--8<---------------cut here---------------end--------------->8---
The way bridge selection works is that it is (in theory) uniform over all available bridges, driven by random selection at clients. This is unfortunate: it would be easier and more flexible if we could enforce a chosen distribution at the broker, or even implement some weighted distribution at clients (though even that would require new releases to change). But it is the best we can do given the interface with tor, which requires an a priori relay fingerprint in the bridge line. We have written about this in the paper:
https://github.com/turfed/snowflake-paper/blob/fde6e0f5bec0ac2c59e7085e6ac98...

There is another difficulty that is harder to work around. A Tor bridge is identified by a long-term identity public key. If, on connecting to a bridge, the client finds that the bridge's identity is not the expected one, the client will terminate the connection...
We rely on clients choosing uniformly to equalize load across bridges. A consequence is that every bridge must meet a minimum performance standard: we cannot, say, centrally assign 20% of clients to one and 80% to another according to their relative capacity. Another drawback is that there is currently no way to instruct Tor to connect to only one of the bridges it knows about (short of rewriting the configuration file): if two bridges are configured, Tor starts two sessions through Snowflake, each doing its own rendezvous, which is wasteful and makes for a more conspicuous network fingerprint. Still, this is the best solution we have found, given the constraints. A deployment not based on Tor would have more flexibility.
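To make that concrete, here is a minimal sketch (not Snowflake's or tor's actual code, and the bridge lines are placeholders, not the real ones) of what uniform client-side selection amounts to: the choice is made locally and unweighted, and the chosen line already fixes the bridge fingerprint, so the broker never gets a chance to steer it.

    package main

    import (
        "fmt"
        "math/rand"
    )

    func main() {
        // Placeholder bridge lines; in reality each line carries the bridge's
        // long-term identity fingerprint, fixed before the client ever
        // contacts the broker.
        bridgeLines := []string{
            "snowflake <fingerprint-of-snowflake-01> ...",
            "snowflake <fingerprint-of-snowflake-02> ...",
            "snowflake <fingerprint-of-snowflake-03> ...",
        }
        // Uniform, unweighted choice: each bridge gets 1/len(bridgeLines) of
        // clients in expectation, which is why every bridge has to meet the
        // same minimum performance standard.
        chosen := bridgeLines[rand.Intn(len(bridgeLines))]
        fmt.Println("using bridge line:", chosen)
    }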
Some past design discussion:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
So with 3 bridges (and assuming the 3 bridge lines are fully distributed to all Snowflake clients, i.e. including Tor Browser and Orbot), we would expect each bridge to receive 1/3 of the traffic.

That raises the question of why the current snowflake-02 gets only about 25% of what snowflake-01 gets. I don't know. For a long time I thought it was because snowflake-02 had not been properly released in Orbot, so that a large fraction of clients only knew about the snowflake-01 bridge, but it's been a while and that should no longer be the case. It may have something to do with a more limited network uplink on snowflake-02. That host and uplink, too, are due to be upgraded some time in the coming months, and it's possible we will see some change after that.
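As a rough sanity check of those numbers (a toy simulation under the uniform-choice assumption, not a measurement), this sketch shows the roughly 33% per-bridge split that uniform selection over 3 bridge lines predicts, which is also a 1:1 ratio between any two bridges; the observed snowflake-02 / snowflake-01 ratio of about 0.25 is well away from that, so the discrepancy needs some other explanation.

    package main

    import (
        "fmt"
        "math/rand"
    )

    func main() {
        // Simulate many clients each picking one of 3 bridge lines uniformly
        // at random, as if every client generated the same load.
        const numBridges = 3
        const numClients = 1000000
        counts := make([]int, numBridges)
        for i := 0; i < numClients; i++ {
            counts[rand.Intn(numBridges)]++
        }
        for b, c := range counts {
            fmt.Printf("bridge %d: %.1f%% of clients\n", b+1, 100*float64(c)/numClients)
        }
        // Prints roughly 33.3% for each bridge. The reported snowflake-02 /
        // snowflake-01 ratio of about 0.25 is far from the 1.0 this predicts.
    }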