[anti-censorship-team] Improving Snowflake performance by adjusting smux parameters

Cecylia Bocovich cohosh at torproject.org
Wed Jul 14 19:20:31 UTC 2021


On 2021-06-30 8:43 p.m., David Fifield wrote:
> While experimenting with another tunnel built on KCP and smux, I
> discovered that performance could be greatly increased by increasing the
> size of smux buffers. It's likely that doing the same can also improve
> performance in Snowflake.
>
> There are two relevant parameters, MaxReceiveBuffer and MaxStreamBuffer.
> MaxStreamBuffer seems to be the most important one to increase.
>
> https://pkg.go.dev/github.com/xtaci/smux#Config
> 	// MaxReceiveBuffer is used to control the maximum
> 	// number of data in the buffer pool
> 	MaxReceiveBuffer int
>
> 	// MaxStreamBuffer is used to control the maximum
> 	// number of data per stream
> 	MaxStreamBuffer int
>
> The default values are 4 MB and 64 KB.
> https://github.com/xtaci/smux/blob/eba6ee1d2a14eb7f86f43f7b7cb3e44234e13c66/mux.go#L50-L51
> 	MaxReceiveBuffer:  4194304,
> 	MaxStreamBuffer:   65536,
>
> kcptun, a prominent KCP/smux tunnel, has defaults of 4 MB (--smuxbuf)
> and 2 MB (--streambuf) in both client and server:
> https://github.com/xtaci/kcptun/blob/9a5b31b4706aba4c67bcb6ebfe108fdb564a9053/README.md#usage
>
> In my experiment, I changed the values to 4 MB / 1 MB on the client and
> 16 MB / 1 MB on the server. This change increased download speed by
> about a factor of 3:
> 	default buffers		 477.4 KB/s
> 	enlarged buffers	1388.3 KB/s
> Values of MaxStreamBuffer higher than 1 MB didn't seem to have much of
> an effect. 256 KB did not help as much.
>
> My guess, based on intuition, is that on the server we should set a
> large value of MaxReceiveBuffer, as it is a global limit shared among
> all clients, and a relatively smaller value of MaxStreamBuffer, because
> there are expected to be many simultaneous streams. On the client, don't
> set MaxReceiveBuffer too high, because it's on an end-user device, but
> go ahead and set MaxStreamBuffer high, because there's expected to be
> only one or two streams at a time.
>
> I discovered this initially by temporarily setting smux to protocol v1
> instead of v2. My understanding is that v1 lacks some kind of receive
> window mechanism that v2 has, and by default is more willing to expend
> memory receiving data. See "Per-stream sliding window to control
> congestion.(protocol version 2+)":
> https://pkg.go.dev/github.com/xtaci/smux#readme-features
>
> Past performance ticket: "Reduce KCP bottlenecks for Snowflake"
> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40026

This is a great find!

I dug into the code a little bit to see how these values are used, and
here's a summary of what I found:

MaxReceiveBuffer limits the amount of data read into a buffer for each
smux.Session. Here's the relevant library code:
https://github.com/xtaci/smux/blob/eba6ee1d2a14eb7f86f43f7b7cb3e44234e13c66/session.go#L90
Relevant Snowflake server code:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/blob/4f7833b3840163f8ca256ada0f8292ed2bdc0ceb/server/lib/snowflake.go#L171
Relevant Snowflake client code:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/blob/4f7833b3840163f8ca256ada0f8292ed2bdc0ceb/client/lib/snowflake.go#L239

This value is not advertised to the other endpoint in any way and
therefore does not directly affect the amount of in-flight traffic. It's
used to limit the size of a session's buffer which holds data read in
from the underlying connection (in this case a KCP connection), while it
waits for Read to be called on any of its streams.
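
For concreteness, here's a minimal sketch of how that per-session cap
gets set when wrapping a KCP connection, along the lines of the linked
server code (the helper name and the 16 MB value here are illustrative,
not what we actually ship):

    package tuning

    import (
    	"io"

    	"github.com/xtaci/smux"
    )

    // newServerSession wraps one underlying connection (for us, a KCP
    // connection) in an smux.Session with an enlarged session buffer.
    func newServerSession(conn io.ReadWriteCloser) (*smux.Session, error) {
    	conf := smux.DefaultConfig()
    	conf.Version = 2
    	// Cap on data read from the underlying connection and held
    	// while waiting for Read on one of the session's streams.
    	conf.MaxReceiveBuffer = 16 * 1024 * 1024
    	return smux.Server(conn, conf)
    }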

I think there is a 1:1 relationship between smux.Sessions and KCP
connections, making this also a per-client value and not a global limit.
My intuition is that changing it will improve performance if we're
running into CPU limits and are unable to Read data out of the
smux.Streams quickly enough, resulting in dropped packets and
retransmissions because in-flight data from the other endpoint waits
too long to be read in by the smux.Session. So changing it at the
client might indeed help, but increasing the processing power (CPU)
of the server might also help address the same underlying issue. We
recently doubled the number of CPU cores for the Snowflake server:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40051

MaxStreamBuffer on the other hand *does* directly limit the amount of
in-flight traffic, because it is sent to the other endpoint in a window
update:
https://github.com/xtaci/smux/blob/eba6ee1d2a14eb7f86f43f7b7cb3e44234e13c66/stream.go#L256
The other endpoint will not send additional data if its calculation of
the amount of in-flight data is greater than or equal to this value.

Since client-server connections have an unusually long path (the
smux.Stream/Session data traverses both the client-proxy leg and the
proxy-server leg), it makes sense that increasing MaxStreamBuffer will
improve performance.
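
To put rough numbers on that: if we assume, hypothetically, an
end-to-end RTT of 500 ms through the proxy, then sustaining the
1388.3 KB/s David measured requires about

	1388.3 KB/s * 0.5 s ≈ 694 KB

of data in flight, roughly ten times what the 64 KB default
MaxStreamBuffer permits, but comfortably under the 1 MB value he
tested.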

It also occurs to me that kcp.MaxWindowSize has to be at least as big
as the MaxStreamBuffer size to see any improvement; otherwise it will
be the limiting factor on the amount of in-flight data. Right now this
is set to 64 KB for both the client and the server.
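
If I'm reading kcp-go right, its windows are counted in packets rather
than bytes, so a byte target has to be converted at the MTU. A sketch
of what enlarging it might look like (the helper name and the
conversion are illustrative):

    package tuning

    import (
    	kcp "github.com/xtaci/kcp-go/v5"
    )

    // enlargeWindow raises the KCP send and receive windows on a session.
    // kcp-go's default MTU is 1400 bytes, so a ~4 MB byte target becomes
    // roughly 3000 packets.
    func enlargeWindow(conn *kcp.UDPSession, targetBytes int) {
    	const mtu = 1400
    	packets := targetBytes / mtu
    	conn.SetWindowSize(packets, packets) // send window, receive window
    }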

I started doing a few quick performance tests by modifying just the
client. This should be enough to check the impact of tuning
MaxStreamBuffer and MaxReceiveBuffer on download speeds. But, because
the KCP MaxWindowSize is both a send and a receive window, as expected
I didn't see any difference in performance without first increasing
this value at both the client and the server (my results for each of
the test cases I ran were download speeds of 200-500 KB/s).

My proposal is to set the KCP MaxWindowSize to 4 MB and the smux
MaxStreamBuffer to 1 MB at both the client and the server and deploy
these changes. Then we can try tuning these values on the client side
to test the impact of MaxStreamBuffer sizes up to 4 MB on download
speeds.
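
Putting the two knobs together, the client side of that proposal would
look roughly like this (a sketch under the same assumptions as above,
not the actual patch):

    package tuning

    import (
    	kcp "github.com/xtaci/kcp-go/v5"
    	"github.com/xtaci/smux"
    )

    // proposedClientSession applies the proposed tuning to one KCP
    // connection and wraps it in smux: a ~4 MB KCP window and a 1 MB
    // smux stream buffer. Hypothetical helper, not the Snowflake code.
    func proposedClientSession(conn *kcp.UDPSession) (*smux.Session, error) {
    	const windowBytes = 4 * 1024 * 1024
    	const mtu = 1400 // windows are counted in packets, not bytes
    	conn.SetWindowSize(windowBytes/mtu, windowBytes/mtu)

    	conf := smux.DefaultConfig()
    	conf.Version = 2
    	conf.MaxStreamBuffer = 1 * 1024 * 1024 // advertised per-stream window
    	return smux.Client(conn, conf)
    }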

Cecylia




