[anti-censorship-team] Improving Snowflake performance by adjusting smux parameters

David Fifield david at bamsoftware.com
Thu Jul 1 00:43:03 UTC 2021


While experimenting with another tunnel built on KCP and smux, I
discovered that performance could be greatly improved by enlarging the
smux buffers. It's likely that the same change can also improve
performance in Snowflake.

There are two relevant parameters, MaxReceiveBuffer and MaxStreamBuffer.
Of the two, MaxStreamBuffer seems to be the more important one to
increase.

https://pkg.go.dev/github.com/xtaci/smux#Config
	// MaxReceiveBuffer is used to control the maximum
	// number of data in the buffer pool
	MaxReceiveBuffer int

	// MaxStreamBuffer is used to control the maximum
	// number of data per stream
	MaxStreamBuffer int

The default values are 4 MB and 64 KB, respectively.
https://github.com/xtaci/smux/blob/eba6ee1d2a14eb7f86f43f7b7cb3e44234e13c66/mux.go#L50-L51
	MaxReceiveBuffer:  4194304,
	MaxStreamBuffer:   65536,

kcptun, a prominent KCP/smux tunnel, has defaults of 4 MB (--smuxbuf)
and 2 MB (--streambuf) in both client and server:
https://github.com/xtaci/kcptun/blob/9a5b31b4706aba4c67bcb6ebfe108fdb564a9053/README.md#usage

In my experiment, I changed the values (MaxReceiveBuffer /
MaxStreamBuffer) to 4 MB / 1 MB on the client and 16 MB / 1 MB on the
server. This change increased download speed by about a factor of 3:
	default buffers		 477.4 KB/s
	enlarged buffers	1388.3 KB/s
Values of MaxStreamBuffer higher than 1 MB didn't seem to have much
additional effect; an intermediate value of 256 KB helped, but not as
much as 1 MB.

My guess, based on intuition, is that on the server we should set a
large value of MaxReceiveBuffer, because it is a global limit shared
among all clients, and a relatively smaller value of MaxStreamBuffer,
because there are expected to be many simultaneous streams. On the
client, don't set MaxReceiveBuffer too high, because it runs on an
end-user device, but do set MaxStreamBuffer high, because there are
expected to be only one or two streams at a time.

I discovered this initially by temporarily setting smux to protocol v1
instead of v2. My understanding is that v1 lacks some kind of receive
window mechanism that v2 has, and by default is more willing to expend
memory receiving data. See "Per-stream sliding window to control
congestion.(protocol version 2+)":
https://pkg.go.dev/github.com/xtaci/smux#readme-features

Past performance ticket: "Reduce KCP bottlenecks for Snowflake"
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40026


