[anti-censorship-team] USERADDR for Turbo Tunnel in Snowflake

Thu Feb 6 01:36:07 UTC 2020

On Fri, Jan 31, 2020 at 07:24:48PM -0700, David Fifield wrote:
> https://gitweb.torproject.org/user/dcf/snowflake.git/tree/server/server.go?h=turbotunnel&id=07495371d67f914d2c828bbd3d7facc455996bd2#n135
> The branch currently lacks client geoip lookup (ExtORPort USERADDR),
> because of the difficulty I have talked about before of providing an IP
> address for a virtual session that is not inherently tied to any single
> network connection or address. I have a plan for solving it, though; it
> requires a slight breaking of abstractions. In the server, after reading
> the ClientID, we can peek at the first 4 bytes of the first packet.
> These 4 bytes are the KCP conversation ID (https://github.com/xtaci/kcp-go/blob/v5.5.5/kcp.go#L120),
> a random number chosen by the client, serving roughly the same purpose
> in KCP as our ClientID. We store a temporary mapping from the
> conversation ID to the IP address of the client making the WebSocket
> connection. kcp-go provides a GetConv function that we can call in
> handleStream, just as we're about to connect to the ORPort, to look up
> the client's IP address in the mapping. The possibility of doing this is
> one reason I decided to go with KCP for this implementation rather than
> QUIC as I did in the meek implementation: the quic-go package doesn't
> expose an accessor for the QUIC connection ID.
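The conversation-ID plan quoted above could be sketched roughly as the
Go code below. This sketch makes two assumptions. First, the first
packet begins directly with the KCP segment header, with no kcp-go FEC
or encryption header in front of it. Second, kcp-go encodes the
conversation ID in little-endian byte order. convIDFromPacket and
convAddrMap are hypothetical names, not code from the branch.

```go
package main

import (
	"encoding/binary"
	"errors"
	"sync"
)

// convIDFromPacket returns the KCP conversation ID from the first 4 bytes
// of a packet. ASSUMPTION: the packet starts with the KCP segment header
// (no FEC/encryption header) and the field is little-endian.
func convIDFromPacket(pkt []byte) (uint32, error) {
	if len(pkt) < 4 {
		return 0, errors.New("packet too short for a KCP header")
	}
	return binary.LittleEndian.Uint32(pkt[:4]), nil
}

// convAddrMap is a hypothetical temporary mapping from conversation ID to
// the IP address of the client that made the WebSocket connection.
type convAddrMap struct {
	mu sync.Mutex
	m  map[uint32]string
}

func newConvAddrMap() *convAddrMap {
	return &convAddrMap{m: make(map[uint32]string)}
}

// Record peeks at the first packet (read after the ClientID) and
// remembers which client IP that conversation ID came from.
func (c *convAddrMap) Record(firstPacket []byte, clientIP string) error {
	conv, err := convIDFromPacket(firstPacket)
	if err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[conv] = clientIP
	return nil
}

// Lookup would be called in handleStream with the session's GetConv()
// value, just before connecting to the ORPort.
func (c *convAddrMap) Lookup(conv uint32) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	ip, ok := c.m[conv]
	return ip, ok
}
```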

https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&id=3f88c4f817812bceb2c4cf1882086556abbf42a8
This commit adds USERADDR support for turbotunnel sessions. I found a
nicer way to do it than what I proposed above, one that doesn't require
peeking into the packet structure. Instead of using the KCP conversation
ID as the common element linking an IP address and a client session, we
can use the ClientID (the artificial 8-byte value that we tack on at the
beginning of every WebSocket connection). The ServeHTTP function has
access to the ClientID because it is what parses it out, and once you
have a session you can recover the ClientID by calling the session's
RemoteAddr method. This works because kcp-go uses the address returned
from its ReadFrom calls as the remote address of the session, and we use
the ClientID as the address in those ReadFrom calls.

To summarize:
 * ServeHTTP has an IP address and a ClientID but not a session.
 * acceptStreams has a session and a ClientID but not an IP address.
 * We bridge the gap using a data structure that maps a ClientID to an
   IP address. ServeHTTP stores an entry in the structure, and
   acceptStreams looks it up.

I designed a simple data structure, clientIDMap, to serve as the lookup
table. In spirit it is a map[ClientID]string: you Set(clientID, addr) to
store a mapping, and Get(clientID) to retrieve it. It differs from a
plain map in that it expires old entries when storing new ones: it's a
fixed-size circular buffer. I designed it to be proof against memory
leaks. With a plain map, it would be possible for a client to get as far
as sending a ClientID (storing an entry in the map) but never go on to
establish a session (leaving the entry in the map forever). The cost is
that the buffer has to be large enough not to expire entries before they
are needed; how large depends on the rate of new session creation and on
the delay between when a WebSocket connection starts and when a session
is established. Even if an entry is expired before it is used, the worst
that happens is that one session gets attributed to "??" (unknown
country) rather than to a specific country. I added a log message for
when this happens, and if it turns out to be a problem, we can design a
more complicated, dynamically sized data structure.
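Below is a minimal sketch of a clientIDMap along these lines. It
assumes ClientID is an 8-byte array. It keeps an inverse index from
each ClientID to its most recent slot. That way, when one ClientID is
Set more than once (for example, by several WebSocket connections
belonging to the same session), expiring an older slot does not remove
the newer mapping. The field names and the inverse index are
illustrative choices, not necessarily how the branch implements it.

```go
package main

import "sync"

// ClientID is the 8-byte value sent at the start of each WebSocket connection.
type ClientID [8]byte

type clientIDMapEntry struct {
	id   ClientID
	addr string
	used bool // false for slots that have never been written
}

// clientIDMap maps ClientIDs to address strings using a fixed-size
// circular buffer. Once the buffer is full, each Set overwrites (expires)
// the oldest slot, so memory stays bounded even when clients never finish
// establishing a session.
type clientIDMap struct {
	lock    sync.Mutex
	slots   []clientIDMapEntry
	current int              // index of the next slot to overwrite
	inverse map[ClientID]int // ClientID -> index of its most recent slot
}

// newClientIDMap returns a map with room for size entries; size must be > 0.
func newClientIDMap(size int) *clientIDMap {
	return &clientIDMap{
		slots:   make([]clientIDMapEntry, size),
		inverse: make(map[ClientID]int),
	}
}

// Set records addr for id, expiring the oldest entry if the buffer is full.
func (m *clientIDMap) Set(id ClientID, addr string) {
	m.lock.Lock()
	defer m.lock.Unlock()
	old := m.slots[m.current]
	// Remove the expiring entry from the index, unless a newer Set for the
	// same ClientID has since pointed the index at a different slot.
	if idx, ok := m.inverse[old.id]; old.used && ok && idx == m.current {
		delete(m.inverse, old.id)
	}
	m.slots[m.current] = clientIDMapEntry{id: id, addr: addr, used: true}
	m.inverse[id] = m.current
	m.current = (m.current + 1) % len(m.slots)
}

// Get returns the most recently Set address for id, if it has not expired.
func (m *clientIDMap) Get(id ClientID) (string, bool) {
	m.lock.Lock()
	defer m.lock.Unlock()
	idx, ok := m.inverse[id]
	if !ok {
		return "", false
	}
	return m.slots[idx].addr, true
}
```

A false result from Get is the expired-entry case described above. In
this sketch the caller would log it and proceed without an address, so
the session would be counted under "??".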