https://gitweb.torproject.org/user/dcf/snowflake.git/log/?h=turbotunnel&id=…
These are the elements of a Turbo Tunnel implementation for Snowflake.
Turbo Tunnel is a name for overlaying an abstract, virtual session on
top of concrete, physical network connections, such that the virtual
session is not tied to any particular network connection. In Snowflake,
it solves the problem of migrating a session across multiple WebRTC
connections as temporary proxies come and go. This post is a walkthrough
of the code changes and my design decisions.
== How to try it ==
Download the branch and build it:
git remote add dcf https://git.torproject.org/user/dcf/snowflake.git
git checkout -b turbotunnel --track dcf/turbotunnel
for d in client server broker proxy-go; do (cd $d && go build); done
Run the broker (not changed in this branch):
broker/broker --disable-tls --addr 127.0.0.1:8000
Run a proxy (not changed in this branch):
proxy-go/proxy-go --broker http://127.0.0.1:8000/ --relay ws://127.0.0.1:8080/
Run the server:
tor -f torrc.server
# contents of torrc.server:
DataDirectory datadir-server
SocksPort 0
ORPort 9001
ExtORPort auto
BridgeRelay 1
AssumeReachable 1
PublishServerDescriptor 0
ServerTransportListenAddr snowflake 0.0.0.0:8080
ServerTransportPlugin snowflake exec server/server --disable-tls --log snowflake-server.log
Run the client:
tor -f torrc.client
# contents of torrc.client:
DataDirectory datadir-client
UseBridges 1
SocksPort 9250
ClientTransportPlugin snowflake exec client/client --url http://127.0.0.1:8000/ --ice stun:stun.l.google.com:19302 --log snowflake-client.log
Bridge snowflake 0.0.3.0:1
Start downloading a big file through the tor SocksPort. You will be able
to see activity in snowflake-client.log and in the output of proxy-go.
curl -x socks5://127.0.0.1:9250/ --location --speed-time 60 https://cdimage.debian.org/mirror/cdimage/archive/10.1.0/amd64/iso-cd/debia… > /dev/null
Now kill proxy-go and restart it. Wait 30 seconds for snowflake-client
to notice the proxy has disappeared. Then snowflake-client.log will say
redialing on same connection
and the download will resume. It's not curl restarting the download on a
new connection—from the perspective of curl (and tor) it's all one long
proxy connection, with a 30-second lull in the middle. Only
snowflake-client knows that there were two WebRTC connections involved.
== Introduction to code changes ==
Start by looking at the server changes:
https://gitweb.torproject.org/user/dcf/snowflake.git/diff/server/server.go?…
The first thing to notice is a kind of "inversion" of control flow.
Formerly, the ServeHTTP function accepted WebSocket connections and
connected each one with the ORPort. There was no virtual session: each
WebSocket connection corresponded to exactly one client session. Now,
the main function, separately from starting the web server, starts a
virtual listener (kcp.ServeConn) that calls into a chain of
acceptSessions→acceptStreams→handleStream functions that ultimately
connects a virtual stream with the ORPort. But this virtual listener
doesn't actually open a network port, so what drives it? That's now the
sole responsibility of the ServeHTTP function. It still accepts
WebSocket connections, but it doesn't connect them directly to the
ORPort—instead, it pulls out discrete packets (encoded into the stream
using length prefixes) and feeds those packets to the virtual listener.
The glue that links the virtual listener and the ServeHTTP function is
QueuePacketConn, an abstract interface that allows the virtual listener
to send and receive packets without knowing exactly how those I/O
operations are implemented. (In this case, they're implemented by
encoding packets into WebSocket streams.)
The new control flow boils down to a simple, traditional listen/accept
loop, except that the listener doesn't interact with the network
directly, but only through the QueuePacketConn interface. The WebSocket
part of the program now only acts as a network interface that performs
I/O functions on behalf of the QueuePacketConn. In effect, we've moved
everything up one layer of abstraction: where formerly we had an HTTP
server using the operating system as a network interface, we now have a
virtual listener using the HTTP server as a network interface (which
in turn ultimately uses the operating system as the *real* network
interface).
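
For orientation, here is a rough sketch of what that listen/accept chain looks like using the kcp-go and smux APIs. It is a simplified illustration, not the branch's actual code: the QueuePacketConn value and handleStream function are stand-ins, the smux configuration is left at its default, and the kcp-go import path may differ from what the branch pins.

import (
	"log"
	"net"

	kcp "github.com/xtaci/kcp-go/v5"
	"github.com/xtaci/smux"
)

// acceptLoop sketches the server's virtual listener chain
// (acceptSessions → acceptStreams → handleStream). pconn is whatever
// net.PacketConn supplies the packets; in the branch it is a
// QueuePacketConn fed by ServeHTTP, not a real network socket.
func acceptLoop(pconn net.PacketConn, handleStream func(*smux.Stream)) {
	ln, err := kcp.ServeConn(nil, 0, 0, pconn) // no block crypto, no FEC
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.AcceptKCP() // one KCP conversation per client session
		if err != nil {
			return
		}
		go func() {
			sess, err := smux.Server(conn, smux.DefaultConfig())
			if err != nil {
				return
			}
			for {
				stream, err := sess.AcceptStream()
				if err != nil {
					return
				}
				go handleStream(stream) // ultimately copies between the stream and the ORPort
			}
		}()
	}
}
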
Now look at the client changes:
https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&…
The Handler function formerly grabbed exactly one snowflake proxy
(snowflakes.Pop()) and used its WebRTC connection until it died, at
which point it would close the SOCKS connection and terminate the whole
Tor session. Now, the function creates a RedialPacketConn, an abstract
interface that grabs a snowflake proxy, uses it for as long as it lasts,
then grabs another. Each of the temporary snowflake proxies is wrapped
in an EncapsulationPacketConn to convert it from a stream-oriented
interface to a packet-oriented interface. EncapsulationPacketConn uses
the same length-prefixed protocol that the server expects. We then
create a virtual client connection (kcp.NewConn2), configured to use the
RedialPacketConn as its network interface, and open a new virtual
stream. (This sequence of calls kcp.NewConn2→sess.OpenStream corresponds
to acceptSessions→acceptStreams on the server.) We then connect
(copyLoop) the SOCKS connection and the virtual stream. The virtual
stream never touches the network directly—it interacts indirectly
through RedialPacketConn and EncapsulationPacketConn, which make use of
whatever snowflake proxy WebRTC connection happens to exist at the time.
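
In sketch form, again with simplified stand-in names (the net.PacketConn parameter represents the RedialPacketConn, and the server address is whatever placeholder address the branch uses), the client side looks roughly like this:

import (
	"net"

	kcp "github.com/xtaci/kcp-go/v5"
	"github.com/xtaci/smux"
)

// dialVirtualStream mirrors the server's chain from the client side:
// create a KCP "connection" whose packets travel over pconn, layer an
// smux session on top, and open one virtual stream.
func dialVirtualStream(pconn net.PacketConn, serverAddr net.Addr) (*smux.Stream, error) {
	conn, err := kcp.NewConn2(serverAddr, nil, 0, 0, pconn)
	if err != nil {
		return nil, err
	}
	sess, err := smux.Client(conn, smux.DefaultConfig())
	if err != nil {
		return nil, err
	}
	return sess.OpenStream() // this stream is what copyLoop joins to the SOCKS connection
}
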
You'll notice that before anything else, the client sends a 64-bit
ClientID. This is a random number that identifies a particular client
session, made necessary because the virtual session is not tied to an IP
4-tuple or any other network identifier. The ClientID remains the same
across all redials in one call to the Handler function. The server
parses the ClientID out of the beginning of a WebSocket stream. The
ClientID is how the server knows if it should open a new ORPort
connection or append to an existing one, and which temporary WebSocket
connections should receive packets that are addressed to a particular
client.
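
The ClientID itself is tiny. Something along these lines (a hedged sketch, assuming an 8-byte array type like the one in common/turbotunnel) is all the server has to read before the length-prefixed packets begin:

import (
	"crypto/rand"
	"io"
)

// ClientID is an 8-byte random session identifier.
type ClientID [8]byte

// NewClientID generates a fresh random ClientID (client side).
func NewClientID() (ClientID, error) {
	var id ClientID
	_, err := rand.Read(id[:])
	return id, err
}

// readClientID reads the ClientID prefix from a newly accepted
// WebSocket stream (server side), before any packets are parsed.
func readClientID(r io.Reader) (ClientID, error) {
	var id ClientID
	_, err := io.ReadFull(r, id[:])
	return id, err
}
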
There's a lot of new support code in the common/encapsulation and
common/turbotunnel directories, mostly reused from my previous work in
integrating Turbo Tunnel into pluggable transports.
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/encapsulat…
The encapsulation package provides a way of encoding a sequence of
packets into a stream. It's essentially just prefixing each packet with
its length, but it takes care to permit traffic shaping and padding to
the byte level. (The Snowflake turbotunnel branch doesn't take advantage
of the traffic-shaping and padding features.)
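
As a simplified illustration of the idea only (the real encapsulation package uses a variable-length prefix and supports padding, so this is not the actual wire format), packet-in-stream framing amounts to:

import (
	"encoding/binary"
	"io"
)

// writePacket frames one packet onto a stream with a 2-byte big-endian
// length prefix (simplified; the branch's format differs).
func writePacket(w io.Writer, p []byte) error {
	var length [2]byte
	binary.BigEndian.PutUint16(length[:], uint16(len(p)))
	if _, err := w.Write(length[:]); err != nil {
		return err
	}
	_, err := w.Write(p)
	return err
}

// readPacket reads the next length-prefixed packet from a stream.
func readPacket(r io.Reader) ([]byte, error) {
	var length [2]byte
	if _, err := io.ReadFull(r, length[:]); err != nil {
		return nil, err
	}
	p := make([]byte, binary.BigEndian.Uint16(length[:]))
	_, err := io.ReadFull(r, p)
	return p, err
}
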
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/turbotunne…
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/turbotunne…
QueuePacketConn and ClientMap are imported pretty much unchanged from
the meek implementation (https://github.com/net4people/bbs/issues/21).
Together these data structures manage queues of packets and allow you to
send and receive them using custom code. In meek it was done over raw
HTTP bodies; here it's done over WebSocket. These two interfaces are
candidates for an eventual reusable Turbo Tunnel library.
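
Conceptually, QueuePacketConn is just a net.PacketConn whose packets pass through in-memory queues rather than a socket. A stripped-down sketch (the real type also keeps per-client send queues via ClientMap and handles deadlines, Close, and so on):

import "net"

// packet pairs a payload with the abstract address it is to or from.
type packet struct {
	addr net.Addr
	data []byte
}

// queuePacketConn is a minimal sketch: external code (ServeHTTP, in the
// server) pushes received packets into recv and drains send to deliver
// outgoing packets. The remaining net.PacketConn methods are omitted.
type queuePacketConn struct {
	recv chan packet
	send chan packet
}

func (c *queuePacketConn) ReadFrom(p []byte) (int, net.Addr, error) {
	pkt := <-c.recv
	return copy(p, pkt.data), pkt.addr, nil
}

func (c *queuePacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
	buf := append([]byte{}, p...) // copy, since the caller may reuse p
	c.send <- packet{addr: addr, data: buf}
	return len(p), nil
}
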
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/turbotunne…
RedialPacketConn is adapted from clientPacketConn in the obfs4proxy
implementation (https://github.com/net4people/bbs/issues/14#issuecomment-544747519).
It's the part that uses an underlying connection for as long as it
exists, then switches to a new one. Since the obfs4proxy implementation,
I've decided that it's better to have this type use the packet-oriented
net.PacketConn as the underlying type, not the stream-oriented net.Conn.
That way, RedialPacketConn doesn't have to know details of how packet
encapsulation happens, whether by EncapsulationPacketConn or some other
way.
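
The core of the redial logic, sketched with invented names (dial here stands for "grab a snowflake proxy and wrap it in EncapsulationPacketConn"; ReadFrom and the rest of net.PacketConn are omitted):

import (
	"net"
	"sync"
)

// redialPacketConn delegates writes to whichever underlying
// net.PacketConn is currently alive, dialing a replacement when it dies.
type redialPacketConn struct {
	mu      sync.Mutex
	current net.PacketConn
	dial    func() (net.PacketConn, error)
}

func (c *redialPacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
	c.mu.Lock()
	conn := c.current
	c.mu.Unlock()
	n, err := conn.WriteTo(p, addr)
	if err == nil {
		return n, nil
	}
	// The underlying connection died: dial a new one and retry once.
	conn, dialErr := c.dial()
	if dialErr != nil {
		return n, dialErr
	}
	c.mu.Lock()
	c.current = conn
	c.mu.Unlock()
	return conn.WriteTo(p, addr)
}
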
== Backward compatibility ==
The branch as of commit 07495371d67f914d2c828bbd3d7facc455996bd2 is not
backward compatible with the mainline Snowflake code. That's because the
server expects to find a ClientID and length-prefixed packets, and
currently deployed clients don't work that way. However, I think it will
be possible to make the server backward compatible. My plan is to
reserve a distinguished static token (64-bit value) and have the client
send that at the beginning of the stream, before its ClientID, to
indicate that it uses Turbo Tunnel features. The token will be selected
to be distinguishable from any protocol that non–Turbo Tunnel clients
might use (i.e., Tor TLS). Then, the server's ServeHTTP function can
choose one of two implementations, depending on whether it sees the
magic token or not.
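
In outline, the dual-mode dispatch might look like this. This is only a sketch of the plan: the token value is a placeholder, and handleTurboTunnel/handleLegacy are hypothetical names for the two code paths, not functions that exist in the branch.

import (
	"bytes"
	"io"
)

// turboTunnelToken is a reserved 8-byte value; the real value would be
// chosen to be distinguishable from anything a legacy client sends first.
var turboTunnelToken = [8]byte{0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef} // placeholder

// dispatch peeks at the first 8 bytes of a stream and chooses a handler.
// handleTurboTunnel and handleLegacy are the two server implementations
// (not shown).
func dispatch(conn io.ReadWriter) error {
	var prefix [8]byte
	if _, err := io.ReadFull(conn, prefix[:]); err != nil {
		return err
	}
	if prefix == turboTunnelToken {
		// Turbo Tunnel client: a ClientID and length-prefixed packets follow.
		return handleTurboTunnel(conn)
	}
	// Legacy client: replay the bytes we consumed and proxy the stream
	// directly to the ORPort as before.
	return handleLegacy(io.MultiReader(bytes.NewReader(prefix[:]), conn), conn)
}
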
If I get backward compatibility working, then we can deploy a dual-mode
bridge that is able to serve either type of client. Then I can try
making a Tor Browser build, to make the Turbo Tunnel code more
accessible for user testing.
One nice thing about all this is that it doesn't require any changes to
proxies. They remain simple dumb pipes, so we don't have to coordinate a
mass proxy upgrade.
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/server/server.go?…
The branch currently lacks client geoip lookup (ExtORPort USERADDR),
because of the difficulty I have talked about before of providing an IP
address for a virtual session that is not inherently tied to any single
network connection or address. I have a plan for solving it, though; it
requires a slight breaking of abstractions. In the server, after reading
the ClientID, we can peek at the first 4 bytes of the first packet.
These 4 bytes are the KCP conversation ID (https://github.com/xtaci/kcp-go/blob/v5.5.5/kcp.go#L120),
a random number chosen by the client, serving roughly the same purpose
in KCP as our ClientID. We store a temporary mapping from the
conversation ID to the IP address of the client making the WebSocket
connection. kcp-go provides a GetConv function that we can call in
handleStream, just as we're about to connect to the ORPort, to look up
the client's IP address in the mapping. The possibility of doing this is
one reason I decided to go with KCP for this implementation rather than
QUIC as I did in the meek implementation: the quic-go package doesn't
expose an accessor for the QUIC connection ID.
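
The peek itself is simple, because KCP puts the conversation ID in the first 4 bytes of every packet, little-endian. A hedged sketch, with convToAddr standing in for whatever expiring map the server would keep (locking omitted):

import (
	"encoding/binary"
	"net"
)

// recordConv notes which client address a KCP conversation ID arrived
// from, so that handleStream can later look it up (via GetConv) when it
// builds the ExtORPort USERADDR.
func recordConv(firstPacket []byte, remoteAddr net.Addr, convToAddr map[uint32]net.Addr) {
	if len(firstPacket) < 4 {
		return
	}
	conv := binary.LittleEndian.Uint32(firstPacket[:4])
	convToAddr[conv] = remoteAddr
}
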
== Limitations ==
I'm still using the same old logic for detecting a dead proxy, 30
seconds without receiving any data. This is suboptimal for many reasons
(https://bugs.torproject.org/25429), one of which is that when your
proxy dies, you have to wait at least 30 seconds until the connection
becomes useful again. That's why I had to use "--speed-time 60" in the
curl command above; curl has a default idle timeout of 30 seconds, which
would cause it to give up just as a new proxy was becoming available.
I think we can ultimately do a lot better, and make better use of the
available proxy capacity. I'm thinking of "striping" packets across
multiple snowflake proxies simultaneously. This could be done in a
round-robin fashion or in a more sophisticated way (weighted by measured
per-proxy bandwidth, for example). That way, when a proxy dies, any
packets sent to it would be detected as lost (unacknowledged) by the KCP
layer, and retransmitted over a different proxy, much quicker than the
30-second timeout. The way to do this would be to replace
RedialPacketConn—which uses one connection at a time—with a
MultiplexingPacketConn, which manages a set of currently live
connections and uses all of them. I don't think it would require any
changes on the server.
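
As a rough sketch of that idea (nothing like this exists in the branch; the names are invented), the write path of such a MultiplexingPacketConn could stripe round-robin over the live connections and let KCP's retransmission handle whatever was sent to a proxy that has died:

import (
	"errors"
	"net"
	"sync"
)

// multiplexingPacketConn stripes outgoing packets across all currently
// live per-proxy PacketConns in round-robin order.
type multiplexingPacketConn struct {
	mu    sync.Mutex
	conns []net.PacketConn
	next  int
}

func (c *multiplexingPacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
	c.mu.Lock()
	if len(c.conns) == 0 {
		c.mu.Unlock()
		return 0, errors.New("no live proxy connections")
	}
	conn := c.conns[c.next%len(c.conns)]
	c.next++
	c.mu.Unlock()
	return conn.WriteTo(p, addr)
}
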
But the situation in the turbotunnel branch is better than the status
quo, even without multiplexing, for two reasons. First, the connection
actually *can* recover after 30 seconds. Second, the smux layer sends
keepalives, which means that you won't discard a proxy merely because
you're temporarily idle, but only when it really stops working.
== Notes ==
https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&…
I added go.mod and go.sum files to the repo. I did this because smux
(https://github.com/xtaci/smux) has a v2 protocol that is incompatible
with the v1 protocol, and I don't know how to enforce the use of v2 in
the build other than by activating Go modules and specifying a version
in go.mod.
On 2020-01-22, I got an email from Microsoft Azure about a data breach
of customer support records. The summary is that between 2019-12-05 and
2019-12-31, some Azure customer support records were exposed and
downloadable, though they don't think any were actually downloaded. I
got a notification because they identified some of the records as
belonging to the Azure account I administer.
https://msrc-blog.microsoft.com/2020/01/22/access-misconfiguration-for-cust…
https://www.zdnet.com/article/microsoft-discloses-security-breach-of-custom…
https://www.reddit.com/r/AZURE/comments/esdwld/microsoft_database_containin…
The involved account is the one that used to be used for meek-azure
domain fronting, and is currently used for Snowflake rendezvous domain
fronting (using the Azure CDN). The account is no longer used for
meek-azure.
The email said I could file a support request to find out exactly what
information was exposed, so that's what I did. The data set they sent
back to me consisted of two email threads, neither one directly related
to Tor's use of Azure. One was about trying to delete an unused VM disk
image, and the other was about trying to update a credit card.
I didn't find my name or the account email address in the files.
Apparently the files that were exposed had already been processed by an
automated redactor. I see markers like "{AlphanumericPII}" and
"{Namepii}" in the files, even over-redactions like
"font-family:"Times New {Namepii}"".
Since #28942, Snowflake no longer uses the go-webrtc library. (Actually
server-webrtc still uses it, but server-webrtc itself is hardly used.) I
opened a ticket (at GitHub, where go-webrtc is hosted) to discuss what
to do with it, especially for the benefit of third parties who may not
know that its maintainers are no longer actively working on it.
https://github.com/keroserene/go-webrtc/issues/109
Hey, I started a pad for brainstorming ideas for GSoC projects here:
https://pad.riseup.net/p/anti-censorship-gsoc
I've copied over the project from last year. Feel free to edit, add or
reply to this thread.
Cecylia
Let's reconvene on Thursday, January 2nd. It will no longer be 2019, so
I created a new meeting pad for us and populated it with the content of
our old pad:
<https://pad.riseup.net/p/tor-anti-censorship-keep>
The old pad now also links to the new one.
Cheers,
Philipp
This week's meeting falls on Thanksgiving, a U.S. holiday, so several of
us won't be around. Let's cancel our Thursday meeting and meet again
next week, on December 4.
Cheers,
Philipp
Hi!
We need to work on roadmapping January and February, mostly for
Sponsor 28. Can you all meet on December 16th or 17th?
cheers,
gaba
--
Project Manager: Network, Anti-Censorship and Metrics teams
gaba at torproject.org
she/her are my pronouns
GPG Fingerprint EE3F DF5C AD91 643C 21BE 8370 180D B06C 59CA BD19
Other teams have started to use IRC keywords to reach team members,
e.g.: "network-team: somebody review #foo".
We discussed this in today's meeting and agreed to use both "ac-team"
and "anti-censorship-team" for our team:
<http://meetbot.debian.net/tor-meeting/2019/tor-meeting.2019-11-14-18.00.html>
Cheers,
Philipp