Hi folks,
Here are some brainstorming thoughts to get us started improving our
"unclassifiable protocols" idea.
----------------------------------------------------------------------
For context, right now we have essentially one deployed design in this
space, which is obfs4:
https://gitweb.torproject.org/pluggable-transports/obfs4.git/tree/doc/obfs4…
This protocol does a short handshake (one client message, one server
message) to establish session keys, then it just sends bytes back and
forth with an (additional) layer of encryption so no payload bytes
assist an attacker with doing content-based protocol classification.
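To make that content-based property concrete, here is a minimal Go
sketch of the idea: once the handshake has produced session keys, XOR
everything with a stream-cipher keystream, so every payload byte on the
wire is indistinguishable from uniform randomness. (Illustrative only:
obfs4's actual framing and authenticated encryption are different, and
the key/nonce handling here is assumed.)

    // Illustrative sketch, not obfs4's real wire format: wrap a net.Conn
    // so all payload bytes are keystream-XORed and look uniformly random.
    package obfssketch

    import (
        "net"

        "golang.org/x/crypto/chacha20"
    )

    type obfuscatedConn struct {
        net.Conn
        enc, dec *chacha20.Cipher
    }

    // sendKey/recvKey (32 bytes) and nonce (12 bytes) are assumed to come
    // out of the handshake; their derivation is omitted here.
    func newObfuscatedConn(c net.Conn, sendKey, recvKey, nonce []byte) (*obfuscatedConn, error) {
        enc, err := chacha20.NewUnauthenticatedCipher(sendKey, nonce)
        if err != nil {
            return nil, err
        }
        dec, err := chacha20.NewUnauthenticatedCipher(recvKey, nonce)
        if err != nil {
            return nil, err
        }
        return &obfuscatedConn{Conn: c, enc: enc, dec: dec}, nil
    }

    func (c *obfuscatedConn) Write(p []byte) (int, error) {
        buf := make([]byte, len(p))
        c.enc.XORKeyStream(buf, p)
        return c.Conn.Write(buf)
    }

    func (c *obfuscatedConn) Read(p []byte) (int, error) {
        n, err := c.Conn.Read(p)
        c.dec.XORKeyStream(p[:n], p[:n])
        return n, err
    }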
----------------------------------------------------------------------
One of the more thorough analyses of obfs4 comes from the FTE folks:
https://www.freehaven.net/anonbib/#ccs2015obfsdetection
where they show a classifier that assesses the entropy of the first
server-side packet, together with the length of that initial message,
and can consistently pick out obfs4 flows with only a 0.25% false
positive rate against the background traffic they recorded at their
university.
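For intuition about how small such a building block can be, here is a
hedged Go sketch of an entropy check. The 7.2 bits-per-byte threshold
is invented for illustration; the paper's actual features and
thresholds differ.

    // Sketch of an entropy-based classifier building block: flag a flow
    // if its first server-side message looks "too random". The threshold
    // here is made up, not the paper's.
    package entropysketch

    import "math"

    // byteEntropy returns the Shannon entropy of p in bits per byte.
    func byteEntropy(p []byte) float64 {
        var counts [256]int
        for _, b := range p {
            counts[b]++
        }
        var h float64
        for _, c := range counts {
            if c == 0 {
                continue
            }
            f := float64(c) / float64(len(p))
            h -= f * math.Log2(f)
        }
        return h
    }

    func looksLikeObfs4(firstServerMsg []byte) bool {
        return len(firstServerMsg) > 0 && byteEntropy(firstServerMsg) > 7.2
    }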
That false positive rate is where it gets interesting: it sounds good
until we realize that it means it flags 1 in every 400 flows as being
obfs4, i.e. it falsely tags about 36000 flows in the 14M flow dataset.
And the false positives aren't randomly distributed either: 1.22% of
their SSL flows are falsely tagged as obfs4, and it's the ones where
the SNI in the SSL handshake has a lot of entropy (because everything
else in the SSL handshake is already random). Also, entire protocols
like RTMP get caught up and tagged as obfs4, because their handshake
starts out with a block of random bytes.
A key takeaway is that if they ran this classifier, with its high true
positive rate and low false positive rate, on a large country's network
traffic, approximately none of the flows that it would tag as obfs4 would
actually be obfs4. Their Section 7 in that paper has a good discussion
of the base rate issues and how they apply here.
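To spell out the base-rate arithmetic with a made-up prevalence (the
one-in-a-million figure below is hypothetical, not from the paper):
even a 0.25% false positive rate buries the true positives.

    // Base-rate sketch with invented numbers: in 14M flows, suppose 1 in
    // 1,000,000 is actually obfs4. Nearly everything flagged is a false
    // positive.
    package main

    import "fmt"

    func main() {
        flows := 14e6
        prevalence := 1e-6 // hypothetical
        tpr := 0.99        // near-perfect true positive rate
        fpr := 0.0025      // the paper's 0.25% false positive rate

        tp := flows * prevalence * tpr       // ≈ 14 flows
        fp := flows * (1 - prevalence) * fpr // ≈ 35,000 flows
        fmt.Printf("TP≈%.0f FP≈%.0f precision≈%.2f%%\n", tp, fp, 100*tp/(tp+fp))
    }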
Are there other papers that design or analyze classifiers for obfs4?
----------------------------------------------------------------------
The good news: It would seem that obfs4 is most effective in the "long
tail of varied traffic" scenario. That is, considering the spectrum
between "small corporate network" and "China backbone", obfs4 needs that
more broad background traffic in order to make its false positives too
painful to block.
The bad news: I still worry about an attack that puts together many
building blocks, each of which individually is like the "1% false
positive rate" classifier in the above paper, but that together drive
their false positive rate low enough that blocking is safe to do. One
observation there is that the more complexity there is to a protocol,
the harder it is to "really" look like it, i.e. to match it in all
dimensions at once.
Consider this lovingly handcrafted graph, where the X axis is how
thoroughly we try to look like some expected protocol, and the Y axis
is how close we can get to making the adversary unwilling to block us:
1_|                                                     |_1
  |                                                     |
  |\                                                   /|
  | \                                                 / |
  |  \                                               /  |
  |   \                                             /   |
0_|    \___________________________________________/    |_0
  |                                                     |
There's a huge valley in the middle, where we're trying to look like
something, but we're not doing it well at all, so there is little damage
from blocking us.
The ramp on the right is the game that FTE tries to play, where in theory
if they're not perfectly implementing their target protocol then the
adversary can deploy a distinguisher and separate their traffic from
the "real" protocol, but in practice there are enough broken and weird
implementations of the protocol in the wild that even being "close enough"
might cause the censor to hesitate.
And the ramp on the left is our unclassifiable traffic game, where we
avoid looking like any particular expected protocol, and instead rely
on the long tail of legitimate traffic to include something that we
blend with.
Observation: *both* of these ramps are in the "broken in theory, but works
in practice" situation. I started out thinking it was just the obfs4 one,
and that the FTE one was somehow better grounded in theory. But neither
side is actually trying to build the ideal perfect protocol, whatever
that even is. The game in both cases is about false positives, which
come from messy (i.e. hard to predict) real-world background traffic.
One research question I wrestled with while making the graph: which ramp
is steeper? That is, which of these approaches is more forgiving? Does one
of them have a narrower margin for error than the other, where you really
have to be right up against the asymptote or it doesn't work so well?
For both approaches, their success depends on the variety of expected
background traffic.
The steepness of the right-hand (look-like-something) ramp also varies
greatly based on the specific protocol it's trying to look like. At first
glance we might think that the more complex the protocol, the better
you're going to need to be at imitating it in all dimensions. That is,
the more aspects of the protocol you need to get right, the more likely
you'll slip up on one of them. But competing in the other direction is:
the more complex the protocol, the more broken weird implementations
there could be in practice.
I raise the protocol complexity question here because I think it
has something subtle and critical to do with the look-like-nothing
strategy. Each dimension of the protocol represents another opportunity
to deploy a classifier building block, where each classifier by itself
is too risky to rely on, but the composition of these blocks produces
the killer distinguisher. One of the features of the unclassifiable
protocol that we need to maintain, as we explore variations of it, is
the simplicity: it needs to be the case that the adversary can't string
together enough building-block classifiers to reach high confidence. We
need to force them into building classifiers for *every other protocol*,
rather than letting them build a classifier for our protocol.
(I'll also note that I'm mushing together the notion of protocol
complexity with other notions like implementation popularity and
diversity: a complex proprietary protocol with only one implementation
will be no fun to imitate, but the same level of complexity where every
vendor implements their own version will be much more workable.)
----------------------------------------------------------------------
I've heard two main proposed ways in which we could improve obfs in
theory -- and hopefully thus in practice too:
(A) Aaron's idea of using the latest adversarial machine learning
approaches to evolve a traffic transform that resists classification.
That is, play the classifiers off against our transform, in many
different background traffic scenarios, such that we end up with a
transform that resists classification (low true positive rate and/or
high false positive rate) in many of the scenarios.
(B) Philipp's idea from scramblesuit of having the transform be
parameterized, and for each bridge we choose and stick with a given
set of parameters. That way we're not generating *flows* that each
aim to blend in, but rather we're generating bridges that each aim to
blend differently. This diversity should adapt well to many different
background traffic scenarios because in every scenario some bridges
might be identified but some bridges will stay under the radar.
At first glance these two approaches look orthogonal, i.e. we can do both
of them at once. For example, Aaron's approach tells us the universe of
acceptable parameters, and Philipp's approach gives us diversity within
that universe.
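As a concrete (and entirely hypothetical) shape for (B): derive the
transform parameters deterministically from a per-bridge secret, so
each bridge picks, and sticks with, its own disguise. The parameter
names and ranges below are invented for illustration; scramblesuit's
actual distributions differ.

    // Hypothetical sketch of per-bridge transform parameters: seed a PRNG
    // from the bridge's secret so the choice is stable for that bridge
    // but different across bridges. Ranges are invented.
    package paramsketch

    import (
        "crypto/sha256"
        "encoding/binary"
        "math/rand"
    )

    type TransformParams struct {
        MinPadding int // bytes of padding per record
        MaxPadding int
        BurstLen   int // records per burst
    }

    func ParamsForBridge(bridgeSecret []byte) TransformParams {
        sum := sha256.Sum256(bridgeSecret)
        seed := int64(binary.BigEndian.Uint64(sum[:8]))
        r := rand.New(rand.NewSource(seed)) // deterministic per bridge
        min := r.Intn(64)
        return TransformParams{
            MinPadding: min,
            MaxPadding: min + 1 + r.Intn(512),
            BurstLen:   1 + r.Intn(8),
        }
    }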
Aaron: How do we choose the parameter space for our transforms to
explore? How much of that can be automated, and how much needs to be
handcrafted by subject-matter-experts? I see how deep learning can produce
a magic black-box classifier, but I don't yet see how that approach can
present us with a magic black-box transform.
And as a last note, it would sure be nice to produce transforms that
are robust relative to background traffic, i.e. to not be brittle or
overfit to a particular scenario. Otherwise we're giving up one of our few
advantages in the arms race, which is that right now we force the censor
to characterize the traffic -- including expected future traffic! --
and assess whether it's safe to block us.
There. Hopefully some of these ideas will cause you to have better
ideas. :)
--Roger
----------------------------------------------------------------------
https://gitweb.torproject.org/user/dcf/snowflake.git/log/?h=turbotunnel&id=…
These are the elements of a Turbo Tunnel implementation for Snowflake.
Turbo Tunnel is a name for overlaying an abstract, virtual session on
top of concrete, physical network connections, such that the virtual
session is not tied to any particular network connection. In Snowflake,
it solves the problem of migrating a session across multiple WebRTC
connections as temporary proxies come and go. This post is a walkthrough
of the code changes and my design decisions.
== How to try it ==
Download the branch and build it:
git remote add dcf https://git.torproject.org/user/dcf/snowflake.git
git fetch dcf
git checkout -b turbotunnel --track dcf/turbotunnel
for d in client server broker proxy-go; do (cd $d && go build); done
Run the broker (not changed in this branch):
broker/broker --disable-tls --addr 127.0.0.1:8000
Run a proxy (not changed in this branch):
proxy-go/proxy-go --broker http://127.0.0.1:8000/ --relay ws://127.0.0.1:8080/
Run the server:
tor -f torrc.server
# contents of torrc.server:
DataDirectory datadir-server
SocksPort 0
ORPort 9001
ExtORPort auto
BridgeRelay 1
AssumeReachable 1
PublishServerDescriptor 0
ServerTransportListenAddr snowflake 0.0.0.0:8080
ServerTransportPlugin snowflake exec server/server --disable-tls --log snowflake-server.log
Run the client:
tor -f torrc.client
# contents of torrc.client:
DataDirectory datadir-client
UseBridges 1
SocksPort 9250
ClientTransportPlugin snowflake exec client/client --url http://127.0.0.1:8000/ --ice stun:stun.l.google.com:19302 --log snowflake-client.log
Bridge snowflake 0.0.3.0:1
Start downloading a big file through the tor SocksPort. You will be able
to see activity in snowflake-client.log and in the output of proxy-go.
curl -x socks5://127.0.0.1:9250/ --location --speed-time 60 https://cdimage.debian.org/mirror/cdimage/archive/10.1.0/amd64/iso-cd/debia… > /dev/null
Now kill proxy-go and restart it. Wait 30 seconds for snowflake-client
to notice the proxy has disappeared. Then snowflake-client.log will say
redialing on same connection
and the download will resume. It's not curl restarting the download on a
new connection—from the perspective of curl (and tor) it's all one long
proxy connection, with a 30-second lull in the middle. Only
snowflake-client knows that there were two WebRTC connections involved.
== Introduction to code changes ==
Start by looking at the server changes:
https://gitweb.torproject.org/user/dcf/snowflake.git/diff/server/server.go?…
The first thing to notice is a kind of "inversion" of control flow.
Formerly, the ServeHTTP function accepted WebSocket connections and
connected each one with the ORPort. There was no virtual session: each
WebSocket connection corresponded to exactly one client session. Now,
the main function, separately from starting the web server, starts a
virtual listener (kcp.ServeConn) that calls into a chain of
acceptSessions→acceptStreams→handleStream functions that ultimately
connects a virtual stream with the ORPort. But this virtual listener
doesn't actually open a network port, so what drives it? That's now the
sole responsibility of the ServeHTTP function. It still accepts
WebSocket connections, but it doesn't connect them directly to the
ORPort—instead, it pulls out discrete packets (encoded into the stream
using length prefixes) and feeds those packets to the virtual listener.
The glue that links the virtual listener and the ServeHTTP function is
QueuePacketConn, an abstract interface that allows the virtual listener
to send and receive packets without knowing exactly how those I/O
operations are implemented. (In this case, they're implemented by
encoding packets into WebSocket streams.)
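For orientation, the shape of that glue is roughly the following (a
simplified sketch, not the branch's actual code; the real
QueuePacketConn also implements the rest of net.PacketConn, per-client
outgoing queues, and deadlines):

    // Simplified sketch of the QueuePacketConn idea: a net.PacketConn
    // whose "network" is a pair of in-memory queues. The WebSocket side
    // feeds decoded packets into recv and drains send; the KCP virtual
    // listener sees only ReadFrom and WriteTo.
    package queuesketch

    import "net"

    type packet struct {
        buf  []byte
        addr net.Addr
    }

    type queuePacketConn struct {
        recv chan packet // filled by the WebSocket side
        send chan packet // drained by the WebSocket side
    }

    func (c *queuePacketConn) ReadFrom(p []byte) (int, net.Addr, error) {
        pkt := <-c.recv
        return copy(p, pkt.buf), pkt.addr, nil
    }

    func (c *queuePacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
        buf := make([]byte, len(p))
        copy(buf, p)
        c.send <- packet{buf, addr}
        return len(p), nil
    }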
The new control flow boils down to a simple, traditional listen/accept
loop, except that the listener doesn't interact with the network
directly, but only through the QueuePacketConn interface. The WebSocket
part of the program now only acts as a network interface that performs
I/O functions on behalf of the QueuePacketConn. In effect, we've moved
everything up one layer of abstraction: where formerly we had an HTTP
server using the operating system as a network interface, we now have a
virtual listener using the HTTP server as a network interface (which
in turn ultimately uses the operating system as the *real* network
interface).
Now look at the client changes:
https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&…
The Handler function formerly grabbed exactly one snowflake proxy
(snowflakes.Pop()) and used its WebRTC connection until it died, at
which point it would close the SOCKS connection and terminate the whole
Tor session. Now, the function creates a RedialPacketConn, an abstract
interface that grabs a snowflake proxy, uses it for as long as it lasts,
then grabs another. Each of the temporary snowflake proxies is wrapped
in an EncapsulationPacketConn to convert it from a stream-oriented
interface to a packet-oriented interface. EncapsulationPacketConn uses
the same length-prefixed protocol that the server expects. We then
create a virtual client connection (kcp.NewConn2), configured to use the
RedialPacketConn as its network interface, and open a new virtual
stream. (This sequence of calls kcp.NewConn2→sess.OpenStream corresponds
to acceptSessions→acceptStreams on the server.) We then connect
(copyLoop) the SOCKS connection and the virtual stream. The virtual
stream never touches the network directly—it interacts indirectly
through RedialPacketConn and EncapsulationPacketConn, which make use of
whatever snowflake proxy WebRTC connection happens to exist at the time.
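In outline, the client-side setup looks something like this condensed
sketch against the kcp-go v5 and smux APIs (pconn stands in for the
RedialPacketConn; the branch's actual constructor arguments and error
handling are richer):

    // Condensed sketch of the client's virtual-session setup: a reliable
    // KCP session over the redialing packet interface, then an smux
    // stream over that session.
    package clientsketch

    import (
        "net"

        kcp "github.com/xtaci/kcp-go/v5"
        "github.com/xtaci/smux"
    )

    func dialVirtualStream(pconn net.PacketConn, addr net.Addr) (*smux.Stream, error) {
        // No crypto and no FEC at the KCP layer in this sketch.
        conn, err := kcp.NewConn2(addr, nil, 0, 0, pconn)
        if err != nil {
            return nil, err
        }
        config := smux.DefaultConfig()
        config.Version = 2 // smux protocol v2 (see the Notes section)
        sess, err := smux.Client(conn, config)
        if err != nil {
            return nil, err
        }
        return sess.OpenStream()
    }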
You'll notice that before anything else, the client sends a 64-bit
ClientID. This is a random number that identifies a particular client
session, made necessary because the virtual session is not tied to an IP
4-tuple or any other network identifier. The ClientID remains the same
across all redials in one call to the Handler function. The server
parses the ClientID out of the beginning of a WebSocket stream. The
ClientID is how the server knows if it should open a new ORPort
connection or append to an existing one, and which temporary WebSocket
connections should receive packets that are addressed to a particular
client.
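A ClientID is nothing fancier than 8 random bytes sent at the start of
every WebSocket connection, something like this sketch (the turbotunnel
package has its own ClientID type; the names here are illustrative):

    // Illustrative sketch: generate a per-session ClientID once, then
    // send it first on every redialed connection so the server can
    // reassociate the new connection with the existing session.
    package clientidsketch

    import (
        "crypto/rand"
        "io"
    )

    type ClientID [8]byte

    func NewClientID() (ClientID, error) {
        var id ClientID
        _, err := rand.Read(id[:])
        return id, err
    }

    func sendClientID(w io.Writer, id ClientID) error {
        _, err := w.Write(id[:])
        return err
    }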
There's a lot of new support code in the common/encapsulation and
common/turbotunnel directories, mostly reused from my previous work in
integrating Turbo Tunnel into pluggable transports.
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/encapsulat…
The encapsulation package provides a way of encoding a sequence of
packets into a stream. It's essentially just prefixing each packet with
its length, but it takes care to permit traffic shaping and padding to
the byte level. (The Snowflake turbotunnel branch doesn't take advantage
of the traffic-shaping and padding features.)
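As a simplified illustration of the packets-in-a-stream idea (the real
encapsulation package's wire format is different; in particular its
length encoding can also mark chunks as padding, which this sketch
ignores):

    // Simplified length-prefix framing, for illustration only: every
    // chunk is data with a 16-bit big-endian length prefix.
    package framesketch

    import (
        "encoding/binary"
        "io"
    )

    func writePacket(w io.Writer, p []byte) error {
        var hdr [2]byte
        binary.BigEndian.PutUint16(hdr[:], uint16(len(p)))
        if _, err := w.Write(hdr[:]); err != nil {
            return err
        }
        _, err := w.Write(p)
        return err
    }

    func readPacket(r io.Reader) ([]byte, error) {
        var hdr [2]byte
        if _, err := io.ReadFull(r, hdr[:]); err != nil {
            return nil, err
        }
        p := make([]byte, binary.BigEndian.Uint16(hdr[:]))
        _, err := io.ReadFull(r, p)
        return p, err
    }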
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/turbotunne…
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/turbotunne…
QueuePacketConn and ClientMap are imported pretty much unchanged from
the meek implementation (https://github.com/net4people/bbs/issues/21).
Together these data structures manage queues of packets and allow you to
send and receive them using custom code. In meek it was done over raw
HTTP bodies; here it's done over WebSocket. These two interfaces are
candidates for an eventual reusable Turbo Tunnel library.
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/common/turbotunne…
RedialPacketConn is adapted from clientPacketConn in the obfs4proxy
implementation (https://github.com/net4people/bbs/issues/14#issuecomment-544747519).
It's the part that uses an underlying connection for as long as it
exists, then switches to a new one. Since the obfs4proxy implementation,
I've decided that it's better to have this type use the packet-oriented
net.PacketConn as the underlying type, not the stream-oriented net.Conn.
That way, RedialPacketConn doesn't have to know details of how packet
encapsulation happens, whether by EncapsulationPacketConn or some other
way.
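The core of that idea, heavily simplified (the branch's
RedialPacketConn also handles reads, concurrency, deadlines, and the
ClientID prelude):

    // Heavily simplified redial loop: delegate to the current underlying
    // net.PacketConn; when it dies, dial a replacement and retry.
    package redialsketch

    import "net"

    type redialPacketConn struct {
        dial    func() (net.PacketConn, error) // grabs the next proxy
        current net.PacketConn
    }

    func (c *redialPacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
        for {
            if c.current == nil {
                conn, err := c.dial()
                if err != nil {
                    return 0, err
                }
                c.current = conn
            }
            n, err := c.current.WriteTo(p, addr)
            if err == nil {
                return n, nil
            }
            c.current.Close()
            c.current = nil // underlying connection died; redial and retry
        }
    }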
== Backward compatibility ==
The branch as of commit 07495371d67f914d2c828bbd3d7facc455996bd2 is not
backward compatible with the mainline Snowflake code. That's because the
server expects to find a ClientID and length-prefixed packets, and
currently deployed clients don't work that way. However, I think it will
be possible to make the server backward compatible. My plan is to
reserve a distinguished static token (64-bit value) and have the client
send that at the beginning of the stream, before its ClientID, to
indicate that it uses Turbo Tunnel features. The token will be selected
to be distinguishable from any protocol that non–Turbo Tunnel clients
might use (i.e., Tor TLS). Then, the server's ServeHTTP function can
choose one of two implementations, depending on whether it sees the
magic token or not.
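The dispatch could look something like this sketch (the token value and
the helper functions are hypothetical; a real implementation also has
to deal with short reads and timeouts):

    // Hypothetical sketch of dual-mode dispatch: peek at the first 8
    // bytes of the stream; if they equal the reserved magic token it's a
    // Turbo Tunnel client, otherwise replay those bytes and treat the
    // stream as a legacy (Tor TLS) connection.
    package dispatchsketch

    import (
        "bytes"
        "io"
    )

    var turboTunnelToken = [8]byte{ /* reserved value, to be chosen */ }

    func handleTurboTunnel(rw io.ReadWriter) error { return nil }      // stub
    func handleLegacy(r io.Reader, w io.Writer) error { return nil }   // stub

    func dispatch(conn io.ReadWriter) error {
        var prefix [8]byte
        if _, err := io.ReadFull(conn, prefix[:]); err != nil {
            return err
        }
        if prefix == turboTunnelToken {
            return handleTurboTunnel(conn)
        }
        // Legacy client: put the consumed bytes back in front of the
        // stream before handing it off.
        legacy := io.MultiReader(bytes.NewReader(prefix[:]), conn)
        return handleLegacy(legacy, conn)
    }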
If I get backward compatibility working, then we can deploy a dual-mode
bridge that is able to serve either type of client. Then I can try
making a Tor Browser build, to make the Turbo Tunnel code more
accessible for user testing.
One nice thing about all this is that it doesn't require any changes to
proxies. They remain simple dumb pipes, so we don't have to coordinate a
mass proxy upgrade.
https://gitweb.torproject.org/user/dcf/snowflake.git/tree/server/server.go?…
The branch currently lacks client geoip lookup (ExtORPort USERADDR),
because of the difficulty I have talked about before of providing an IP
address for a virtual session that is not inherently tied to any single
network connection or address. I have a plan for solving it, though; it
requires a slight breaking of abstractions. In the server, after reading
the ClientID, we can peek at the first 4 bytes of the first packet.
These 4 bytes are the KCP conversation ID (https://github.com/xtaci/kcp-go/blob/v5.5.5/kcp.go#L120),
a random number chosen by the client, serving roughly the same purpose
in KCP as our ClientID. We store a temporary mapping from the
conversation ID to the IP address of the client making the WebSocket
connection. kcp-go provides a GetConv function that we can call in
handleStream, just as we're about to connect to the ORPort, to look up
the client's IP address in the mapping. The possibility of doing this is
one reason I decided to go with KCP for this implementation rather than
QUIC as I did in the meek implementation: the quic-go package doesn't
expose an accessor for the QUIC connection ID.
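Sketched, the peek might look like this (relying on KCP's header
layout, where the conversation ID is a little-endian uint32 at offset
0; the names here are illustrative):

    // Illustrative sketch: record which client address the first packet
    // of each KCP conversation came from. Later, handleStream can call
    // the kcp-go session's GetConv() and look the address up here before
    // writing the ExtORPort USERADDR.
    package convsketch

    import (
        "encoding/binary"
        "net"
        "sync"
    )

    type convMap struct {
        mu sync.Mutex
        m  map[uint32]net.Addr
    }

    func (cm *convMap) noteFirstPacket(pkt []byte, addr net.Addr) {
        if len(pkt) < 4 {
            return
        }
        conv := binary.LittleEndian.Uint32(pkt[:4]) // KCP conv field
        cm.mu.Lock()
        defer cm.mu.Unlock()
        if cm.m == nil {
            cm.m = make(map[uint32]net.Addr)
        }
        cm.m[conv] = addr
    }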
== Limitations ==
I'm still using the same old logic for detecting a dead proxy, 30
seconds without receiving any data. This is suboptimal for many reasons
(https://bugs.torproject.org/25429), one of which is that when your
proxy dies, you have to wait at least 30 seconds until the connection
becomes useful again. That's why I had to use "--speed-time 60" in the
curl command above; curl has a default idle timeout of 30 seconds, which
would cause it to give up just as a new proxy was becoming available.
I think we can ultimately do a lot better, and make better use of the
available proxy capacity. I'm thinking of "striping" packets across
multiple snowflake proxies simultaneously. This could be done in a
round-robin fashion or in a more sophisticated way (weighted by measured
per-proxy bandwidth, for example). That way, when a proxy dies, any
packets sent to it would be detected as lost (unacknowledged) by the KCP
layer, and retransmitted over a different proxy, much quicker than the
30-second timeout. The way to do this would be to replace
RedialPacketConn—which uses one connection at a time—with a
MultiplexingPacketConn, which manages a set of currently live
connections and uses all of them. I don't think it would require any
changes on the server.
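On the send side, that replacement might be shaped like this (entirely
speculative; a real version would need receive merging, liveness
tracking, weighting, and locking):

    // Speculative sketch: round-robin packet striping across all
    // currently live proxy connections. When one proxy dies, KCP
    // retransmits whatever was lost over the survivors.
    package stripesketch

    import "net"

    type multiplexingPacketConn struct {
        conns []net.PacketConn // currently live proxies
        next  int
    }

    func (c *multiplexingPacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
        conn := c.conns[c.next%len(c.conns)]
        c.next++
        return conn.WriteTo(p, addr)
    }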
But the situation in the turbotunnel branch is better than the status
quo, even without multiplexing, for two reasons. First, the connection
actually *can* recover after 30 seconds. Second, the smux layer sends
keepalives, which means that you won't discard a proxy merely because
you're temporarily idle, but only when it really stops working.
== Notes ==
https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&…
I added go.mod and go.sum files to the repo. I did this because smux
(https://github.com/xtaci/smux) has a v2 protocol that is incompatible
with the v1 protocol, and I don't know how to enforce the use of v2 in
the build other than by activating Go modules and specifying a version
in go.mod.