On Fri, May 31, 2019 at 01:02:35AM -0400, Roger Dingledine wrote:
Here are some brainstorming thoughts to get us started improving our "unclassifiable protocols" idea.
For posterity, here are the notes I wrote for my 5-minute talk last week during the Tor All Hands, to summarize the area and this post:
-------
Today I want to talk about pluggable transports, and why they work in practice.
Big picture as a reminder: pluggable transports are ways to transform Tor traffic so it's harder to block or recognize.
Our current most popular transport is obfs4, which just wraps the traffic in a layer of encryption so there are no recognizable headers.
Works in China; also used by other projects (like VPN providers and other circumvention tools).
-------
But, *why* does it work? In theory it doesn't look like any traffic they expect: not web browsing, not messaging, not any expected protocol.
Our guess for why it works: there's a long tail of China backbone traffic that it blends with, i.e. if their protocol classifier says "no idea", then blocking all unclassified traffic is too much collateral damage.
For example, if their classifier has a 1% false positive rate, it wrongly flags 1 in every 100 innocent flows. Actual circumvention flows are far rarer than 1 in 100, so almost everything the classifier blocks would be a mistake. This is what people mean when they say "base rate fallacy".
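To make the arithmetic concrete, here's a back-of-the-envelope sketch. The prevalence number is made up purely for illustration; the point survives for any prevalence well below the false positive rate.

```python
# Base rate fallacy, with made-up illustrative numbers.
# Suppose 1 flow in 10,000 on the backbone is actually obfuscated
# circumvention traffic, and the censor's classifier catches every one
# of them while wrongly flagging 1% of innocent flows.

prevalence = 1 / 10_000   # assumed fraction of flows that are real targets
tpr = 1.0                 # classifier catches all real targets
fpr = 0.01                # and flags 1% of everything else by mistake

flagged_real = prevalence * tpr
flagged_innocent = (1 - prevalence) * fpr
precision = flagged_real / (flagged_real + flagged_innocent)

print(f"fraction of blocked flows that are real targets: {precision:.4f}")
# With these numbers, roughly 99% of what gets blocked is collateral damage.
```

Even a censor with a perfect detection rate ends up blocking mostly innocent traffic, which is exactly the cost calculation we want them stuck with.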
We're turning the game around: now *they* have to decide what is the cost of blocking us.
Maybe they *could* block it, but they don't, because they're doing fine blocking our bridges by IP address, i.e. beating bridgedb? But is that true for Lantern and others too? And, what if we make bridgedb stronger?
Quote from a Raytheon DPI dev: "This looks like the sort of protocol that the general's engineers deployed ten years ago after the last coup, and nobody has looked at since. If I block it, some building is going to disappear from the internet. No way am I going to touch this."
------
That is, sure, on any small or medium-sized network it will look funny, but on a big enough network there are other things that look like it. E.g. RTMP starts with a randomized handshake too, and parts of the SSL handshake look random as well.
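As a toy illustration of why "looks random" is a weak signal on its own: a censor measuring the byte entropy of a flow's first packets can't separate obfs4 from other high-entropy handshakes. The payloads below are stand-ins, not real protocol traces (the RTMP handshake shape is simplified):

```python
import math
import os

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0..8)."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Stand-in payloads: obfs4 traffic is uniformly random bytes; an RTMP
# client handshake is a version byte followed by a mostly-random block
# (simplified here); plain HTTP is structured ASCII.
obfs4_like = os.urandom(1500)
rtmp_like = bytes([3]) + os.urandom(1536)
http_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 30

for name, payload in [("obfs4-like", obfs4_like),
                      ("RTMP-like", rtmp_like),
                      ("HTTP", http_like)]:
    # The two random-looking handshakes both score near 8 bits/byte;
    # only the structured HTTP request stands out as low-entropy.
    print(f"{name}: {byte_entropy(payload):.2f} bits/byte")
```

So an entropy test lumps obfs4 in with the long tail of other random-looking protocols, which is exactly the blending we're relying on.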
But if you look at enough packets, or you look at the timing and volume, surely you can tell "obfuscated Tor" apart from other things?
So we're in this weird situation where it's broken in theory but it works in practice.
------
Compare to FTE/Marionette, which tries to look like http.
I used to think these were two different approaches to the problem: look-like-nothing vs look-like-something.
You have to *really* look like nothing, or *really* look like your target thing, and in between is this dead zone where the censor has it easy.
But, Marionette probably works in practice because it blends with the long tail of weird and buggy http implementations.
(They're never going to be fully bug compatible with Firefox and Apache.)
#1 So it's really the same game! They need a long tail too, and for them it's about false positives too.
-----
#2 Ok, how do you evaluate these things?
Get a testbed, install some background apps, compare? But it will never be enough background traffic to be "realistic"!
Get a real network, add some obfs4 to it, compare? We were on track for doing that with a university CS dept, but the plan fell through. Still sounds useful, because either outcome will tell us something about how big is big enough.
-----
#3 And then the third question is: can we come up with a theory for how to reason about the security here? Maybe that can lead to new designs that are more robust to blocking in practice too? One direction worth exploring: Generative Adversarial Networks (GANs), where a generator learns traffic transforms that a censor-style discriminator can't tell apart from background traffic.