[tor-dev] Blocking-Test: GSoC Penultimate Status Report
brandon at blanu.net
Fri Aug 12 20:50:28 UTC 2011
Scoring and reporting for detectors and encoders is pretty much done. The
graphs aren't quite ready yet, for aesthetic reasons, but I have some nice
tables to report:
Length detector: 16% accuracy
Timing detector: 89% accuracy
Entropy detector: 94% accuracy
SSL was correctly identified 25% of the time, 70% of the time other
protocols were misidentified as being SSL, and 5% of the time SSL was
misidentified as another protocol.
obfsproxy was correctly identified 55% of the time, 10% of the time other
protocols were misidentified as being obfsproxy, and 35% of the time
obfsproxy was misidentified as another protocol.
Dust was correctly identified 48% of the time, 4% of the time other
protocols were misidentified as being Dust, and 48% of the time Dust was
misidentified as another protocol.
obfsproxy is distinguishable from SSL 96% of the time.
obfsproxy is distinguishable from Dust 98% of the time.
Dust is distinguishable from SSL 56% of the time.
I'm sure these numbers are confusing as several different sorts of things
are being measured, all with similar numbers. It is probably quite unclear
how all of these percentages add up. I think this is where the graphs will
help. In particular I think it's hard to visualize accuracy when there are
both false positives and false negatives to take into account.
I think the most interesting result here is the entropy detector. It's the
simplest and also the most effective. Not only that, but the entropy
detector only looks at the entropy of the first packet, so it's inexpensive.
The original reason that I implemented this detector was that there was an
argument against Dust that it could be trivially filtered by setting an
entropy threshold above which all traffic is filtered. Surprisingly, Dust
has lower entropy than SSL, so this attack will not work to specifically
target Dust. This is unexpected because Dust uses totally random bytes,
whereas the SSL handshake does not. This is a result of a general issue with
how Shannon entropy is defined. Shannon entropy is normally defined across a
large statistical sampling, in which case all three protocols tested will
converge to maximum since they are all encrypted and therefore apparently
random. However, the attacker does not in this case get a large statistical
sample. Since they don't know a priori which connections are Dust
connections, they only get however many packets are in each trace to use as
samples. The current version of Dust that I'm testing has most of the
advanced obfuscation stripped out as I haven't had time to add it back in
after completely reimplementing the protocol to use TCP instead of UDP and
all of the modified assumptions that come with that change. So this version
of Dust is basically just the ntor protocol with no packet length or timing
hiding in use. The first packet is therefore 32 random bytes consisting of
the curve25519 ephemeral ECDH key, as compared to the 1k first packet of
SSL. This is the reason for the low entropy calculation, because given only
32 bytes to sample almost all possible bytes are going to have a probability
of 0 or 1. There is an upper bound on entropy when sample sizes are small.
Achieving maximum first-order entropy here would require at least 512 bytes
so that each value 0 to 255 could occur exactly twice.
There are of course many ways to measure entropy and you could say that I'm
just doing it wrong. Some alternative measurements have already been
suggested that I might implement as well. I am not suggesting that Dust is
in any way immune to entropy-based detection. In fact, Dust is currently
detected very well by the current entropy detector (it passed 100% of the
packet length and timing tests, so all detection of Dust was done by the
entropy detector) precisely because it has strangely low entropy because of
it's small first packets. I bring this up only as an interesting example of
how intuition can fail when trying to use Shannon entropy to measure
individual messages instead of channels.
As far as the state of the project goes, I think the project can be
considered done. Looking at the project
everything didn't work out exactly as originally planned, I think it all
worked out for the best and the project is where I wanted it to be by the
completion of Summer of Code. I have generation of HTTP and HTTPS traffic,
both over Tor and plain. I have two encoders: obfs2 and Dust. I have three
detectors: length, timing, and entropy. The string detector didn't work out,
but the entropy detector worked out better than planned, so on balance I
think it was a win. I didn't implement all of the detector modes because
upon further reflection it doesn't really make sense to just try all of
these different modes. You want the mode that works best. So for packet
lengths and timing that is full conversations and for entropy that's the
first packet (could be first N packets, but I tried N=1 and it worked
great). Random sampling of packets just seems like a way to limit your
success at this point when non-random sampling of just the first packet
seems to be so effective. All of the utilities have been written, and more.
I have the specified utilities for separating the pcap files into streams,
and extracting the strings, lengths, and timings. Additionally, I have some
utilities for extracting the entropy of streams, tagging the streams with
what protocol they are using (by port), and extracting dictionary and corpus
models for latent semantic analysis. I also have some utilities for graphing
distributions of properties of different protocols which I hope to finish up
next week during the wrap-up period.
Looking to the future, I have changed my goals somewhat from what I had in
mind at the beginning of the project. After having attended the TorDev
meetup in Waterloo and the PETS conference, I am much more interested in
integrating my work with the Tor project instead of just testing the various
protocols that have been proposed in order to see how my own protocol stacks
up against the competition. As part of this transition, I have rewritten
Dust from scratch, both in terms of implementation and in terms of the
protocol, in order to be better suited for use as a pluggable transport for
Tor. My goal for Dust is to have it shipped with both Tor clients and
bridges so that it can be used in the field to contact bridges despite DPI
filtering of Tor connections. My goal now for the blocking-test project is
to focus on blocking resistance for Tor. So instead of adding all of the
protocols under the sun, I'm going to concentrate on protocols that have
been implemented or are under consideration to become pluggable transports.
Additionally, the next protocol I'm going to add after GSoC is over is just
plain Tor. There is at least one bug filed about a suspicion that Tor can
currently be fingerprinted based on some packet characteristics and I'd like
to be able to close that bug because we now have the capability to easily
generate the necessary information.
Finally, I would like to offer some insight into what I think is the future
of blocking-resistant transports now that I have done some work trying to
break them. Fundamentally, the only thing an attacker can do is limit the
bitrate of the connection. Given any sort of channel, we can encode
arbitrary information over that channel. However, the smaller the channel
the slower the bitrate. The goal of encodings should be to maximize the
bitrate given the constraints of the censor. It's easy to develop very slow
encodings such as natural language encodings over HTTP. However, in order to
maximize the throughput you need to have a good definition of the attack
model. All encoding for blocking-resistant purposes lowers the efficiency
above just transmitting the normal content (assumed it has already been
compressed). Therefore all encoded conversations are going to be longer than
unencoded conversations. I think this is the future of writing detectors
because it's a fundamental constraint on encoding. I think this should even
be able to defeat something like Telex (although let's not get into the
details of attacks on Telex specifically, I'm talking about a general idea
here). Given a (static, for the sake of argument simplicity) website, I can
download every page on that website. Then I can compare the length of
intercepted downloads and detect when they don't match up. For dynamic
websites you can do the same thing with a statistical model. I think,
therefore, that the future of blocking-resisting protocols is to encode
single logical connections over an ensemble of multiple actual connections.
Ideally these connections would mimic normal traffic in terms of number of
connections, duration, directional flow rates, etc.. It's something to think
about anyway. It could be an interesting problem.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the tor-dev