[tor-dev] "Seeing through Network-Protocol Obfuscation"

Wed Aug 19 18:13:03 UTC 2015

<https://kpdyer.com/publications/ccs2015-measurement.pdf>

They claim that they are able to detect obfs3, obfs4, FTE, and meek
using entropy analysis and machine learning.

I wonder whether their data set supports such a conclusion.  They use an
(admittedly large) set of flow traces gathered at a college campus.
One of the traces is from 2010.  The Internet was a different place back
then.  I would also expect college traces to be very different from
country-level traces.  For example, the latter should contain
significantly more file sharing, and other traffic that is considered
inappropriate in a college setting.  Many countries also have popular
web sites and applications that might be completely missing in their
data sets.

Because obfuscated traffic makes up only a tiny fraction of all
traffic, the false positive rate in the analysis is significant: false
alarms could easily outnumber true detections.  Trained classifiers
also seem to do badly when classifying traces they weren't trained on.
The authors suggest active probing to reduce false positives, but don't
mention that this doesn't work against obfs4 and meek.
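
To make the base-rate concern concrete, here is a rough sketch using
Bayes' rule.  All three numbers are made up for illustration and are not
taken from the paper; the function name is hypothetical as well.

```python
def precision(base_rate, tpr, fpr):
    """Fraction of flagged flows that really are obfuscated (Bayes' rule)."""
    true_alarms = base_rate * tpr            # obfuscated flows that get flagged
    false_alarms = (1.0 - base_rate) * fpr   # normal flows that get flagged
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical values, NOT from the paper.
BASE_RATE = 1e-4  # assume 1 in 10,000 flows is obfuscated
TPR = 0.99        # assumed true positive rate of the classifier
FPR = 0.002       # assumed false positive rate of the classifier

p = precision(BASE_RATE, TPR, FPR)
print(f"{p:.1%} of flagged flows are actually obfuscated")
```

With these assumed values, fewer than one in twenty flagged flows would
actually be obfuscated, even though the classifier itself looks very
accurate.  The real ratio depends entirely on the true base rate, which
a campus trace may not reflect.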

Cheers,
Philipp