[tor-dev] "Seeing through Network-Protocol Obfuscation"
Kevin P Dyer
kpdyer at gmail.com
Sat Aug 22 00:46:39 UTC 2015
Thanks for the interest! I'm one of the authors on the paper. My responses
are inline.
On Wednesday, August 19, 2015, Philipp Winter <phw at nymity.ch> wrote:
> They claim that they are able to detect obfs3, obfs4, FTE, and meek
> using entropy analysis and machine learning.
> I wonder if their dataset allows for such a conclusion. They use a
> (admittedly, large) set of flow traces gathered at a college campus.
> One of the traces is from 2010. The Internet was a different place back
Correct, we used datasets collected in 2010, 2012, and 2014, which together
total more than 1TB of data and 14M TCP flows.
We could have, say, just used the 2014 dataset. However, we wanted to show
that the choice of dataset matters and even with millions of traces, the
collection date and network-sensor location can impact results.
> I would also expect college traces to be very different from
> country-level traces. For example, the latter should contain
> significantly more file sharing, and other traffic that is considered
> inappropriate in a college setting. Many countries also have popular
> web sites and applications that might be completely missing in their
> data sets.
That's probably accurate. I bet that even across different types of
universities (e.g., technical vs. non-technical) one might see very
different patterns. Certainly different countries (e.g., Iran vs. China)
will see different patterns, too.
For that reason, we're going to release our code prior to CCS. Liang
Wang, a grad student at the University of Wisconsin-Madison, led a
substantial engineering effort to make this possible. We undersold it in
the paper, but it makes it easy to re-run all these experiments on new
datasets. We'd *love* it if others could rerun the experiments against new
datasets and report their results.
> Considering the rate difference between normal and obfuscated traffic,
> the false positive rate in the analysis is significant. Trained
> classifiers also seem to do badly when classifying traces they weren't
> trained for.
We definitely encountered this. When we trained on one dataset and tested on
a different one, accuracy plummeted.
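
To make the base-rate concern in the quoted text concrete, here is a
back-of-the-envelope sketch. It is plain Bayes' rule, not code from the
paper, and the detection rate, false-positive rate, and share of obfuscated
flows below are hypothetical numbers chosen only for illustration.

```python
def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Fraction of flagged flows that really are obfuscated (Bayes' rule).

    tpr        -- true-positive rate of the classifier
    fpr        -- false-positive rate of the classifier
    prevalence -- share of all flows that are obfuscated
    """
    true_alarms = tpr * prevalence
    false_alarms = fpr * (1.0 - prevalence)
    total_alarms = true_alarms + false_alarms
    return true_alarms / total_alarms if total_alarms > 0 else 0.0


# Hypothetical inputs, NOT measurements from the paper:
# 99% detection, 0.2% false positives, 1 obfuscated flow in 10,000.
p = precision(tpr=0.99, fpr=0.002, prevalence=1e-4)
print(f"{p:.1%} of flagged flows are actually obfuscated")
```

With obfuscated flows this rare, most alarms are false even though the
false-positive rate looks small on paper, which is why the rate difference
matters so much.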
I think that raises a really interesting research question: what does it
mean for two datasets to be different? For this type of classification
problem, what level of granularity/frequency would a network operator train
at to achieve optimal accuracy and low false positives? (e.g., do you need
a classifier per country? state? city? neighborhood?) Also, how often does
one need to retrain? daily? weekly?
I guess all we showed is that datasets collected from sensors at different
network locations (and years apart) are different enough to impact
classifier accuracy. Probably not surprising...
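
A toy illustration of that effect, assuming a single made-up feature
(something like payload entropy in bits per byte) and a simple threshold
classifier. None of this is the paper's classifier, features, or data: the
function names are hypothetical and the two "sites" are synthetic
distributions picked so that normal traffic differs between them.

```python
import random


def clip(x):
    """Keep the toy feature inside the 0..8 bits-per-byte range."""
    return min(8.0, max(0.0, x))


def make_flows(rng, n_normal, n_obf, normal_mean):
    """Synthetic (feature, label) pairs; label 1 = obfuscated.

    All distributions here are invented for illustration.
    """
    flows = [(clip(rng.gauss(normal_mean, 0.8)), 0) for _ in range(n_normal)]
    flows += [(clip(rng.gauss(7.8, 0.15)), 1) for _ in range(n_obf)]
    return flows


def train_threshold(flows):
    """Pick the threshold t (flag if feature >= t) with best training accuracy."""
    def accuracy(t):
        return sum((x >= t) == bool(y) for x, y in flows) / len(flows)
    return max(sorted({x for x, _ in flows}), key=accuracy)


def evaluate(flows, t):
    """Return (accuracy, false-positive rate) for threshold t."""
    normals = [x for x, y in flows if y == 0]
    correct = sum((x >= t) == bool(y) for x, y in flows)
    false_pos = sum(x >= t for x in normals)
    return correct / len(flows), false_pos / len(normals)


rng = random.Random(0)
# Two hypothetical sensors whose normal traffic looks different.
site_a = make_flows(rng, 2000, 200, normal_mean=4.5)
site_b = make_flows(rng, 2000, 200, normal_mean=6.8)

t = train_threshold(site_a)  # train on site A only
for name, flows in (("same dataset", site_a), ("different dataset", site_b)):
    acc, fpr = evaluate(flows, t)
    print(f"{name}: accuracy={acc:.3f} fpr={fpr:.3f}")
```

The threshold learned at one site fits that site's normal traffic, so it
misfires once the normal traffic shifts, even though the obfuscated flows
look the same at both sites.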
> The authors suggest active probing to reduce false
> positives, but don't mention that this doesn't work against obfs4 and
I don't want to get too off track here, but do obfs4 and meek really resist
active probing from motivated countries? Don't we still have the
unsolved bridge/key distribution problem?
Finally, we’ll be working on a full version of this paper with additional
results. If anyone is interested in reviewing and providing feedback, we’d
love to hear it. (Philipp - do you mind if I reach out to you directly?)