[tor-dev] TOP SECRET BULLETIN ABOUT THE RACCOON EFFECT

The23rd Raccoon raccoon23 at protonmail.com
Thu Dec 24 20:12:25 UTC 2020


On Wednesday, December 23, 2020 4:15 AM, Mike Perry <mikeperry at torproject.org> wrote:
> On 12/23/20 7:58 PM, The23rd Raccoon wrote:
> > Indeed, once time is included as a feature, deep learning based Website
> > Traffic Fingerprinting attacks will effectively be correlating the
> > timing and traffic patterns of websites to their representations in its
> > neural model. This model comparison is extremely similar to how
> > end-to-end correlation compares the timing and traffic patterns of Tor
> > entrance traffic to Tor exit traffic. In fact, deep learning classifiers
> > have already shown success in correlating end-to-end traffic on Tor[28].
>
> While you have offered no specific testable predictions for this theory,
> presumably to score more crackpot points, allow me to provide a
> reduction proof sketch, as well as an easily testable result.
>
> To see that Deep Fingerprinting reduces to Deep Correlation, consider
> the construction where the correlator function from DeepCorr is used to
> correlate pairs of raw test traces to the raw training traces that were
> used to train the Deep Fingerprinting classifier. The correlated pairs
> would be constructed from the monitored set's test and training
> examples. This means that instead of correlating client traffic to Exit
> traffic, DeepCorr is correlating "live" client traces directly to the
> raw fingerprinting training model, as you said.
>
> This gets us "closed world" fingerprinting results. For "open world"
> results, include the unmonitored set as input that does not contain
> matches (to represent partial network observation that results in
> unmatched pairs).

Thank you for this clarification! This is exactly what I was talking
about, in between scoring crackpot points.
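
For concreteness, here is a minimal sketch of that construction,
with a cosine similarity standing in for the trained DeepCorr CNN
(all names below are mine; the real correlator and its feature
preprocessing are as described in the DeepCorr paper):

    import numpy as np

    def cosine_correlator(trace_a, trace_b):
        # Stand-in for DeepCorr's trained CNN correlator
        # (hypothetical: the real model scores preprocessed
        # timing/size feature pairs).
        a = np.asarray(trace_a, dtype=float)
        b = np.asarray(trace_b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def deepcorr_fingerprint(test_trace, monitored_training, correlator,
                             threshold=0.9):
        # Correlate a "live" client trace directly against the raw
        # training traces of the monitored set, instead of against
        # exit-side traffic. Returns (site, score) for the best
        # match, or (None, threshold) -- "unmonitored" -- if nothing
        # correlates above threshold.
        best_site, best_score = None, threshold
        for site, traces in monitored_training.items():
            for train_trace in traces:
                score = correlator(test_trace, train_trace)
                if score > best_score:
                    best_site, best_score = site, score
        return best_site, best_score

In the closed world, every test trace belongs to some monitored
site; in the open world, unmonitored test traces should fall below
threshold, representing the unmatched pairs from partial
observation.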

> If the accuracy from this DeepCorr Fingerprinting construction is better
> than Deep Fingerprinting for closed and open world scenarios, one can
> conclude that Deep Fingerprinting reduces to DeepCorr, in a
> computational complexity and information-theoretic sense. This is
> testable.
>
> If the accuracy is worse, then Deep Fingerprinting is actually a more
> powerful attack than DeepCorr, and thus defenses against Deep
> Fingerprinting should perform even better against DeepCorr, for web
> traffic. This is also testable.
>
> This reduction also makes sense intuitively. The most powerful
> correlation and fingerprinting attacks now use CNNs under the hood. So
> they should both have the same expressive power, and inference
> capability.
>
> Interestingly, the dataset that Pulls used was significantly larger than
> what DeepCorr used, in terms of "pairs" that must be matched.

I am very suspicious that DeepCorr's Figure 7 shows a false
positive rate that does not change as flows are added. This makes
me suspect that the figure is reporting the raw per-flow P(C|M)
and P(C|~M) (C: the correlator fires; M: the pair is truly
matched) from my first post:
https://archives.seul.org/or/dev/Mar-2012/msg00019.html
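
For reference, that per-flow math in Bayes' rule form (a sketch;
the 1/n prior assumes each client flow has exactly one true match
among n candidates):

    def posterior_match(p_c_given_m, p_c_given_not_m, p_m):
        # Bayes' rule: probability that a flagged pair is a true
        # match, given the per-pair TP/FP rates and the base rate
        # p_m of matched pairs among all pairs tested.
        num = p_c_given_m * p_m
        return num / (num + p_c_given_not_m * (1.0 - p_m))

    # With per-flow rates held fixed, the posterior collapses as
    # the number of candidate flows grows -- the base rate fallacy:
    for n in (10, 1_000, 5_000):
        print(n, posterior_match(0.999, 0.001, 1.0 / n))
    # 10    -> ~0.99
    # 1000  -> ~0.50
    # 5000  -> ~0.17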

Again, Danezis argued against my math, saying that modern
correlators perform correlation on all n^2 streams "as a whole",
rather than pairwise:
https://conspicuouschatter.wordpress.com/2008/09/30/the-base-rate-fallacy-and-the-traffic-analysis-of-tor/

However, given that Website Traffic Fingerprinting works, shouldn't
correlation start finding false positives among concurrent flows to
popular websites as more of them are added? And what about defenses
that deliberately make different websites correlate in this way?

Additionally, it is somewhat amusing to me that DeepCorr used almost
exactly the same scale of experimental flows as my 2008 post (~5000),
and reports false positive and true positive rates of the same
magnitudes (0.001 FP; 0.999 TP, from eyeballing Figure 8):
https://people.cs.umass.edu/~amir/papers/CCS18-DeepCorr.pdf
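
Plugging those eyeballed numbers into the 2008 arithmetic (my
assumption: Figure 8's rates are per candidate pair, and each flow
has exactly one true match):

    n   = 5_000            # flows, the rough scale of both experiments
    tpr = 0.999            # eyeballed from DeepCorr Figure 8
    fpr = 0.001

    true_pairs  = n                  # one true match per flow
    false_pairs = n * (n - 1)        # every other pairing is a non-match
    expected_tp = tpr * true_pairs   # ~4,995 true positives
    expected_fp = fpr * false_pairs  # ~24,995 false positives

    precision = expected_tp / (expected_tp + expected_fp)
    print(f"{expected_fp:,.0f} false positives, precision {precision:.3f}")
    # -> 24,995 false positives, precision 0.167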

More science is needed! The construction above is a very good start!

> More interestingly, DeepCorr also found that truncating flows to the
> initial portion was still sufficient for high accuracy. Pulls's
> defenses also found that the beginning of website traces were most
> important to pad heavily.

I actually agree with the dogma that "more packets means more
information", and that DeepCorr should improve with longer flows.

However, research does indicate that the highest differentiating
information gain is present in the initial portion of web traffic.
Additionally, Pulls's alien AI confirmed this independently.

There is also a limit to how long website flows tend to be. The
application layer can also enforce an arbitrary limit of its own,
reconnecting via other paths once it is reached, as Panchenko showed.
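
As a toy sketch of that application-layer limit (all names and the
cap value below are hypothetical; Panchenko's actual splitting
strategy is more involved):

    import itertools

    CELL_LIMIT = 5_000      # arbitrary per-path cap (hypothetical value)
    _path_ids = itertools.count()

    def open_path():
        # Stub for building a fresh network path (e.g. a new circuit).
        return next(_path_ids)

    def fetch_over_path(path, url):
        # Stub fetch; returns the number of cells the fetch consumed.
        return 1_200

    def fetch_all(urls):
        # Enforce an arbitrary length limit at the application
        # layer, reconnecting via another path once the cap is
        # reached, so no single path carries an arbitrarily long
        # website flow.
        path, used = open_path(), 0
        for url in urls:
            if used >= CELL_LIMIT:
                path, used = open_path(), 0   # rotate to a fresh path
            used += fetch_over_path(path, url)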

> > P.P.P.S. At 1004 points on the crackpot index, I believe this post is
> > now the highest scoring publication with a valid novel idea that has
> > been written, to date[2].
>
> If it helps to get a raccoon into the world record books: I again
> confirm this is a valid, novel idea. I have kept John Baez on Cc for
> this reason. We should probably take him off after this :).

The suppression of my ideas remains extreme! My post never made it back
to me, nor did it end up in my spam folder! Very suspicious!

In case anyone missed it, it did hit the list archives:
https://lists.torproject.org/pipermail/tor-dev/2020-December/014496.html
