Marc Juarez:
Lunar:
Have you read Mike Perry's long blog post on the topic? https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-att...
It outlines future research work in evaluating the efficiency of fingerprinting attacks, and also mentions a couple of promising defenses.
Yes, I am aware of it and I'm currently working on a study to evaluate the efficiency of these attacks.
As Mike Perry said in the post, most of the attacks give an unrealistic advantage to the adversary, and countermeasures probably work much better than has been shown so far.
However, some of the results of these articles suggest that there exist coarse-grained traffic features that are invariant to randomized pipelines (RP, SPDY) and thus can still identify web pages (Dyer et al.). Also, edit-distance based classifiers broke some old versions of the RP implemented in Tor Browser.
It's an open problem whether these features actually uniquely identify web pages in larger worlds than the ones considered in the literature. In any case, link-padding strategies are specifically designed to conceal these features with a minimal amount of cover traffic and are becoming affordable in terms of bandwidth.
The project I propose would be directed to address this bug ticket:
https://trac.torproject.org/projects/tor/ticket/7028
For example, I would like to implement the common building blocks for link-padding countermeasures (such as a "traffic generator controller" in the onion proxy and the entry guard).
This sounds like a good summer-sized amount of work. I think I am in agreement with George that pluggable transports are a good place to start for prototyping this work. That way, you can experiment with custom padding protocols easily, without needing to make invasive changes to tor-core for each revision.
For example, it would be neat to be able to transmit a set of statistics to your bridge node during the connection handshake or with the circuit setup, so that you don't always have to request downstream padding cells with an upstream cell, and downstream padding can arrive asynchronously according to some probability or histogram distribution you specify.
You could also obviously specify a number of cells to send in response to a padding cell request (from 0..N, where N is some reasonable cap similar to a largeish web object size). The current Tor link padding protocol supports neither of these operations.
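To make the two operations above concrete, here is a minimal sketch of what the bridge side might do with a client-supplied histogram and a capped padding request. Everything in it is an assumption for illustration only: the names PaddingSchedule, clamp_padding_request and MAX_PADDING_CELLS, the millisecond bin edges, and the cap value are hypothetical, and none of this is part of Tor's link protocol or of any pluggable transport API.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical cap N on cells sent per padding request (not a Tor constant).
MAX_PADDING_CELLS = 512


class PaddingSchedule:
    """Samples inter-arrival delays for padding cells from a histogram.

    bin_edges_ms has one more entry than weights; weights[i] is the
    relative probability of a delay falling in
    [bin_edges_ms[i], bin_edges_ms[i + 1]).
    """

    def __init__(self, bin_edges_ms, weights, seed=None):
        if len(bin_edges_ms) != len(weights) + 1:
            raise ValueError("need exactly one more bin edge than weights")
        if any(w < 0 for w in weights) or sum(weights) <= 0:
            raise ValueError("weights must be non-negative and not all zero")
        self.bin_edges_ms = list(bin_edges_ms)
        self.cumulative = list(accumulate(weights))
        self.rng = random.Random(seed)

    def next_delay_ms(self):
        """Pick a bin in proportion to its weight, then a uniform delay in it."""
        r = self.rng.random() * self.cumulative[-1]
        # First bin whose cumulative weight exceeds r; zero-weight bins are skipped.
        i = min(bisect_right(self.cumulative, r), len(self.cumulative) - 1)
        lo, hi = self.bin_edges_ms[i], self.bin_edges_ms[i + 1]
        return lo + self.rng.random() * (hi - lo)


def clamp_padding_request(requested_cells):
    """Clamp a requested padding cell count to the range 0..MAX_PADDING_CELLS."""
    return max(0, min(requested_cells, MAX_PADDING_CELLS))
```

In a prototype, the client would send the bin edges and weights once at handshake or circuit setup, and the bridge would then call next_delay_ms() before each downstream padding cell without waiting for an upstream request.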
More advanced padding protocols are also possible, but may also be overkill. We can discuss those further if this sounds interesting. I'd also like to hear any ideas you might have on the design and/or implementation of such a protocol.
Related: Do you happen to have any existing classifier code working already, by any chance?
One of the ideas I've been considering is taking a closer look at the nearest-neighbor edit distances between page class labels, for the edit distance based classifiers. This distance provides us with an estimate of the ideal minimum cover traffic we will need to make testing instances jump from one nearest-neighbor label to another (causing a false positive). It will also decrease as the world size increases (more class labels in the same amount of N-dimensional space).
A successful defense should change the distribution of edit distances of test instances around their class labels (it will increase the intra-class variance), and this in turn will increase the size of the threshold around class labels for a given accuracy rate, reducing accuracy or increasing false positives.
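As a rough illustration of the nearest-neighbor measurement, here is a small sketch that computes, for each class label, the edit distance to its closest other label. It makes several assumptions: a trace is represented as a sequence of cell directions (+1 outgoing, -1 incoming), plain Levenshtein distance stands in for whatever distance a given classifier actually uses, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def nearest_neighbor_distances(labels):
    """Map each class label to the edit distance of its nearest other label.

    labels maps a page name to one representative trace for that page.
    """
    if len(labels) < 2:
        raise ValueError("need at least two class labels")
    return {
        name: min(
            edit_distance(trace, other_trace)
            for other_name, other_trace in labels.items()
            if other_name != name
        )
        for name, trace in labels.items()
    }
```

Under these assumptions, the value for a page is a lower bound on how many cells must be inserted, removed, or changed to turn that page's representative trace into its nearest neighbor's. Adding labels to the dictionary can only keep each value the same or lower it, which is the world-size effect described above.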
It may also be the case that low or no cost defenses (like a smarter use of SPDY) do this, too, but we'll be able to see it for sure with padding.
Does this make sense?