Re: [tor-dev] GoSC - Website Fingerprinting project

19 Mar 2014

      -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi everyone,

Thanks for the answers. Some inline comments below.

On 03/12/2014 09:25 PM, Roger Dingledine wrote:
...
> It sounds like the implementation might be the easy part, compared to the
> design part. And since GSoC is mostly about implementation, and we would
> want to be confident you won't spend all summer distracted by design,
> it would be best if your proposal includes a lot of the design.

Yes, the project has an important research component that is still
ongoing, so I guess this proposal is not well suited for a GSoC project.
Still, I am very interested in getting involved in the Tor dev community,
and especially in the implementation of website fingerprinting
countermeasures. So I would like to volunteer for this while I continue
with the research.

On 03/12/2014 09:25 PM, Roger Dingledine wrote:
...
> It seems like some of the approaches would best be done inside Tor (as
> modifications to the Tor program), and some of them would best be done
> in a separate pluggable transport? Or should they all be done in a PT?
> Can the bandwidth shaping in Scramblesuit (obfsproxy) be used as a
> building block here?

On Wed 12 Mar 2014 11:14:06 PM CET, George Kadianakis wrote:
...
> If we ever wanted to deploy these anti-traffic-analysis PTs to the
> whole network, we would have to add PT support to all clients (HSes
> might also benefit from this) and to all relays.

On Tue, Mar 18, 2014 at 7:30 PM, Mike Perry <mikeperry@torproject.org> wrote:
...
> This sounds like a good summer-sized amount of work. I think I am in
> agreement with George that pluggable transports are a good place to
> start for prototyping this work. That way, you can experiment with
> custom padding protocols easily, without needing to make invasive
> changes to tor-core for each revision, each time.

Yes, PTs look like a good starting point. However, I think a final
countermeasure will need to pad all the way to the middle node (to
frustrate an adversary who controls the guard) and, as George said, I
guess deeper modifications will be needed in that case.

On Tue, Mar 18, 2014 at 7:30 PM, Mike Perry <mikeperry@torproject.org> wrote:
...
> For example, it would be neat to be able to transmit a set of statistics
> to your bridge node during the connection handshake or with the circuit
> setup, so that you don't have to always request downstream padding cells
> with an upstream cell, and downstream padding can asynchronously arrive
> according to some probability or histogram distribution you specify.
> You could also obviously specify a number of cells to send in response
> to a padding cell request (from 0..N, where N is some reasonable cap
> similar to a largeish web object size). The current Tor link padding
> protocol supports neither of these operations.

Definitely. These are the type of building blocks I was referring to.
Any padding-based countermeasure would benefit from the modifications you
proposed. Something that I think will be needed for any smart defense
that tries to minimize the overhead is a regularly updated database
of webpage traces. Since websites change constantly and we should assume
that the adversary will train on the most recent data, the defense has
to be updated accordingly.
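
As a rough illustration of the histogram-driven padding described above, here is a minimal Python sketch. It is not code from Tor or from any pluggable transport: the function names (`make_delay_sampler`, `padding_schedule`), the millisecond units and the example bins are all assumptions made up for this example. It only shows how a histogram of inter-cell delays could be turned into an asynchronous padding schedule.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def make_delay_sampler(bin_edges_ms, weights, rng=None):
    """Return a function that samples a delay (in ms) from a histogram.

    bin_edges_ms: increasing bin edges, one more entry than weights.
    weights: non-negative weight of each bin (need not sum to 1).
    Both parameters are hypothetical; no Tor protocol defines them.
    """
    if len(bin_edges_ms) != len(weights) + 1:
        raise ValueError("need exactly len(weights) + 1 bin edges")
    if sum(weights) <= 0:
        raise ValueError("at least one bin must have positive weight")
    rng = rng or random.Random()
    cumulative = list(accumulate(weights))
    total = cumulative[-1]

    def sample():
        # Pick a bin with probability proportional to its weight ...
        r = rng.random() * total
        i = min(bisect_right(cumulative, r), len(weights) - 1)
        # ... then pick a delay uniformly inside that bin.
        return rng.uniform(bin_edges_ms[i], bin_edges_ms[i + 1])

    return sample


def padding_schedule(sample, n_cells):
    """Return send times (ms from now) for n_cells padding cells."""
    times, t = [], 0.0
    for _ in range(n_cells):
        t += sample()
        times.append(t)
    return times


if __name__ == "__main__":
    sampler = make_delay_sampler([0, 10, 50, 200], [6, 3, 1], random.Random(1))
    print(padding_schedule(sampler, 5))
```

The number of cells passed to `padding_schedule` plays the role of the 0..N response count; how the histogram would actually be negotiated with the bridge is left open here.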

On Tue, Mar 18, 2014 at 7:30 PM, Mike Perry <mikeperry@torproject.org> wrote:
...
> Related: Do you happen to have any existing classifier code working
> already, by any chance?

Yes. We are using a modified version of Tao’s classifier (an
edit-distance-based SVM) for the research project.

On 03/19/2014 04:25 AM, Kevin P Dyer wrote:
...
> If it helps, the code [2] from our website fingerprinting paper [1] is
> public. It includes the edit-distance classifier [3] from [4], which
> wasn't reported on in [1], I believe.

Thanks, Kevin. In particular, we are using the Damerau-Levenshtein
distance instead of the plain Levenshtein distance, since it also takes
transpositions into account. I think it describes network traffic
better, especially when RP is used.
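
To illustrate the difference between the two distances, here is a minimal Python sketch. It is not the classifier code mentioned above. It implements the restricted variant of Damerau-Levenshtein (optimal string alignment) with unit costs, and it assumes, purely for the example, that a trace is a list of packet directions (+1 outgoing, -1 incoming). The function name `edit_distance` and its `transpositions` flag are made up.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between two sequences with unit costs.

    With transpositions=False this is the plain Levenshtein distance.
    With transpositions=True, swapping two adjacent items also costs 1
    (optimal string alignment, a restricted Damerau-Levenshtein).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete the first i items of a
    for j in range(cols):
        d[0][j] = j  # insert the first j items of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    # Two traces that differ only in the order of the last two packets.
    t1 = [+1, -1, -1, +1]
    t2 = [+1, -1, +1, -1]
    print(edit_distance(t1, t2, transpositions=False))  # 2
    print(edit_distance(t1, t2, transpositions=True))   # 1
```

Two packets that merely arrive in swapped order count as one edit instead of two, which is why a transposition-aware distance seems a better fit for reordered traffic.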

On Tue, Mar 18, 2014 at 7:30 PM, Mike Perry <mikeperry@torproject.org> wrote:
...
> One of the ideas I've been considering is taking a closer look at the
> nearest-neighbor edit distances between page class labels, for the edit
> distance based classifiers. This distance provides us with an estimate
> of the ideal minimum cover traffic we will need to make testing
> instances jump from one nearest-neighbor label to another (causing a
> false positive). It will also decrease as the world size increases (more
> class labels in the same amount of N-dimensional space).
>
> A successful defense should change the distribution of edit distances
> of test instances around their class labels (it will increase the
> intra-class variance) and this in turn will increase the size of the
> threshold around class labels for a given accuracy rate, reducing
> accuracy or increasing false positives.

Yes, that sounds very interesting. I guess one relevant design choice in
such a scheme is the distribution used to pick the web page in the k-NN
cluster that will be used to generate the cover traffic. It would be
interesting to see which distribution minimizes the overhead introduced
by this cover traffic while still preventing statistical attacks.
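
To make the nearest-neighbor idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under strong assumptions: each page class is represented by a single trace of packet directions (+1/-1), the distance is a plain unit-cost Levenshtein distance, and the names `levenshtein` and `nearest_neighbor_distances` as well as the example data are made up. A real evaluation would use many instances per class and the classifier's own distance.

```python
def levenshtein(a, b):
    """Plain unit-cost Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # match/substitution
        prev = cur
    return prev[-1]


def nearest_neighbor_distances(traces, distance=levenshtein):
    """Map each label to (distance, label) of its nearest other class.

    traces: dict of label -> representative trace (at least two labels).
    The distance is a crude lower-bound estimate of the edits (cover
    traffic) needed to push that class onto its nearest neighbor.
    """
    if len(traces) < 2:
        raise ValueError("need at least two classes")
    result = {}
    for label, trace in traces.items():
        result[label] = min(
            (distance(trace, other_trace), other)
            for other, other_trace in traces.items()
            if other != label
        )
    return result


if __name__ == "__main__":
    # Hypothetical toy traces, one per page class.
    traces = {
        "page-a": [+1, -1, -1],
        "page-b": [+1, -1, -1, -1],
        "page-c": [+1, +1, +1, +1, +1, +1],
    }
    for label, (dist, neighbor) in nearest_neighbor_distances(traces).items():
        print(label, "->", neighbor, "at distance", dist)
```

Adding more classes to `traces` can only keep or shrink each nearest-neighbor distance, which matches the point that the distance decreases as the world size grows.
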
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTKXj1AAoJEGfJ5xfgazlxru4IAKPVJ2iCfEZCvV9g7ufTpQff
hcaav5taSme2bvB8rdMYsr70sHzqJL/3XwJ05duf7G/bCYhuHrJvq6Ci6GAPhe0q
rDUMzkQOXLoZ9GvhzEbb+N/3EXahsW0W/mUjSxBvfxQLqk4I/13x6LhkSRvs7Ibv
1dj4D32VVPZF5kWTvztKZoFXtqkt6LTZDkak1j1vN2h5Vdriu7NwERiGBYGBIl1l
LW44vPGu5qCxg/VPUIki0Te6LLqYLEfcUBhVTOBThE2RmBu4iCiivBi9seWeqXL8
/sh5Fgr2NnndrwYG0FDDY8GTJJ/unKJCqN2cX6WgNfox/2N43xAq3es6732Igew=
=07zn
-----END PGP SIGNATURE-----

Marc Juarez