GSoC'16 proposal: the Torprinter project (a Panopticlick-like website)

15 Mar 2016

      Hi Tor Community,

My name is Pierre and I'm really interested in participating in a GSoC
project this year with the Tor organization. Since I've been working on
browser fingerprinting for the past two years, I'd love to build a
Panopticlick-like website to improve the fingerprinting defenses of the
Tor browser.

I've included below my proposal in case anyone has ideas or suggestions,
especially on the technical section or on some of the open questions
that I have. (It should be noted that the Torprinter name is subject to
change).

******************************************************

Summary - The Torprinter project: a browser fingerprinting website to
improve Tor fingerprinting defenses
The capabilities of browser fingerprinting as a tool to track users
online has been demonstrated by Panopticlick and other research papers
since 2010. The Tor community is fully aware of the problem and the Tor
browser has been modified to follow the "one fingerprint for all"
approach. Spoofing HTTP headers, removing plugins, including bundled
fonts, preventing canvas image extraction: these are a few examples of
the progress made by Tor developers to protect their users against such
threat. However, due to the constant evolution of the web and its
underlying technologies, it has become a true challenge to always stay
ahead of the latest fingerprinting techniques.
I'm deeply interested in privacy and I've been studying browser
fingerprinting for the past 2 years. I've launched 18 months ago the
AmIUnique.org website to investigate the latest fingerprinting
techniques. Collecting data on thousands of devices is one of the keys
to understand and counter the fingerprinting problem.
For this Google Summer of Code project, I propose to develop the
Torprinter website that will run a fingerprinting test suite and collect
data from Tor browsers to help developers design and test new defenses
against browser fingerprinting. The website will be similar to AmIUnique
or Panopticlick for users where they will get a complete summary with
statistics after the test suite has been executed. It can be used to
test new fingerprinting protection as well as making sure that
fingerprinting-related bugs were correctly fixed with specific
regression tests. The expected long-term impact of this project is to
reduce the differences between Tor users and reinforce their privacy and
anonymity online. In a second step, the website could open its doors to
more browsers so that it could become a platform where vendors can
implement significant changes in their browsers with regards to privacy
and see the impact first-hand on the website. With the strong expertise
I have acquired on the fingerprinting subject and the experience I have
gained by developing the AmIUnique website, I believe I'm fully
qualified to see such a project through to completion.

Website features
The main feature of the website is to collect a set of fingerprintable
attributes on the client and calculate the distribution of values for
each attribute like Panopticlick or AmIUnique. The set of tests would
not only include known fingerprinting techniques but also ones developed
specifically for the Tor browser.
The second main feature of the website would be for Tor users to check
how close their current fingerprint is from the ideal unique fingerprint
that most users should share. A list of actions should be added to help
users configure their browser to reach this ideal fingerprint.
The third main feature would be an API for automated tests as detailed
by this page :
https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm...
. This would enable automatic verification of Tor protection features
with regard to fingerprinting. When a new version is released, the
output of specific tests will be verified to check for any
evolution/changes/regressions from previous versions.
The fourth main feature I'd like to include is a complete stats page
where the user can go through every attribute and filter by OS, browser
version and more.
The inclusion of additional features that go beyond the core
functionnalities of the site should be driven by the needs of the
developers and the Tor community.
Still, a lot of open questions remain that should be addressed during
the bonding period to define precisely how each of these features should
ultimately work.
Some of these open questions include:
- How closed/private/transparent should the website be about its tests
and the results? Should every tests be clearly indicated on the webpage
with their own description? or should some tests stay hidden to prevent
spreading usable tests to fingerprint Tor users?
- Should a statistics page exist? Should we give a read access to the
database to every user (like in the form of a REST API or other solutions)?
- Where the data should be stored? How long should the data be kept? If
tests are performed by versions, should the data from an old TBB version
be removed? Should the data be kept a week, a month or more?
- How new tests should be added: A pull request? A form where
submissions are reviewed by admins? A link to the Tor tracker?
- Should the website only be accessible through Tor?

Technical choices
In my opinion, the website must be accessible and modular. It should
have the ability to cope with an important number of connections/data.
With this in mind and the experience gained from developing AmIUnique, I
plan on using the Play framework with a MongoDB database. Developing the
website in Java opens the door to many developers to make the website
better and more robust after its initial launch since it is one of most
used programming language in the world. On the storage and statistics
side, MongoDB is a good fit because it is now a mature technology that
can scale well with an important number of data and connections.
Moreover, the use of SQL databases for AmIUnique proved to be really
powerful but the maintenance after the website was launched became a
tedious task, especially when modifying the underlying model of a
fingerprint to collect new attributes. The choice of a more flexible and
modular database seems a better choice for maintenance and for
adding/removing tests.

Estimated timeline
You will find below a rough estimate of the timeline for the three
months of the GSoC.

Community bonding period - Discuss with the mentors and the community
the set of features that should be included in the very first version of
the website and clarify the open questions raised in one of the previous
paragraphs.

23 May - 27 June : Development of the first version of the website with
the core features
    Week 1 - Development of the first version of the fingerprinting
script with the core set of attributes. Special attention will be given
so that it is fully compatible with the most recent version of the Tor
browser (and older ones too).
    Week 2 - Start developing the front-end and the back-end to store
fingerprints with a page containing data on your current fingerprint
(try adding a view to see how close/far you are from the ideal fingerprint).
    Week 3 - Start developing the statistics page with the necessary
visualization for the users. Modification of the back-end to improve
statistics computation to lessen the server load.
    Week 4 - Finishing the front-end development and refining the
statistics page to get back the most relevant information.
           Adding and testing an API to support automated tests.
    Week 5 - Finishing the first version so that it is ready for deployment.
        Start developing additional features requested by the community
(rest API? account management?)

27 June - Mid July :
Deployment of the first version online for a beta-test with bug fixing.
Finishing development of additional features requested by the
mentors/community.
Defining the list of new features for the second version.

Mid July - 23th August :
Adding a system to make the website as flexible as possible to
add/remove tests easily (A pull-request system? A test submission form
where admins review tests before they are included in the test suite?)
Developing additional features for the website.
Making sure that the website can be opened to more browsers (work done
at design time to support any browsers will be tested here)
Bug fixing

Code sample
In 2014, I developed the entire AmIUnique.org website from scratch. Its
aim is to collect fingerprints to study the current diversity of
fingerprints on the Internet while providing full details to users on
this subject. It was the first time that I built a complete website from
the design phase to its deployment online.
One of the first challenge that I encountered was to build a script that
would not only use state-of-the-art techniques but that could simply
work on the widest variety of browsers. Testing a script for a recent
version of a major browser like Chrome and Firefox is an easy task since
they implement the latest HTML and JavaScript technologies but making
sure that the script runs correctly on older browsers like Internet
Explorer is another story. Juggling with a dozen different virtual
machines was necessary to obtain a bug-free and stable version of the
script. A small beta-test was required to make sure that everything was
good to go for what is now the foundations of the AmIUnique website. The
totality of the source code for AmIUnique and my other projects can be
found on GitHub.
A second challenge that I faced was to deal with the increasing load of
users so that the server could return personalized statistics to
visitors in a timely manner (less than 2/3s). By having a separate
entity that updates statistics in real time on top of the database, I
managed to drastically reduce the server load. With the number of Tor
users around the world, the website needs from the get go to handle a
high load of visitors and statistics computation and my previous
experience on that specific task will prove useful.

For the very first version of Torprinter, I plan on testing well-known
and widespread fingerprinting techniques to make sure that there is no
variation among Tor users. These include HTTP headers and known
JavaScript objects. There should be no need for any Flash attributes
since plugins are not present in the Tor browser (thus removing complex
code in charge of correctly loading the Flash object).
For this proposal, I have also developed a special page with 7 different
tests that are mainly targeted at the Tor browser to give an idea of
what tests can be included that are more suited to the Tor users.
Tests n°5, n°6 and n°7 are broader and also concerns the Firefox browser.
You can found a working version of the script on a special webpage (need
to scroll to make the results appear):
https://plaperdr.github.io/torScript.html
The script can be found here: https://plaperdr.github.io/assets/tor/tor.js

Test n°1
Test the size of the current window - As reported by ticket n°14098
https://trac.torproject.org/projects/tor/ticket/14098
Test n°2
Test the support of emoji - As reported by ticket n°18172
https://trac.torproject.org/projects/tor/ticket/18172
Test n°3
Analysis of the "scroll" behavior of the window - As investiagted by
http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprint...
Test n°4
Test the size of current fallback font by using the canvas API to render
some text (no need for user permission like canvas extraction) - Custom test
Test n°5
Test the difference between OS on the maximum font size - Custom test
Test n°6
Test the difference between OS on the Date API - As reported by ticket
n°15473 https://trac.torproject.org/projects/tor/ticket/15473
Test n°7
Test the difference between OS on the Math class - As reported by ticket
n° 13018 https://trac.torproject.org/projects/tor/ticket/13018

******************************************************

Any remarks, suggestions or ideas are very welcome!
Pierre

Pierre Laperdrix

gunes acar

Pierre Laperdrix

gunes acar

Pierre Laperdrix

Damian Johnson

Damian Johnson

Georg Koppen

Pierre Laperdrix

Georg Koppen

Lunar

Pierre Laperdrix

Lunar

Georg Koppen

Pierre Laperdrix

tags

participants (5)