Hi Tor Community,
My name is Pierre and I'm really interested in participating in a GSoC project this year with the Tor organization. Since I've been working on browser fingerprinting for the past two years, I'd love to build a Panopticlick-like website to improve the fingerprinting defenses of the Tor browser.
I've included my proposal below in case anyone has ideas or suggestions, especially on the technical section or on some of the open questions that I have. (Note that the Torprinter name is subject to change.)
******************************************************
Summary - The Torprinter project: a browser fingerprinting website to improve Tor fingerprinting defenses
The capability of browser fingerprinting as a tool to track users online has been demonstrated by Panopticlick and other research papers since 2010. The Tor community is fully aware of the problem and the Tor browser has been modified to follow the "one fingerprint for all" approach. Spoofing HTTP headers, removing plugins, including bundled fonts, preventing canvas image extraction: these are a few examples of the progress made by Tor developers to protect their users against this threat. However, due to the constant evolution of the web and its underlying technologies, it has become a true challenge to always stay ahead of the latest fingerprinting techniques.
I'm deeply interested in privacy and I've been studying browser fingerprinting for the past 2 years. I launched the AmIUnique.org website 18 months ago to investigate the latest fingerprinting techniques. Collecting data on thousands of devices is one of the keys to understanding and countering the fingerprinting problem.
For this Google Summer of Code project, I propose to develop the Torprinter website, which will run a fingerprinting test suite and collect data from Tor browsers to help developers design and test new defenses against browser fingerprinting. As with AmIUnique or Panopticlick, users will get a complete summary with statistics after the test suite has been executed. The website can be used to test new fingerprinting protections as well as to make sure, with specific regression tests, that fingerprinting-related bugs were correctly fixed. The expected long-term impact of this project is to reduce the differences between Tor users and reinforce their privacy and anonymity online.
In a second step, the website could open its doors to more browsers so that it becomes a platform where vendors can implement significant privacy-related changes in their browsers and see the impact first-hand on the website. With the strong expertise I have acquired on the fingerprinting subject and the experience I have gained by developing the AmIUnique website, I believe I'm fully qualified to see such a project through to completion.
Website features
The main feature of the website is to collect a set of fingerprintable attributes on the client and calculate the distribution of values for each attribute, like Panopticlick or AmIUnique. The set of tests would include not only known fingerprinting techniques but also ones developed specifically for the Tor browser.
The second main feature of the website would be for Tor users to check how close their current fingerprint is to the ideal fingerprint that most users should share. A list of actions should be added to help users configure their browser to reach this ideal fingerprint.
The third main feature would be an API for automated tests as detailed on this page: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm... . This would enable automatic verification of Tor protection features with regard to fingerprinting. When a new version is released, the output of specific tests will be verified to check for any changes or regressions from previous versions.
The fourth main feature I'd like to include is a complete stats page where the user can go through every attribute and filter by OS, browser version and more.
The inclusion of additional features that go beyond the core functionalities of the site should be driven by the needs of the developers and the Tor community. Still, a lot of open questions remain that should be addressed during the community bonding period to define precisely how each of these features should ultimately work. Some of these open questions include:
- How closed/private/transparent should the website be about its tests and the results? Should every test be clearly indicated on the webpage with its own description? Or should some tests stay hidden to prevent spreading usable tests to fingerprint Tor users?
- Should a statistics page exist? Should we give read access to the database to every user (for example in the form of a REST API or another solution)?
- Where should the data be stored? How long should the data be kept? If tests are performed by version, should the data from an old TBB version be removed? Should the data be kept a week, a month or more?
- How should new tests be added: A pull request? A form where submissions are reviewed by admins? A link to the Tor tracker?
- Should the website only be accessible through Tor?
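As an illustration of the first two features (per-attribute distributions and the distance to the shared fingerprint), a minimal JavaScript sketch could look like the following. The sample data, the function names and the choice of "most frequent value" as the ideal value are assumptions made for this example, not decisions taken in the proposal.

```javascript
// Hypothetical sample data: each fingerprint is a flat map of attribute -> value.
const fingerprints = [
  { userAgent: "UA-A", timezone: "UTC", windowSize: "1000x1000" },
  { userAgent: "UA-A", timezone: "UTC", windowSize: "1000x1000" },
  { userAgent: "UA-A", timezone: "UTC", windowSize: "1200x900" },
  { userAgent: "UA-B", timezone: "UTC", windowSize: "1000x1000" },
];

// Count how often each value occurs for each attribute.
// Returns a Map: attribute -> Map(value -> count).
function attributeDistribution(prints) {
  const dist = new Map();
  for (const fp of prints) {
    for (const [attr, value] of Object.entries(fp)) {
      if (!dist.has(attr)) dist.set(attr, new Map());
      const counts = dist.get(attr);
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  return dist;
}

// Assumption: the "ideal" fingerprint is the most frequent value of each attribute.
function idealFingerprint(dist) {
  const ideal = {};
  for (const [attr, counts] of dist) {
    let best = null;
    let bestCount = -1;
    for (const [value, count] of counts) {
      if (count > bestCount) {
        best = value;
        bestCount = count;
      }
    }
    ideal[attr] = best;
  }
  return ideal;
}

// List the attributes on which a fingerprint differs from the ideal one.
function deviations(fp, ideal) {
  return Object.keys(ideal).filter((attr) => fp[attr] !== ideal[attr]);
}

const dist = attributeDistribution(fingerprints);
const ideal = idealFingerprint(dist);
console.log(dist.get("windowSize").get("1000x1000")); // 3
console.log(deviations(fingerprints[2], ideal)); // [ 'windowSize' ]
```

The list returned by `deviations` is the kind of input a "list of actions" view could use to tell a user which settings differ from the shared fingerprint.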
Technical choices
In my opinion, the website must be accessible and modular. It should be able to cope with a large number of connections and a large volume of data. With this in mind and the experience gained from developing AmIUnique, I plan on using the Play framework with a MongoDB database. Developing the website in Java opens the door to many developers to make the website better and more robust after its initial launch since it is one of the most widely used programming languages in the world. On the storage and statistics side, MongoDB is a good fit because it is now a mature technology that can scale well with a large amount of data and many connections. Moreover, the use of SQL databases for AmIUnique proved to be really powerful, but maintenance after the website was launched became a tedious task, especially when modifying the underlying model of a fingerprint to collect new attributes. A more flexible database seems a better choice for maintenance and for adding/removing tests.
Estimated timeline
You will find below a rough estimate of the timeline for the three months of the GSoC.
Community bonding period - Discuss with the mentors and the community the set of features that should be included in the very first version of the website and clarify the open questions raised in one of the previous paragraphs.
23 May - 27 June: Development of the first version of the website with the core features
Week 1 - Development of the first version of the fingerprinting script with the core set of attributes. Special attention will be given so that it is fully compatible with the most recent version of the Tor browser (and older ones too).
Week 2 - Start developing the front-end and the back-end to store fingerprints, with a page containing data on your current fingerprint (try adding a view to see how close/far you are from the ideal fingerprint).
Week 3 - Start developing the statistics page with the necessary visualization for the users. Modification of the back-end to improve statistics computation and lessen the server load.
Week 4 - Finishing the front-end development and refining the statistics page to surface the most relevant information. Adding and testing an API to support automated tests.
Week 5 - Finishing the first version so that it is ready for deployment. Start developing additional features requested by the community (REST API? account management?)
27 June - Mid July: Deployment of the first version online for a beta test, with bug fixing. Finishing development of additional features requested by the mentors/community. Defining the list of new features for the second version.
Mid July - 23rd August:
- Adding a system to make the website as flexible as possible so that tests can be added/removed easily (A pull-request system? A test submission form where admins review tests before they are included in the test suite?)
- Developing additional features for the website.
- Making sure that the website can be opened to more browsers (work done at design time to support any browser will be tested here).
- Bug fixing.
Code sample
In 2014, I developed the entire AmIUnique.org website from scratch. Its aim is to collect fingerprints to study the current diversity of fingerprints on the Internet while providing full details to users on this subject. It was the first time that I built a complete website from the design phase to its deployment online.
One of the first challenges that I encountered was to build a script that would not only use state-of-the-art techniques but would also simply work on the widest variety of browsers. Testing a script on a recent version of a major browser like Chrome or Firefox is an easy task since they implement the latest HTML and JavaScript technologies, but making sure that the script runs correctly on older browsers like Internet Explorer is another story. Juggling a dozen different virtual machines was necessary to obtain a bug-free and stable version of the script. A small beta test was required to make sure that everything was good to go for what is now the foundation of the AmIUnique website. All of the source code for AmIUnique and my other projects can be found on GitHub.
A second challenge that I faced was to deal with the increasing load of users so that the server could return personalized statistics to visitors in a timely manner (in under 2 to 3 seconds). By having a separate entity that updates statistics in real time on top of the database, I managed to drastically reduce the server load. Given the number of Tor users around the world, the website needs to handle a high load of visitors and statistics computation from the get-go, and my previous experience on that specific task will prove useful.
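The idea of a separate entity that keeps statistics up to date, rather than recomputing them from the database on every visit, can be sketched as follows. This is only an illustration under assumptions: the `RunningStats` class and its methods are hypothetical names, and this is not the actual AmIUnique implementation.

```javascript
// Hypothetical in-memory statistics holder, updated once per stored fingerprint
// so that per-visitor statistics do not require a scan of the whole database.
class RunningStats {
  constructor() {
    this.total = 0;          // number of fingerprints recorded so far
    this.counts = new Map(); // attribute -> Map(value -> count)
  }

  // Update the counters with one newly collected fingerprint.
  record(fingerprint) {
    this.total += 1;
    for (const [attr, value] of Object.entries(fingerprint)) {
      if (!this.counts.has(attr)) this.counts.set(attr, new Map());
      const perValue = this.counts.get(attr);
      perValue.set(value, (perValue.get(value) || 0) + 1);
    }
  }

  // Fraction of recorded fingerprints that share this value for this attribute.
  share(attr, value) {
    if (this.total === 0) return 0;
    const perValue = this.counts.get(attr);
    const count = perValue ? perValue.get(value) || 0 : 0;
    return count / this.total;
  }
}

const stats = new RunningStats();
stats.record({ timezone: "UTC", language: "en-US" });
stats.record({ timezone: "UTC", language: "fr-FR" });
console.log(stats.share("timezone", "UTC"));   // 1
console.log(stats.share("language", "fr-FR")); // 0.5
```

Each lookup then costs a couple of map accesses, whatever the number of stored fingerprints; in a real deployment the counters would also need to be persisted and rebuilt after a restart.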
For the very first version of Torprinter, I plan on testing well-known and widespread fingerprinting techniques to make sure that there is no variation among Tor users. These include HTTP headers and known JavaScript objects. There should be no need for any Flash attributes since plugins are not present in the Tor browser (thus removing the complex code in charge of correctly loading the Flash object). For this proposal, I have also developed a special page with 7 different tests that mainly target the Tor browser, to give an idea of the tests that can be included and that are more suited to Tor users. Tests n°5, n°6 and n°7 are broader and also concern the Firefox browser.
You can find a working version of the script on a special webpage (you need to scroll to make the results appear): https://plaperdr.github.io/torScript.html
The script can be found here: https://plaperdr.github.io/assets/tor/tor.js
Test n°1 Test the size of the current window - As reported by ticket n°14098 https://trac.torproject.org/projects/tor/ticket/14098
Test n°2 Test the support of emoji - As reported by ticket n°18172 https://trac.torproject.org/projects/tor/ticket/18172
Test n°3 Analysis of the "scroll" behavior of the window - As investigated by http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprint...
Test n°4 Test the size of the current fallback font by using the canvas API to render some text (no need for user permission, unlike canvas extraction) - Custom test
Test n°5 Test the difference between OS on the maximum font size - Custom test
Test n°6 Test the difference between OS on the Date API - As reported by ticket n°15473 https://trac.torproject.org/projects/tor/ticket/15473
Test n°7 Test the difference between OS on the Math class - As reported by ticket n°13018 https://trac.torproject.org/projects/tor/ticket/13018
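To give a rough idea of what tests n°6 and n°7 look at, here is a minimal sketch that collects a few Date and Math outputs into strings that can be compared between machines. The function names and the specific expressions are chosen for illustration only; the tor.js script linked above may use different ones, and no particular output is implied for any platform.

```javascript
// Collect the results of a few Math calls whose low-order digits may depend
// on the underlying platform. The inputs are illustrative assumptions.
function mathProbe() {
  return [
    Math.tan(-1e300),
    Math.cosh(10),
    Math.sinh(1),
    Math.exp(1) - Math.E,
  ].map(String).join("|");
}

// Collect a few Date outputs that depend on locale and timezone settings.
function dateProbe() {
  const epoch = new Date(0);
  return [
    epoch.toLocaleString(),
    String(epoch.getTimezoneOffset()),
  ].join("|");
}

// Compare a collected probe against a reference value, for instance the value
// that every Tor browser is expected to return (hypothetical helper).
function matchesReference(probe, reference) {
  return probe === reference;
}

const probes = { math: mathProbe(), date: dateProbe() };
console.log(probes);
```

If all Tor browsers return the same strings for these probes, the attribute carries no information; any value that differs by OS would show up as a split in the collected distribution.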
******************************************************
Any remarks, suggestions or ideas are very welcome!
Pierre