[tor-dev] GSoC'16 proposal: the Torprinter project (a Panopticlick-like website)
atagar at torproject.org
Wed Mar 16 16:06:23 UTC 2016
Hi Pierre, on first glance looks like a nice proposal. Just a heads up
though that to be considered there needs to be a prospective mentor
for this project. Reaching out to tor-dev@ is a great first step and
hopefully it'll do the trick, but if it doesn't, try asking on the
#tor-dev IRC channel.
On Tue, Mar 15, 2016 at 1:46 AM, Pierre Laperdrix
<pierre.laperdrix at irisa.fr> wrote:
> Hi Tor Community,
> My name is Pierre and I'm really interested in participating in a GSoC
> project this year with the Tor organization. Since I've been working on
> browser fingerprinting for the past two years, I'd love to build a
> Panopticlick-like website to improve the fingerprinting defenses of the
> Tor browser.
> I've included below my proposal in case anyone has ideas or suggestions,
> especially on the technical section or on some of the open questions
> that I have. (It should be noted that the Torprinter name is subject to
> change.)
> Summary - The Torprinter project: a browser fingerprinting website to
> improve Tor fingerprinting defenses
> The capabilities of browser fingerprinting as a tool to track users
> online have been demonstrated by Panopticlick and other research papers
> since 2010. The Tor community is fully aware of the problem and the Tor
> browser has been modified to follow the "one fingerprint for all"
> approach. Spoofing HTTP headers, removing plugins, including bundled
> fonts, preventing canvas image extraction: these are a few examples of
> the progress made by Tor developers to protect their users against such
> threats. However, due to the constant evolution of the web and its
> underlying technologies, it has become a true challenge to always stay
> ahead of the latest fingerprinting techniques.
> I'm deeply interested in privacy and I've been studying browser
> fingerprinting for the past 2 years. I launched the AmIUnique.org
> website 18 months ago to investigate the latest fingerprinting
> techniques. Collecting data on thousands of devices is one of the keys
> to understanding and countering the fingerprinting problem.
> For this Google Summer of Code project, I propose to develop the
> Torprinter website that will run a fingerprinting test suite and collect
> data from Tor browsers to help developers design and test new defenses
> against browser fingerprinting. Like AmIUnique or Panopticlick, the
> website will give users a complete summary with statistics after the
> test suite has been executed. It can be used to test new fingerprinting
> protections as well as to make sure that fingerprinting-related bugs
> were correctly fixed, using specific regression tests. The expected
> long-term impact of this project is to
> reduce the differences between Tor users and reinforce their privacy and
> anonymity online. In a second step, the website could open its doors to
> more browsers so that it could become a platform where vendors can
> implement significant changes in their browsers with regard to privacy
> and see the impact first-hand on the website. With the strong expertise
> I have acquired on the fingerprinting subject and the experience I have
> gained by developing the AmIUnique website, I believe I'm fully
> qualified to see such a project through to completion.
> Website features
> The main feature of the website is to collect a set of fingerprintable
> attributes on the client and calculate the distribution of values for
> each attribute like Panopticlick or AmIUnique. The set of tests would
> not only include known fingerprinting techniques but also ones developed
> specifically for the Tor browser.
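
A minimal sketch of the per-attribute distribution described above, in Python. The attribute names and sample values are hypothetical, and Shannon entropy is used here only as one plausible summary statistic; the proposal itself does not fix a metric.

```python
import math
from collections import Counter

def attribute_distribution(fingerprints, attribute):
    """Return {value: share of fingerprints} for one attribute."""
    counts = Counter(fp.get(attribute) for fp in fingerprints)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def shannon_entropy(distribution):
    """Entropy in bits; 0.0 means every browser reports the same value."""
    return sum(-p * math.log2(p) for p in distribution.values() if p > 0)

# Hypothetical sample data, not real measurements.
sample = [
    {"window_size": "1000x600", "timezone": "UTC"},
    {"window_size": "1000x600", "timezone": "UTC"},
    {"window_size": "1000x700", "timezone": "UTC"},
    {"window_size": "1200x600", "timezone": "UTC"},
]
window_dist = attribute_distribution(sample, "window_size")
timezone_dist = attribute_distribution(sample, "timezone")
```

In this toy data every browser reports the same timezone, so that attribute carries no identifying information, while the window size still splits users into three groups.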
> The second main feature of the website would be for Tor users to check
> how close their current fingerprint is to the ideal unique fingerprint
> that most users should share. A list of actions should be added to help
> users configure their browser to reach this ideal fingerprint.
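
One way the "distance to the ideal fingerprint" check could work is sketched below. The reference values and the simple matching-ratio score are assumptions made for illustration; a real site would need to decide which attributes count and how to weight them.

```python
def deviations_from_reference(fingerprint, reference):
    """Return {attribute: (observed, expected)} for every mismatch."""
    return {
        name: (fingerprint.get(name), expected)
        for name, expected in reference.items()
        if fingerprint.get(name) != expected
    }

def closeness(fingerprint, reference):
    """Fraction of reference attributes that the fingerprint matches."""
    if not reference:
        return 1.0
    mismatches = deviations_from_reference(fingerprint, reference)
    return 1 - len(mismatches) / len(reference)

# Hypothetical reference values, chosen only for illustration.
REFERENCE = {"window_size": "1000x600", "timezone": "UTC", "plugins": ""}
visitor = {"window_size": "1366x768", "timezone": "UTC", "plugins": ""}

mismatches = deviations_from_reference(visitor, REFERENCE)
score = closeness(visitor, REFERENCE)
```

The mismatch dictionary is what a "list of actions" page could be built from: each entry names an attribute the user would need to change.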
> The third main feature would be an API for automated tests as detailed
> by this page :
> . This would enable automatic verification of Tor protection features
> with regard to fingerprinting. When a new version is released, the
> output of specific tests will be verified to check for any
> evolution/changes/regressions from previous versions.
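
The version-to-version check could be as simple as diffing the recorded output of each test. This is a rough sketch under that assumption; the test names and outputs are made up, and the actual API shape is still an open question.

```python
def find_regressions(previous, current):
    """Compare per-test outputs from two browser versions.

    Returns {test_name: (previous_output, current_output)} for changed tests;
    a test missing from one side is reported with None on that side.
    """
    changed = {}
    for name in sorted(set(previous) | set(current)):
        before, after = previous.get(name), current.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed

# Hypothetical results recorded for two releases.
old_release = {"window_size": "1000x600", "emoji_support": False}
new_release = {
    "window_size": "1000x600",
    "emoji_support": True,
    "max_font_size": "B",
}
report = find_regressions(old_release, new_release)
```

An empty report would mean the new release behaves identically on every recorded test; any entry is something a developer should look at before shipping.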
> The fourth main feature I'd like to include is a complete stats page
> where the user can go through every attribute and filter by OS, browser
> version and more.
> The inclusion of additional features that go beyond the core
> functionalities of the site should be driven by the needs of the
> developers and the Tor community.
> Still, a lot of open questions remain that should be addressed during
> the bonding period to define precisely how each of these features should
> ultimately work.
> Some of these open questions include:
> - How closed/private/transparent should the website be about its tests
> and the results? Should every test be clearly indicated on the webpage
> with its own description? Or should some tests stay hidden to prevent
> spreading usable tests to fingerprint Tor users?
> - Should a statistics page exist? Should we give read access to the
> database to every user (for example, in the form of a REST API or another solution)?
> - Where should the data be stored? How long should the data be kept? If
> tests are performed by versions, should the data from an old TBB version
> be removed? Should the data be kept a week, a month or more?
> - How should new tests be added: A pull request? A form where
> submissions are reviewed by admins? A link to the Tor tracker?
> - Should the website only be accessible through Tor?
> Technical choices
> In my opinion, the website must be accessible and modular. It should
> be able to cope with a large number of connections and a large volume
> of data.
> With this in mind and the experience gained from developing AmIUnique, I
> plan on using the Play framework with a MongoDB database. Developing the
> website in Java opens the door to many developers to make the website
> better and more robust after its initial launch, since Java is one of
> the most used programming languages in the world. On the storage and
> statistics side, MongoDB is a good fit because it is now a mature
> technology that scales well with large amounts of data and many
> connections.
> Moreover, the use of SQL databases for AmIUnique proved to be really
> powerful but the maintenance after the website was launched became a
> tedious task, especially when modifying the underlying model of a
> fingerprint to collect new attributes. A more flexible and modular
> database seems a better choice for maintenance and for
> adding/removing tests.
> Estimated timeline
> You will find below a rough estimate of the timeline for the three
> months of the GSoC.
> Community bonding period - Discuss with the mentors and the community
> the set of features that should be included in the very first version of
> the website and clarify the open questions raised in one of the previous
> sections.
> 23 May - 27 June : Development of the first version of the website with
> the core features
> Week 1 - Development of the first version of the fingerprinting
> script with the core set of attributes. Special attention will be given
> so that it is fully compatible with the most recent version of the Tor
> browser (and older ones too).
> Week 2 - Start developing the front-end and the back-end to store
> fingerprints with a page containing data on your current fingerprint
> (try adding a view to see how close/far you are from the ideal fingerprint).
> Week 3 - Start developing the statistics page with the necessary
> visualization for the users. Modification of the back-end to improve
> statistics computation to lessen the server load.
> Week 4 - Finishing the front-end development and refining the
> statistics page to get back the most relevant information.
> Adding and testing an API to support automated tests.
> Week 5 - Finishing the first version so that it is ready for deployment.
> Start developing additional features requested by the community
> (REST API? Account management?)
> 27 June - Mid July :
> Deployment of the first version online for a beta-test with bug fixing.
> Finishing development of additional features requested by the
> community.
> Defining the list of new features for the second version.
> Mid July - 23rd August :
> Adding a system to make the website as flexible as possible to
> add/remove tests easily (A pull-request system? A test submission form
> where admins review tests before they are included in the test suite?)
> Developing additional features for the website.
> Making sure that the website can be opened to more browsers (work done
> at design time to support any browser will be tested here)
> Bug fixing
> Code sample
> In 2014, I developed the entire AmIUnique.org website from scratch. Its
> aim is to collect fingerprints to study the current diversity of
> fingerprints on the Internet while providing full details to users on
> this subject. It was the first time that I built a complete website from
> the design phase to its deployment online.
> One of the first challenges that I encountered was to build a script that
> would not only use state-of-the-art techniques but that could simply
> work on the widest variety of browsers. Testing a script for a recent
> version of a major browser like Chrome and Firefox is an easy task since
> sure that the script runs correctly on older browsers like Internet
> Explorer is another story. Juggling with a dozen different virtual
> machines was necessary to obtain a bug-free and stable version of the
> script. A small beta-test was required to make sure that everything was
> good to go for what is now the foundations of the AmIUnique website. The
> totality of the source code for AmIUnique and my other projects can be
> found on GitHub.
> A second challenge that I faced was to deal with the increasing load of
> users so that the server could return personalized statistics to
> visitors in a timely manner (less than 2/3s). By having a separate
> entity that updates statistics in real time on top of the database, I
> managed to drastically reduce the server load. With the number of Tor
> users around the world, the website needs to handle a high load of
> visitors and statistics computation from the get-go, and my previous
> experience on that specific task will prove useful.
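
The idea of a separate component that keeps statistics current, rather than rescanning the database for every visitor, might look roughly like this. The class and method names are hypothetical and this is not the AmIUnique implementation; it also assumes every stored fingerprint contains every attribute.

```python
from collections import Counter, defaultdict

class RunningStats:
    """Keeps per-attribute value counts up to date as fingerprints arrive."""

    def __init__(self):
        self.total = 0
        self.counts = defaultdict(Counter)

    def add(self, fingerprint):
        """Fold one new fingerprint into the counters."""
        self.total += 1
        for attribute, value in fingerprint.items():
            self.counts[attribute][value] += 1

    def share(self, attribute, value):
        """Fraction of stored fingerprints with this value, without a rescan."""
        if self.total == 0:
            return 0.0
        return self.counts[attribute][value] / self.total

# Hypothetical incoming fingerprints.
stats = RunningStats()
for size in ("1000x600", "1000x600", "1200x600"):
    stats.add({"window_size": size})
common_share = stats.share("window_size", "1000x600")
```

Each new fingerprint costs one counter update per attribute, so answering "what share of users have my value?" no longer depends on how many fingerprints are stored.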
> For the very first version of Torprinter, I plan on testing well-known
> and widespread fingerprinting techniques to make sure that there is no
> variation among Tor users. These include HTTP headers and known
> since plugins are not present in the Tor browser (thus removing complex
> code in charge of correctly loading the Flash object).
> For this proposal, I have also developed a special page with 7 different
> tests that are mainly targeted at the Tor browser to give an idea of
> what tests can be included that are more suited to the Tor users.
> Tests n°5, n°6 and n°7 are broader and also concern the Firefox browser.
> You can find a working version of the script on a special webpage (need
> to scroll to make the results appear):
> The script can be found here: https://plaperdr.github.io/assets/tor/tor.js
> Test n°1
> Test the size of the current window - As reported by ticket n°14098
> Test n°2
> Test the support of emoji - As reported by ticket n°18172
> Test n°3
> Analysis of the "scroll" behavior of the window - As investigated by
> Test n°4
> Test the size of current fallback font by using the canvas API to render
> some text (no need for user permission like canvas extraction) - Custom test
> Test n°5
> Test the difference between OS on the maximum font size - Custom test
> Test n°6
> Test the difference between OS on the Date API - As reported by ticket
> n°15473 https://trac.torproject.org/projects/tor/ticket/15473
> Test n°7
> Test the difference between OS on the Math class - As reported by ticket
> n° 13018 https://trac.torproject.org/projects/tor/ticket/13018
> Any remarks, suggestions or ideas are very welcome!