Hi Tor Community,
My name is Pierre and I'm really interested in participating in a GSoC project this year with the Tor organization. Since I've been working on browser fingerprinting for the past two years, I'd love to build a Panopticlick-like website to improve the fingerprinting defenses of the Tor browser.
I've included below my proposal in case anyone has ideas or suggestions, especially on the technical section or on some of the open questions that I have. (It should be noted that the Torprinter name is subject to change).
******************************************************
Summary - The Torprinter project: a browser fingerprinting website to improve Tor fingerprinting defenses

The capabilities of browser fingerprinting as a tool to track users online have been demonstrated by Panopticlick and other research since 2010. The Tor community is fully aware of the problem, and the Tor browser has been modified to follow the "one fingerprint for all" approach. Spoofing HTTP headers, removing plugins, bundling fonts, preventing canvas image extraction: these are a few examples of the progress made by Tor developers to protect their users against such threats. However, due to the constant evolution of the web and its underlying technologies, it has become a true challenge to stay ahead of the latest fingerprinting techniques. I'm deeply interested in privacy and I've been studying browser fingerprinting for the past two years. Eighteen months ago, I launched the AmIUnique.org website to investigate the latest fingerprinting techniques. Collecting data on thousands of devices is one of the keys to understanding and countering the fingerprinting problem. For this Google Summer of Code project, I propose to develop the Torprinter website, which will run a fingerprinting test suite and collect data from Tor browsers to help developers design and test new defenses against browser fingerprinting. For users, the website will be similar to AmIUnique or Panopticlick: they will get a complete summary with statistics after the test suite has been executed. It can be used to test new fingerprinting protections as well as to make sure that fingerprinting-related bugs were correctly fixed, with specific regression tests. The expected long-term impact of this project is to reduce the differences between Tor users and reinforce their privacy and anonymity online.
In a second phase, the website could open its doors to more browsers, so that it becomes a platform where vendors can implement significant privacy changes in their browsers and see the impact first-hand on the website. With the strong expertise I have acquired on the fingerprinting subject and the experience I have gained developing the AmIUnique website, I believe I'm fully qualified to see such a project through to completion.
Website features

The main feature of the website is to collect a set of fingerprintable attributes on the client and calculate the distribution of values for each attribute, like Panopticlick or AmIUnique. The set of tests would include not only known fingerprinting techniques but also ones developed specifically for the Tor browser. The second main feature would be for Tor users to check how close their current fingerprint is to the ideal fingerprint that most users should share. A list of actions should be added to help users configure their browser to reach this ideal fingerprint. The third main feature would be an API for automated tests, as detailed on this page: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm... . This would enable automatic verification of Tor protection features with regard to fingerprinting: when a new version is released, the output of specific tests would be checked for any evolution, change or regression from previous versions. The fourth main feature I'd like to include is a complete stats page where the user can go through every attribute and filter by OS, browser version and more. The inclusion of additional features that go beyond the core functionalities of the site should be driven by the needs of the developers and the Tor community. Still, a lot of open questions remain that should be addressed during the bonding period to define precisely how each of these features should ultimately work. Some of these open questions include:
- How closed/private/transparent should the website be about its tests and the results? Should every test be clearly indicated on the webpage with its own description, or should some tests stay hidden to avoid spreading usable tests to fingerprint Tor users?
- Should a statistics page exist? Should we give read access to the database to every user (for example in the form of a REST API or other solutions)?
- Where should the data be stored? How long should the data be kept? If tests are performed per version, should the data from an old TBB version be removed? Should the data be kept a week, a month or more?
- How should new tests be added: a pull request? A form where submissions are reviewed by admins? A link to the Tor tracker?
- Should the website only be accessible through Tor?
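To make the first feature concrete, here is a minimal sketch of the client-side collection step: gather a few standard fingerprintable attributes into one object, ready to be sent to the server. The attribute set shown is illustrative only (the real suite would be much larger and include the Tor-specific tests); it is written as a function of navigator-like and screen-like objects so it can also be exercised outside a browser.

```javascript
// Sketch: collect a handful of fingerprintable attributes.
// In a browser this would be called as collectAttributes(navigator, screen);
// the parameters exist so the function can be tested with mock objects.
function collectAttributes(nav, scr) {
  return {
    userAgent: nav.userAgent,
    language: nav.language,
    platform: nav.platform,
    screenResolution: scr.width + 'x' + scr.height,
    timezoneOffset: new Date().getTimezoneOffset() // minutes from UTC
  };
}

// In a browser, the payload to POST would simply be:
// JSON.stringify(collectAttributes(navigator, screen))
```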
Technical choices

In my opinion, the website must be accessible and modular, and it must be able to cope with a large number of connections and a large volume of data. With this in mind and the experience gained from developing AmIUnique, I plan on using the Play framework with a MongoDB database. Developing the website in Java opens the door for many developers to make the website better and more robust after its initial launch, since it is one of the most used programming languages in the world. On the storage and statistics side, MongoDB is a good fit because it is now a mature technology that scales well with large amounts of data and connections. Moreover, while the SQL database used for AmIUnique proved really powerful, maintaining it after the website was launched became a tedious task, especially when modifying the underlying model of a fingerprint to collect new attributes. A more flexible and modular database seems a better choice for maintenance and for adding or removing tests.
Estimated timeline

You will find below a rough estimate of the timeline for the three months of GSoC.
Community bonding period - Discuss with the mentors and the community the set of features that should be included in the very first version of the website and clarify the open questions raised in one of the previous paragraphs.
23 May - 27 June : Development of the first version of the website with the core features
Week 1 - Development of the first version of the fingerprinting script with the core set of attributes. Special attention will be given to making it fully compatible with the most recent version of the Tor browser (and older ones too).
Week 2 - Start developing the front-end and the back-end to store fingerprints, with a page containing data on your current fingerprint (try adding a view to see how close/far you are from the ideal fingerprint).
Week 3 - Start developing the statistics page with the necessary visualizations for the users. Modification of the back-end to improve statistics computation and lessen the server load.
Week 4 - Finish the front-end development and refine the statistics page to surface the most relevant information. Add and test an API to support automated tests.
Week 5 - Finish the first version so that it is ready for deployment. Start developing additional features requested by the community (REST API? account management?)
27 June - Mid July : Deployment of the first version online for a beta test, with bug fixing. Finishing development of additional features requested by the mentors/community. Defining the list of new features for the second version.
Mid July - 23 August : Adding a system to make the website as flexible as possible, so that tests can be added or removed easily (a pull-request system? a test submission form where admins review tests before they are included in the test suite?). Developing additional features for the website. Making sure that the website can be opened to more browsers (the design-time work to support any browser will be tested here). Bug fixing.
Code sample

In 2014, I developed the entire AmIUnique.org website from scratch. Its aim is to collect fingerprints to study the current diversity of fingerprints on the Internet while providing full details to users on the subject. It was the first time that I built a complete website from the design phase to its deployment online. One of the first challenges I encountered was to build a script that would not only use state-of-the-art techniques but also simply work on the widest variety of browsers. Testing a script on a recent version of a major browser like Chrome or Firefox is an easy task, since they implement the latest HTML and JavaScript technologies, but making sure that the script runs correctly on older browsers like Internet Explorer is another story. Juggling a dozen different virtual machines was necessary to obtain a bug-free and stable version of the script. A small beta test was required to make sure that everything was good to go for what is now the foundation of the AmIUnique website. The entirety of the source code for AmIUnique and my other projects can be found on GitHub. A second challenge I faced was dealing with the increasing load of users so that the server could return personalized statistics to visitors in a timely manner (less than 2-3 seconds). By having a separate entity that updates statistics in real time on top of the database, I managed to drastically reduce the server load. With the number of Tor users around the world, the website needs to handle a high load of visitors and statistics computation from the get-go, and my previous experience with that specific task will prove useful.
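The real-time statistics idea mentioned above can be sketched as follows: instead of scanning the whole database on every visit, a small in-memory cache keeps per-attribute value counts that are updated incrementally on each new fingerprint, so a visitor's "how common is my value?" question is answered in O(1) per attribute. The class and method names are purely illustrative, not AmIUnique's actual implementation.

```javascript
// Illustrative sketch of incrementally maintained statistics (an assumption
// about the approach, not the actual AmIUnique code).
class StatsCache {
  constructor() {
    this.total = 0;
    this.counts = new Map(); // attribute name -> Map(value -> count)
  }

  // Called once per stored fingerprint: bump the counter of each value.
  record(fp) {
    this.total += 1;
    for (const [attr, value] of Object.entries(fp)) {
      if (!this.counts.has(attr)) this.counts.set(attr, new Map());
      const perValue = this.counts.get(attr);
      const key = String(value);
      perValue.set(key, (perValue.get(key) || 0) + 1);
    }
  }

  // Fraction of recorded fingerprints sharing this value for this attribute.
  share(attr, value) {
    const perValue = this.counts.get(attr);
    if (!perValue || this.total === 0) return 0;
    return (perValue.get(String(value)) || 0) / this.total;
  }
}
```

A personalized results page could then report, per attribute, the share of users with the same value, which is also the number needed for the "distance from the ideal fingerprint" feature.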
For the very first version of Torprinter, I plan on testing well-known and widespread fingerprinting techniques to make sure that there is no variation among Tor users. These include HTTP headers and known JavaScript objects. There should be no need for any Flash attributes, since plugins are not present in the Tor browser (thus removing the complex code in charge of correctly loading the Flash object). For this proposal, I have also developed a special page with 7 different tests that are mainly targeted at the Tor browser, to give an idea of what tests better suited to Tor users could be included. Tests n°5, n°6 and n°7 are broader and also concern the Firefox browser. You can find a working version of the script on a special webpage (you need to scroll to make the results appear): https://plaperdr.github.io/torScript.html The script itself can be found here: https://plaperdr.github.io/assets/tor/tor.js
Test n°1 - Test the size of the current window - As reported by ticket n°14098 https://trac.torproject.org/projects/tor/ticket/14098
Test n°2 - Test the support of emoji - As reported by ticket n°18172 https://trac.torproject.org/projects/tor/ticket/18172
Test n°3 - Analysis of the "scroll" behavior of the window - As investigated by http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprint...
Test n°4 - Test the size of the current fallback font by using the canvas API to render some text (no user permission needed, unlike canvas extraction) - Custom test
Test n°5 - Test the difference between OSes in the maximum font size - Custom test
Test n°6 - Test the difference between OSes in the Date API - As reported by ticket n°15473 https://trac.torproject.org/projects/tor/ticket/15473
Test n°7 - Test the difference between OSes in the Math class - As reported by ticket n°13018 https://trac.torproject.org/projects/tor/ticket/13018
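As an illustration, the idea behind Test n°4 could look roughly like the sketch below: request a font family that cannot exist, so the browser silently falls back to its default font, then read only the text metrics. Since no pixels are extracted, no canvas-extraction permission prompt is triggered. The context is passed in as a parameter (in a browser it would come from `document.createElement('canvas').getContext('2d')`); the probe string and font name are arbitrary choices for this sketch, not the actual tor.js code.

```javascript
// Sketch of a fallback-font measurement (assumption: not the real tor.js test).
// ctx is a CanvasRenderingContext2D-like object.
function fallbackFontWidth(ctx, probe) {
  // A font that cannot exist forces the OS/browser default fallback font,
  // whose metrics differ across platforms and configurations.
  ctx.font = '16px no-such-font-12345';
  return ctx.measureText(probe || 'mmmMwWLli0O&1').width;
}
```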
******************************************************
Any remarks, suggestions or ideas are very welcome! Pierre
Hi Pierre,
Thanks for the very well-thought-out proposal!
I'm curious about your ideas on the "returning device problem." EFF's Panopticlick and AmIUnique.org use a combination of cookies and IP address to recognize returning users - so that their fingerprints are not "double-counted."
Since these signals will no longer be available (unless the user opts in to retain the cookie), I wonder what your ideas would be to address this issue.
Please find other responses below.
Best, Gunes
On 2016-03-15 04:46, Pierre Laperdrix wrote:
- How closed/private/transparent should the website be about its tests
and the results? Should every test be clearly indicated on the webpage with its own description, or should some tests stay hidden to avoid spreading usable tests to fingerprint Tor users?
I think the site should be transparent about the tests it runs. Perhaps the majority of the fingerprinting tests/code will run on the client side and can be easily captured by anyone with the necessary skills (even if you obfuscate them).
- Should a statistics page exist? Should we give a read access to the
database to every user (for example in the form of a REST API or other solutions)?
I think aggregate statistics should be available publicly but exposing individual fingerprints publicly may not be necessary.
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Hi Gunes,
Thanks a lot for the feedback!
On 03/16/2016 03:30 PM, gunes acar wrote:
I'm curious about your ideas on the "returning device problem." EFF's Panopticlick and AmIUnique.org use a combination of cookies and IP address to recognize returning users - so that their fingerprints are not "double-counted."
Since these signals will no longer be available (unless the user opts in to retain the cookie), I wonder what your ideas would be to address this issue.
This one is a really interesting question but a tricky one because we can't really rely on the cookies+IP combination with the Tor browser. My answer here is simple: it all depends on the goal we set for the website.
Do we want to learn how many different values there are for a specific test so that we can reduce diversity among Tor users? In that case, the site would not store duplicated fingerprints or it could be finer-grained and not store duplicated values for each test.
Or do we want to go further and learn the actual distribution of values among Tor users, so that it may guide the development of a potential defense? In this case, the site must identify returning users, and that is a lot harder to do here. The only method that comes to mind that would be accurate enough to work in this situation would be to put the test suite behind some kind of registration system. The problem is that mandatory registration goes in the complete opposite direction of what Tor is about, and it would greatly limit the number of participating users (or even render the site useless before it is even launched). A middle-ground solution would be not to store duplicated fingerprints, but I really don't know how much that would affect the statistics in the long run. Would it be marginal, affecting perhaps 2-4% of collected fingerprints, or would it be a lot more, above 20%?
Finally, I thought about using additional means of identification like canvas fingerprinting but I don't think there would be enough diversity here to identify a browser.
- How closed/private/transparent should the website be about its tests
and the results? Should every test be clearly indicated on the webpage with its own description, or should some tests stay hidden to avoid spreading usable tests to fingerprint Tor users?
I think the site should be transparent about the tests it runs. Perhaps the majority of the fingerprinting tests/code will run on the client side and can be easily captured by anyone with the necessary skills (even if you obfuscate them).
You are right on that. It makes sense to be transparent, since obfuscated JS code can be deciphered by someone with the necessary skills. Also, if tests were hidden, most Tor users would rightfully be wary of what exactly is being executed in their browser and would simply not take the test. In that case, the impact of the website would be greatly limited. Being transparent really seems to be the right way to go here.
- Should a statistics page exist? Should we give a read access to the
database to every user (for example in the form of a REST API or other solutions)?
I think aggregate statistics should be available publicly but exposing individual fingerprints publicly may not be necessary.
As you said, aggregate statistics seem to be the best solution here. I'm wondering whether it would be possible to offer the complete list of values for each attribute separately from the others; my concern is how easy it would be to correlate separate attributes to recreate fingerprints, even partial ones.
Regards, Pierre
Hi Pierre,
On 2016-03-16 11:58, Pierre Laperdrix wrote:
This one is a really interesting question but a tricky one because we can't really rely on the cookies+IP combination with the Tor browser. My answer here is simple: it all depends on the goal we set for the website.
I think the original goals were to understand the fingerprint distribution and to measure the effect of introduced defenses (e.g. by measuring the uniqueness/entropy before vs. after the defense).
I agree with you that guaranteeing no double-counting may not be possible, especially if we consider a determined attacker. A more realistic goal could be to filter out double-submissions from benign users.
Let me point out an idea raised in previous discussions: one option to enroll users for the tests was to have a link on the about:tor page similar to the "Test Tor Network Settings" link. The fingerprinting link could also include (e.g. as URL parameters) TB version, locale and OS type to establish ground truth for the tests.
I wonder if the same link could be used to signal a fresh fingerprint submission to the server. This may require keeping a boolean state (!) on the client side, meaning "already submitted a fingerprint with the current TB version." This state could be kept in TorButton's storage, away from the reach of non-chrome scripts. The fingerprinting site could then use this parameter to distinguish between fresh and recurrent submissions.
An alternative can be to present a fresh submission link on the "changelog" tab, which is guaranteed to be shown once after each update - right when we want to collect a new test from users.
Perhaps we should be cautious about keeping any client-side state, and be clear about the limitations of these approaches. But I feel like the way we enroll users can be used to prevent pollution, at least by well-behaving Tor users. Just wanted to point out this line of thought; no doubt you can come up with better alternatives.
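On the server side, the enrollment-link idea could be handled with something like the sketch below: parse the TB version, locale and OS from the URL, plus a flag that TorButton would set once per update to mark a fresh submission. Every parameter name here is an illustrative assumption, not an agreed interface.

```javascript
// Sketch of parsing hypothetical enrollment parameters from the about:tor
// link (parameter names 'tbb', 'locale', 'os', 'fresh' are assumptions).
function parseEnrollmentParams(url) {
  const params = new URL(url).searchParams;
  return {
    tbVersion: params.get('tbb'),
    locale: params.get('locale'),
    os: params.get('os'),
    // "first submission with the current TB version", set by TorButton
    fresh: params.get('fresh') === '1'
  };
}
```

The server would then count a fingerprint toward the distribution only when `fresh` is set, treating other visits as recurrent submissions.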
Best, Gunes
On 03/17/2016 06:02 PM, gunes acar wrote:
Hi Pierre,
On 2016-03-16 11:58, Pierre Laperdrix wrote:
Hi Gunes,
Thanks a lot for the feedback!
On 03/16/2016 03:30 PM, gunes acar wrote:
Hi Pierre,
Thanks for the very well thought proposal!
I'm curious about your ideas on the "returning device problem." EFF's Panopticlick and AmIUnique.org use a combination of cookies and IP address to recognize returning users - so that their fingerprints are not "double-counted."
Since these signals will not be available anymore (unless the user opts in to retain the cookie), I wonder what your ideas would be to address this issue.
This one is a really interesting question but a tricky one because we can't really rely on the cookies+IP combination with the Tor browser. My answer here is simple: it all depends on the goal we set for the website.
I think the original goals were to understand the fingerprint distribution and to measure the effect of introduced defenses (e.g. by measuring the uniqueness/entropy before vs. after the defense).
I agree with you that guaranteeing no double-counting may not be possible, especially if we consider a determined attacker. A more realistic goal could be to filter out double-submissions from benign users.
Let me point out an idea raised in the previous discussions: one option to enroll users for the tests was to have a link on the about:tor page similar to the "Test Tor Network Settings" link. The fingerprint link could also include (e.g. as URL parameters) the TB version, localization and OS type to establish ground truth for the tests.
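The enrollment link Gunes describes could carry this ground truth as plain URL parameters. A minimal sketch, with the understanding that the parameter names here are invented for illustration and not an agreed format:

```javascript
// Build the hypothetical submission link carried on the about:tor page.
// URLSearchParams handles the encoding of each ground-truth value.
function buildSubmissionLink(base, tbVersion, locale, os) {
  const params = new URLSearchParams({ tbVersion, locale, os });
  return `${base}?${params.toString()}`;
}

// The server side can then recover the ground truth from the request URL:
function parseGroundTruth(url) {
  const params = new URL(url).searchParams;
  return {
    tbVersion: params.get('tbVersion'),
    locale: params.get('locale'),
    os: params.get('os'),
  };
}
```

The point of passing these values explicitly is that the server no longer has to guess them through fingerprinting, which (as noted later in the thread) is not always possible for minor releases.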
I wonder if the same link can be used to signal a fresh fingerprint submission to the server. This may require keeping a boolean state (!) on the client side, meaning "already submitted a fingerprint with the current TB version." This state can be kept in TorButton's storage, away from the reach of non-chrome scripts. The fingerprinting site could then use this parameter to distinguish between fresh and recurrent submissions.
An alternative can be to present a fresh submission link on the "changelog" tab, which is guaranteed to be shown once after each update - right when we want to collect a new test from users.
Perhaps we should be cautious about keeping any client-side state, and be clear about the limitations of these approaches. But I feel like the way we enroll the users can be used to prevent pollution, at least by well-behaving Tor users. Just wanted to point out this line of thought; no doubt you can come up with better alternatives.
Best, Gunes
I was so focused on basic browser mechanisms and what we could do with fingerprinting that I forgot that something can be done inside the browser. Even though storing a boolean won't totally fix the problem of someone polluting the database, if we want to analyze the distribution of fingerprints, this may be a good step forward for legitimate users who want to contribute. Then comes the question of the "recurrent" or "not fresh" submissions. If we only store the first "fresh" fingerprint, we may miss subsequent fingerprints that may be more interesting for us. So, one solution would be:
- Storage of all fresh fingerprints. This would give an idea of the fingerprint distribution.
- Storage of recurrent fingerprints while removing duplicates. This would give all the possible values for a specific attribute, and the same device could contribute several times.
I don't know if this is a good approach or if it is too complicated. It is a hard balance to keep, with privacy on one side and relevant data on the other. If we really have to identify returning users, we need some kind of ID somewhere, but even that could be modified.
Also, having ground truth given directly by the browser seems really valuable. At first, I thought about detecting the browser version through fingerprinting, but when I looked at some of the changelogs, I saw that some updates may not be detectable through fingerprinting (for example, minor version 5.5.2 of the Tor browser).
Pierre
Do we want to learn how many different values there are for a specific test so that we can reduce diversity among Tor users? In that case, the site would not store duplicated fingerprints, or, at a finer grain, it would not store duplicated values for each test.
Or do we want to go further and know the actual distribution of values among Tor users so that it may guide the development of a potential defense? In this case, the site must identify returning users, and that is a lot harder to do here. The only method that comes to mind that would be accurate enough to work in this situation would be to put the test suite behind some kind of registration system. The problem is that mandatory registration goes in the complete opposite direction of what Tor is about, and it would greatly limit the number of participating users (or even render the site useless before it is even launched). A middle-ground solution would be not to store duplicated fingerprints, but I really don't know how much that would affect the statistics in the long run. Would it be marginal, affecting around 2-4% of collected fingerprints, or would it be a lot more, above 20%?
Finally, I thought about using additional means of identification like canvas fingerprinting but I don't think there would be enough diversity here to identify a browser.
Please find other responses below.
Best, Gunes
On 2016-03-15 04:46, Pierre Laperdrix wrote:
Hi Tor Community, ....
- How closed/private/transparent should the website be about its tests and the results? Should every test be clearly indicated on the webpage with its own description? Or should some tests stay hidden to prevent spreading usable tests to fingerprint Tor users?
I think the site should be transparent about the tests it runs. Perhaps the majority of the fingerprinting tests/code will run on the client side and can be easily captured by anyone with the necessary skills (even if you obfuscate them).
You are right on that. It makes sense to be transparent, since obfuscated JS code can be deciphered by someone with the necessary skills. Also, if tests are hidden, most Tor users would rightfully be wary of what exactly is being executed in their browser and would simply not take the test. In that case, the impact of the website would be greatly limited. Being transparent really seems to be the right way to go here.
- Should a statistics page exist? Should we give read access to the database to every user (e.g. in the form of a REST API or another solution)?
I think aggregate statistics should be available publicly but exposing individual fingerprints publicly may not be necessary.
Like you said, aggregate statistics seem to be the best solution here. I'm also wondering whether it would be possible to offer the complete list of values for each attribute separately from the others. My concern is how easy it would be to correlate separate attributes to recreate fingerprints, even partial ones.
Regards, Pierre
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Hi Pierre, on first glance this looks like a nice proposal. Just a heads-up, though, that to be considered there needs to be a prospective mentor for this project. Reaching out to tor-dev@ is a great first step and hopefully it'll do the trick, but if it doesn't, try asking on the #tor-dev irc channel.
Cheers! -Damian
On Tue, Mar 15, 2016 at 1:46 AM, Pierre Laperdrix pierre.laperdrix@irisa.fr wrote:
Hi Tor Community,
My name is Pierre and I'm really interested in participating in a GSoC project this year with the Tor organization. Since I've been working on browser fingerprinting for the past two years, I'd love to build a Panopticlick-like website to improve the fingerprinting defenses of the Tor browser.
I've included below my proposal in case anyone has ideas or suggestions, especially on the technical section or on some of the open questions that I have. (It should be noted that the Torprinter name is subject to change).
Summary - The Torprinter project: a browser fingerprinting website to improve Tor fingerprinting defenses. The capabilities of browser fingerprinting as a tool to track users online have been demonstrated by Panopticlick and other research papers since 2010. The Tor community is fully aware of the problem, and the Tor browser has been modified to follow the "one fingerprint for all" approach. Spoofing HTTP headers, removing plugins, including bundled fonts, preventing canvas image extraction: these are a few examples of the progress made by Tor developers to protect their users against such threats. However, due to the constant evolution of the web and its underlying technologies, it has become a true challenge to always stay ahead of the latest fingerprinting techniques. I'm deeply interested in privacy and I've been studying browser fingerprinting for the past two years. Eighteen months ago, I launched the AmIUnique.org website to investigate the latest fingerprinting techniques. Collecting data on thousands of devices is one of the keys to understanding and countering the fingerprinting problem. For this Google Summer of Code project, I propose to develop the Torprinter website, which will run a fingerprinting test suite and collect data from Tor browsers to help developers design and test new defenses against browser fingerprinting. For users, the website will be similar to AmIUnique or Panopticlick: they will get a complete summary with statistics after the test suite has been executed. It can be used to test new fingerprinting protections as well as to make sure that fingerprinting-related bugs were correctly fixed, with specific regression tests. The expected long-term impact of this project is to reduce the differences between Tor users and reinforce their privacy and anonymity online.
In a second step, the website could open its doors to more browsers so that it could become a platform where vendors can implement significant changes in their browsers with regard to privacy and see the impact first-hand on the website. With the strong expertise I have acquired on the fingerprinting subject and the experience I have gained by developing the AmIUnique website, I believe I'm fully qualified to see such a project through to completion.
Website features The main feature of the website is to collect a set of fingerprintable attributes on the client and calculate the distribution of values for each attribute, like Panopticlick or AmIUnique. The set of tests would include not only known fingerprinting techniques but also ones developed specifically for the Tor browser. The second main feature of the website would be for Tor users to check how close their current fingerprint is to the ideal unique fingerprint that most users should share. A list of actions should be added to help users configure their browser to reach this ideal fingerprint. The third main feature would be an API for automated tests, as detailed on this page: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm... This would enable automatic verification of Tor protection features with regard to fingerprinting. When a new version is released, the output of specific tests will be checked for any evolution/changes/regressions from previous versions. The fourth main feature I'd like to include is a complete stats page where the user can go through every attribute and filter by OS, browser version and more. The inclusion of additional features that go beyond the core functionalities of the site should be driven by the needs of the developers and the Tor community. Still, a lot of open questions remain that should be addressed during the bonding period to define precisely how each of these features should ultimately work. Some of these open questions include:
- How closed/private/transparent should the website be about its tests and the results? Should every test be clearly indicated on the webpage with its own description? Or should some tests stay hidden to prevent spreading usable tests to fingerprint Tor users?
- Should a statistics page exist? Should we give read access to the database to every user (e.g. in the form of a REST API or another solution)?
- Where should the data be stored? How long should it be kept? If tests are performed per version, should the data from an old TBB version be removed? Should the data be kept for a week, a month or more?
- How should new tests be added: a pull request? A form where submissions are reviewed by admins? A link to the Tor tracker?
- Should the website only be accessible through Tor?
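The core statistic behind the site's main feature above (the distribution of values for each attribute) is usually summarized as Shannon entropy, as Panopticlick does; 0 bits would mean every Tor user reports the same value, which is the goal of the "one fingerprint for all" approach. A minimal sketch, with function names that are mine rather than part of the proposal:

```javascript
// Count how many users reported each value of an attribute.
function distribution(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  return counts;
}

// Shannon entropy in bits over the observed values of one attribute.
// 0 bits = everyone identical (ideal for Tor); higher = more distinguishable.
function entropyBits(values) {
  const counts = distribution(values);
  const n = values.length;
  let h = 0;
  for (const c of counts.values()) {
    const p = c / n;
    h -= p * Math.log2(p);
  }
  return h;
}
```

Comparing this per-attribute entropy before and after a defense ships is one concrete way to measure whether the defense worked.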
Technical choices In my opinion, the website must be accessible and modular. It should have the ability to cope with a large number of connections and a large volume of data. With this in mind and the experience gained from developing AmIUnique, I plan on using the Play framework with a MongoDB database. Developing the website in Java opens the door for many developers to make the website better and more robust after its initial launch, since it is one of the most used programming languages in the world. On the storage and statistics side, MongoDB is a good fit because it is now a mature technology that scales well with large volumes of data and connections. Moreover, while the SQL databases behind AmIUnique proved to be really powerful, their maintenance after the website was launched became a tedious task, especially when modifying the underlying model of a fingerprint to collect new attributes. A more flexible and modular database seems a better fit for maintenance and for adding/removing tests.
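The maintenance argument above comes down to the document model: in a schemaless store, a fingerprint is just a document, so adding a test only adds a key and requires no migration of existing records. A hypothetical record shape (all field names invented for illustration, not a fixed schema):

```javascript
// A hypothetical fingerprint record as it might be stored in MongoDB.
// Adding a new test (e.g. emojiSupport) is just one more key; older
// records simply lack it, with no ALTER TABLE equivalent needed.
const fingerprint = {
  tbVersion: '5.5.3',            // ground truth reported by the browser
  os: 'Linux',
  attributes: {
    userAgent: 'Mozilla/5.0 ...',
    timezone: 'UTC',
    emojiSupport: true,          // a newly added test: just one more key
  },
  submittedAt: '2016-03-15T00:00:00Z',
};
```

With an SQL model, each new attribute would instead mean a schema change plus a data migration, which matches the tedious maintenance described above.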
Estimated timeline You will find below a rough estimate of the timeline for the three months of the GSoC.
Community bonding period - Discuss with the mentors and the community the set of features that should be included in the very first version of the website and clarify the open questions raised in one of the previous paragraphs.
23 May - 27 June: Development of the first version of the website with the core features.
Week 1 - Develop the first version of the fingerprinting script with the core set of attributes. Special attention will be given to full compatibility with the most recent version of the Tor browser (and older ones too).
Week 2 - Start developing the front-end and the back-end to store fingerprints, with a page containing data on your current fingerprint (try adding a view to see how close/far you are from the ideal fingerprint).
Week 3 - Start developing the statistics page with the necessary visualization for users. Modify the back-end to improve statistics computation and lessen the server load.
Week 4 - Finish the front-end development and refine the statistics page to surface the most relevant information. Add and test an API to support automated tests.
Week 5 - Finish the first version so that it is ready for deployment. Start developing additional features requested by the community (REST API? Account management?)
27 June - Mid July: Deployment of the first version online for a beta test, with bug fixing. Finishing development of additional features requested by the mentors/community. Defining the list of new features for the second version.
Mid July - 23 August: Adding a system to make the website as flexible as possible so that tests can be added/removed easily (a pull-request system? A test submission form where admins review tests before they are included in the test suite?). Developing additional features for the website. Making sure that the website can be opened to more browsers (the design-time work to support any browser will be tested here). Bug fixing.
Code sample In 2014, I developed the entire AmIUnique.org website from scratch. Its aim is to collect fingerprints to study the current diversity of fingerprints on the Internet while providing full details to users on this subject. It was the first time that I built a complete website from the design phase to its deployment online. One of the first challenges that I encountered was to build a script that would not only use state-of-the-art techniques but would also simply work on the widest variety of browsers. Testing a script on a recent version of a major browser like Chrome or Firefox is an easy task, since they implement the latest HTML and JavaScript technologies, but making sure that the script runs correctly on older browsers like Internet Explorer is another story. Juggling a dozen different virtual machines was necessary to obtain a bug-free and stable version of the script. A small beta test was required to make sure that everything was good to go for what is now the foundation of the AmIUnique website. The totality of the source code for AmIUnique and my other projects can be found on GitHub. A second challenge that I faced was to deal with the increasing load of users so that the server could return personalized statistics to visitors in a timely manner (less than 2-3 seconds). By having a separate entity that updates statistics in real time on top of the database, I managed to drastically reduce the server load. With the number of Tor users around the world, the website needs to handle a high load of visitors and statistics computation from the get-go, and my previous experience on that specific task will prove useful.
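The "separate entity that updates statistics in real time" idea above can be sketched as incremental per-attribute counters: every insert updates the counters, so serving a visitor's statistics never requires scanning the whole database. A minimal sketch (class and method names hypothetical):

```javascript
// Running per-attribute counters, kept up to date on every submission so
// that personalized statistics are O(1) lookups instead of full DB scans.
class LiveStats {
  constructor() {
    this.counts = new Map(); // attribute -> (value -> count)
  }

  // Called once per stored fingerprint.
  record(attributes) {
    for (const [attr, value] of Object.entries(attributes)) {
      if (!this.counts.has(attr)) this.counts.set(attr, new Map());
      const m = this.counts.get(attr);
      m.set(value, (m.get(value) || 0) + 1);
    }
  }

  // Fraction of users sharing this value for this attribute.
  share(attr, value) {
    const m = this.counts.get(attr);
    if (!m) return 0;
    let total = 0;
    for (const c of m.values()) total += c;
    return (m.get(value) || 0) / total;
  }
}
```

In production these counters would be persisted alongside the database, but the trade (a little work per insert for cheap reads) is the same.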
For the very first version of Torprinter, I plan on testing well-known and widespread fingerprinting techniques to make sure that there is no variation among Tor users. These include HTTP headers and known JavaScript objects. There should be no need for any Flash attributes, since plugins are not present in the Tor browser (thus removing the complex code in charge of correctly loading the Flash object). For this proposal, I have also developed a special page with 7 different tests that are mainly targeted at the Tor browser, to give an idea of what tests can be included that are more suited to Tor users. Tests n°5, n°6 and n°7 are broader and also concern the Firefox browser. You can find a working version of the script on a special webpage (you need to scroll to make the results appear): https://plaperdr.github.io/torScript.html The script itself can be found here: https://plaperdr.github.io/assets/tor/tor.js
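Probing "known JavaScript objects" as mentioned above can be as simple as checking which globals exist, since the mere presence or absence of an object is itself a fingerprintable signal. A minimal sketch (function name hypothetical, not from the actual tor.js script):

```javascript
// Report which of a list of global names are defined in the given scope.
// In a browser the scope would be window; any divergence between two
// browsers on this map is a distinguishing mark.
function probeGlobals(names, scope) {
  const result = {};
  for (const n of names) result[n] = typeof scope[n] !== 'undefined';
  return result;
}
// e.g. in a page: probeGlobals(['InstallTrigger', 'chrome'], window)
```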
Test n°1 - Test the size of the current window - As reported by ticket n°14098: https://trac.torproject.org/projects/tor/ticket/14098
Test n°2 - Test the support of emoji - As reported by ticket n°18172: https://trac.torproject.org/projects/tor/ticket/18172
Test n°3 - Analysis of the "scroll" behavior of the window - As investigated by http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprint...
Test n°4 - Test the size of the current fallback font by using the canvas API to render some text (no need for user permission, unlike canvas extraction) - Custom test
Test n°5 - Test OS differences in the maximum font size - Custom test
Test n°6 - Test OS differences in the Date API - As reported by ticket n°15473: https://trac.torproject.org/projects/tor/ticket/15473
Test n°7 - Test OS differences in the Math class - As reported by ticket n°13018: https://trac.torproject.org/projects/tor/ticket/13018
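In the spirit of Test n°1 (ticket n°14098), a regression check could verify that the reported window size stays on Tor Browser's 200x100 rounding grid, since any off-grid size is a distinguishing mark. A sketch, written as a pure function so it can run outside a browser; in the page it would receive window.innerWidth and window.innerHeight:

```javascript
// Tor Browser rounds the content window to multiples of 200x100 px.
// Any other size distinguishes the user from the rest of the Tor crowd.
function windowSizeIsRounded(width, height) {
  return width % 200 === 0 && height % 100 === 0;
}
```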
Any remarks, suggestions or ideas are very welcome! Pierre
Oops, my bad. Missed that this was Panopticlick - gave GeKo a nudge to take a peek. :)
Hi Pierre,
thanks for this proposal. Gunes has already raised some good points and I won't repeat them here. This is part one of my feedback as I need a bit more time to think about the code example section.
Pierre Laperdrix:
Hi Tor Community,
My name is Pierre and I'm really interested in participating in a GSoC project this year with the Tor organization. Since I've been working on browser fingerprinting for the past two years, I'd love to build a Panopticlick-like website to improve the fingerprinting defenses of the Tor browser.
I've included below my proposal in case anyone has ideas or suggestions, especially on the technical section or on some of the open questions that I have. (It should be noted that the Torprinter name is subject to change).
Summary - The Torprinter project: a browser fingerprinting website to improve Tor fingerprinting defenses. The capabilities of browser fingerprinting as a tool to track users online have been demonstrated by Panopticlick and other research papers since 2010. The Tor community is fully aware of the problem, and the Tor browser has been modified to follow the "one fingerprint for all" approach. Spoofing HTTP headers, removing plugins, including bundled fonts, preventing canvas image extraction: these are a few examples of the progress made by Tor developers to protect their users against such threats. However, due to the constant evolution of the web and its underlying technologies, it has become a true challenge to always stay ahead of the latest fingerprinting techniques. I'm deeply interested in privacy and I've been studying browser fingerprinting for the past two years. Eighteen months ago, I launched the AmIUnique.org website to investigate the latest fingerprinting techniques. Collecting data on thousands of devices is one of the keys to understanding and countering the fingerprinting problem. For this Google Summer of Code project, I propose to develop the Torprinter website, which will run a fingerprinting test suite and collect data from Tor browsers to help developers design and test new defenses against browser fingerprinting. For users, the website will be similar to AmIUnique or Panopticlick: they will get a complete summary with statistics after the test suite has been executed. It can be used to test new fingerprinting protections as well as to make sure that fingerprinting-related bugs were correctly fixed, with specific regression tests. The expected long-term impact of this project is to reduce the differences between Tor users and reinforce their privacy and anonymity online.
In a second step, the website could open its doors to more browsers so that it could become a platform where vendors can implement significant changes in their browsers with regard to privacy and see the impact first-hand on the website. With the strong expertise I have acquired on the fingerprinting subject and the experience I have gained by developing the AmIUnique website, I believe I'm fully qualified to see such a project through to completion.
Website features The main feature of the website is to collect a set of fingerprintable attributes on the client and calculate the distribution of values for each attribute, like Panopticlick or AmIUnique. The set of tests would include not only known fingerprinting techniques but also ones developed specifically for the Tor browser. The second main feature of the website would be for Tor users to check how close their current fingerprint is to the ideal unique fingerprint that most users should share. A list of actions should be added to help users configure their browser to reach this ideal fingerprint.
We might want to think about that ideal fingerprint idea a bit. I think there is no such thing, even for Tor Browser users, as we are e.g. rounding the content window size to a multiple of 200x100 for each user. Thus, we have at least one fingerprintable attribute where we say "you are good if you have one out of a bunch of possible values". The same holds for our security slider, which basically partitions the Tor Browser users. We could revisit these design decisions, and I am especially interested in getting data that is backing/not backing our decisions regarding them. Nevertheless, I assume we won't always be able to put users into just one bucket per attribute, due to usability issues. And this in turn does not make the idea of helping users configure their browser any easier.
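Georg's bucket observation suggests scoring each attribute against a set of acceptable values rather than against a single ideal value. A hypothetical sketch of what such a per-attribute check could look like (all names invented for illustration):

```javascript
// Flag each attribute of a fingerprint as 'in-bucket' (one of the values
// shared by many Tor users, e.g. any multiple of 200x100 for window size)
// or 'distinctive' (a value that sets this user apart).
function flagAttributes(fingerprint, allowed) {
  const flags = {};
  for (const [attr, value] of Object.entries(fingerprint)) {
    const ok = allowed[attr] ? allowed[attr].includes(value) : false;
    flags[attr] = ok ? 'in-bucket' : 'distinctive';
  }
  return flags;
}
```

A guidance page could then tell the user which attributes are 'distinctive' and how to move them back into an acceptable bucket.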
The third main feature would be an API for automated tests, as detailed on this page: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm... This would enable automatic verification of Tor protection features with regard to fingerprinting. When a new version is released, the output of specific tests will be checked for any evolution/changes/regressions from previous versions. The fourth main feature I'd like to include is a complete stats page where the user can go through every attribute and filter by OS, browser version and more. The inclusion of additional features that go beyond the core functionalities of the site should be driven by the needs of the developers and the Tor community. Still, a lot of open questions remain that should be addressed during the bonding period to define precisely how each of these features should ultimately work. Some of these open questions include:
- How closed/private/transparent should the website be about its tests and the results? Should every test be clearly indicated on the webpage with its own description? Or should some tests stay hidden to prevent spreading usable tests to fingerprint Tor users?
- Should a statistics page exist? Should we give read access to the database to every user (e.g. in the form of a REST API or another solution)?
- Where should the data be stored? How long should it be kept? If tests are performed per version, should the data from an old TBB version be removed? Should the data be kept for a week, a month or more?
I am not sure how long the data should be kept. It probably depends on what kind of data we are talking about (e.g. aggregate or not). I think, though, that data we collected with Tor Browser A should not get deleted just because Tor Browser A+1 got released. In fact, we might want to keep that data, especially if we want to give users a guide on how to get a "better" fingerprint. But even if not, we might want to have this data to measure e.g. whether a fix for a particular fingerprinting vector had an impact and, if so, which one.
- How should new tests be added: a pull request? A form where submissions are reviewed by admins? A link to the Tor tracker?
From a Tor perspective, opening a ticket and posting the test there, or ideally having a link to a test in the ticket that is fixing the fingerprinting vector, seems like the preferred solution. I'd like to avoid the situation where tests get added to the system without our knowledge, leaving us to deal with users who are scared because of the new results. So, yes, some review should be involved here.
- Should the website only be accessible through Tor?
I don't think so. I am fine with Chrome/IE etc. users trying to see how they fare on that test. Not closing things down right from the start, and communicating properly about it, might be important if we want to create a better test platform not only for Tor Browser but for other vendors, as you alluded to above. (Which is a good idea, as it encourages collaboration and a better understanding of the fingerprinting problem in general.)
Technical choices In my opinion, the website must be accessible and modular. It should have the ability to cope with a large number of connections and a large volume of data. With this in mind and the experience gained from developing AmIUnique, I plan on using the Play framework with a MongoDB database. Developing the website in Java opens the door for many developers to make the website better and more robust after its initial launch, since it is one of the most used programming languages in the world. On the storage and statistics side, MongoDB is a good fit because it is now a mature technology that scales well with large volumes of data and connections. Moreover, while the SQL databases behind AmIUnique proved to be really powerful, their maintenance after the website was launched became a tedious task, especially when modifying the underlying model of a fingerprint to collect new attributes. A more flexible and modular database seems a better fit for maintenance and for adding/removing tests.
If we look at the Tor side, I guess we have more experience with Python code (which includes me) than with Java. Thus, by using Python it might be easier for us to maintain the code in the longer run. That said, I am fine with the decisions as you made them, especially since you are already familiar with all these tools/languages. And, hey, we always encourage students to stay connected to us and get even deeper involved after GSoC has ended. So, this might then actually be an area for you... ;)
One thing I'd like you to think about, though, is that we have guidelines for developing services that might be running on Tor project infrastructure one day:
https://trac.torproject.org/projects/tor/wiki/org/operations/Guidelines
Not sure if the tools you had in mind above fit the requirements outlined there. If not, we should try to fix that. (Thanks to Karsten for pointing that out)
Estimated timeline You will find below a rough estimate of the timeline for the three months of the GSoC.
Community bonding period - Discuss with the mentors and the community the set of features that should be included in the very first version of the website and clarify the open questions raised in one of the previous paragraphs.
23 May - 27 June: Development of the first version of the website with the core features.
Week 1 - Develop the first version of the fingerprinting script with the core set of attributes. Special attention will be given to full compatibility with the most recent version of the Tor browser (and older ones too).
Week 2 - Start developing the front-end and the back-end to store fingerprints, with a page containing data on your current fingerprint (try adding a view to see how close/far you are from the ideal fingerprint).
Week 3 - Start developing the statistics page with the necessary visualization for users. Modify the back-end to improve statistics computation and lessen the server load.
Week 4 - Finish the front-end development and refine the statistics page to surface the most relevant information. Add and test an API to support automated tests.
Week 5 - Finish the first version so that it is ready for deployment. Start developing additional features requested by the community (REST API? Account management?)
27 June - Mid July : Deployment of the first version online for a beta-test with bug fixing. Finishing development of additional features requested by the mentors/community. Defining the list of new features for the second version.
Mid July - 23 August : Adding a system to make the website as flexible as possible so that tests can be added/removed easily (a pull-request system? a test submission form where admins review tests before they are included in the test suite?). Developing additional features for the website. Making sure that the website can be opened to more browsers (work done at design time to support any browser will be tested here). Bug fixing.
That looks like a good timeline estimation to me.
That's it for the first feedback,
Georg
[snip]
Hi Georg,
Thanks for the feedback!
On 03/17/2016 10:06 PM, Georg Koppen wrote:
Hi Pierre,
thanks for this proposal. Gunes has already raised some good points and I won't repeat them here. This is part one of my feedback as I need a bit more time to think about the code example section.
Pierre Laperdrix:
Hi Tor Community, .....
Website features
The main feature of the website is to collect a set of fingerprintable attributes on the client and calculate the distribution of values for each attribute, like Panopticlick or AmIUnique. The set of tests would not only include known fingerprinting techniques but also ones developed specifically for the Tor browser. The second main feature of the website would be for Tor users to check how close their current fingerprint is to the ideal fingerprint that most users should share. A list of actions should be added to help users configure their browser to reach this ideal fingerprint.
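As an illustration of that core feature, the per-attribute distribution can be sketched in a few lines of plain JavaScript (a minimal sketch only; the attribute names and sample values below are invented, not actual Torprinter code):

```javascript
// Compute, for each attribute, the distribution of observed values
// across a set of collected fingerprints.
function attributeDistributions(fingerprints) {
  const dist = {};
  for (const fp of fingerprints) {
    for (const [attr, value] of Object.entries(fp)) {
      if (!dist[attr]) dist[attr] = {};
      const key = String(value);
      dist[attr][key] = (dist[attr][key] || 0) + 1;
    }
  }
  return dist;
}

// Share of users exhibiting a given value: the closer to 1.0 for every
// attribute, the less identifying the overall fingerprint.
function share(dist, attr, value, total) {
  return ((dist[attr] || {})[String(value)] || 0) / total;
}

const sample = [
  { userAgent: 'TB', timezone: 'UTC' },
  { userAgent: 'TB', timezone: 'UTC' },
  { userAgent: 'Other', timezone: 'CET' },
];
const dist = attributeDistributions(sample);
// share(dist, 'userAgent', 'TB', 3) -> 2 of 3 users share this value
```

The same per-attribute counts can back both the user-facing summary and the "distance from the shared fingerprint" view.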
We might want to think about that ideal fingerprint idea a bit. I think there is no such thing even for Tor Browser users, as we are e.g. rounding the content window size to a multiple of 200x100 for each user. Thus, we have at least one fingerprintable attribute where we say "you are good if you have one out of a bunch of possible values". The same holds for our security slider, which basically partitions the Tor Browser users. We could revisit these design decisions and I am especially interested in getting data that is backing/not backing our decisions regarding them. Nevertheless, I assume we won't always be able to put users into just one bucket per attribute due to usability issues. And this in turn does not make the idea of helping users configure their browser any easier.
You are right on that. I'll switch from the idea of an "ideal" fingerprint to an "acceptable" one in my proposal. The idea to partition the dataset into categories from the security slider could be really interesting, and we could try to play with some JS benchmarking, since some JS optimizations are removed on higher security levels. Then, we could try to detect the security level either through fingerprinting or, following the suggestion of Gunes, by having the browser directly add the level as a URL parameter for ground truth.
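The benchmarking idea could look something like the sketch below. This is purely illustrative and not a validated detector: the workload size and the threshold are invented placeholders that a real test would have to calibrate against known slider configurations.

```javascript
// Time a numeric loop; on higher security levels, where some JS
// optimizations are disabled, the same workload should take longer.
function benchmark(iterations) {
  const start = Date.now();
  let acc = 0;
  for (let i = 0; i < iterations; i++) {
    acc += Math.sqrt(i);
  }
  // Returning acc prevents the loop from being optimized away entirely.
  return { elapsedMs: Date.now() - start, acc };
}

const run = benchmark(1e6);
// Hypothetical classification against a pre-calibrated threshold
// (the 200ms value is made up for the example):
const guess = run.elapsedMs > 200 ? 'high-security?' : 'low-or-medium?';
```

The URL-parameter ground truth would then serve to check how often such a timing-based guess is right.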
The third main feature would be an API for automated tests, as detailed by this page: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm... . This would enable automatic verification of Tor protection features with regard to fingerprinting. When a new version is released, the output of specific tests will be verified to check for any evolution/changes/regressions from previous versions. The fourth main feature I'd like to include is a complete stats page where the user can go through every attribute and filter by OS, browser version and more. The inclusion of additional features that go beyond the core functionalities of the site should be driven by the needs of the developers and the Tor community. Still, a lot of open questions remain that should be addressed during the bonding period to define precisely how each of these features should ultimately work. Some of these open questions include:
- How closed/private/transparent should the website be about its tests and the results? Should every test be clearly indicated on the webpage with its own description? Or should some tests stay hidden to prevent spreading usable tests to fingerprint Tor users?
- Should a statistics page exist? Should we give read access to the database to every user (e.g. in the form of a REST API or other solutions)?
- Where should the data be stored? How long should the data be kept? If tests are performed per version, should the data from an old TBB version be removed? Should the data be kept a week, a month or more?
I am not sure about how long the data should be kept. It probably depends on what kind of data we are talking about (e.g. aggregate or not). I think, though, that data we collected with Tor Browser A should not get deleted just because Tor Browser A+1 got released. I think, in fact, we might want to keep that data especially if we want to give users a guide about how to get a "better" fingerprint. But even if not we might want to have this data to measure e.g. whether a fix for a particular fingerprinting vector had an impact and if so, which one.
It makes sense to keep data from previous versions. Moreover, I don't know how fast the majority of Tor users upgrade their browsers but when a new version is launched, it would be a bad idea to restrict the collection of fingerprints to one or two specific versions.
- How should new tests be added: a pull request? A form where submissions are reviewed by admins? A link to the Tor tracker?
From a Tor perspective, opening a ticket and posting the test there, or ideally having a link to a test in the ticket that is fixing the fingerprinting vector, seems like the preferred solution. I'd like to avoid the situation where tests get added to the system without us knowing about it, leaving us dealing with users that are scared because of the new results. So, yes, some review should be involved here.
I always envisioned some form of review to add new tests. I still don't know exactly what the system would be since I don't know the exact structure of the website yet.
- Should the website only be accessible through Tor?
I don't think so. I am fine with Chrome/IE etc. users trying to see how they fare on that test. Not closing the site down right from the start, and communicating properly about that, might be important if we want to create a better test platform not only for Tor Browser but for other vendors too, as you alluded to above. (Which is a good idea, as it encourages collaboration and a better understanding of the fingerprinting problem in general.)
Yes, for this reason there is no need to restrict access.
Technical choices
In my opinion, the website must be accessible and modular. It should be able to cope with a large number of connections and a large volume of data. With this in mind, and the experience gained from developing AmIUnique, I plan on using the Play framework with a MongoDB database. Developing the website in Java opens the door to many developers to make the website better and more robust after its initial launch, since Java is one of the most used programming languages in the world. On the storage and statistics side, MongoDB is a good fit because it is now a mature technology that can scale well with large amounts of data and connections. Moreover, while the SQL database used for AmIUnique proved to be really powerful, maintenance after the website launched became a tedious task, especially when modifying the underlying model of a fingerprint to collect new attributes. A more flexible and modular database seems a better choice for maintenance and for adding/removing tests.
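The flexibility argument can be sketched concretely. In a schema-less store, fingerprints are free-form documents, so a new attribute can appear in later records without migrating earlier ones (plain objects stand in for MongoDB documents below; all names are invented for the example):

```javascript
// Fingerprints stored as free-form documents.
const collection = [];
collection.push({ tbVersion: '5.5', userAgent: 'TB', fonts: 'bundled' });
// A new test ("emoji") is added later; old documents stay untouched,
// no schema migration needed.
collection.push({ tbVersion: '6.0', userAgent: 'TB', fonts: 'bundled', emoji: true });

// Queries simply treat a missing attribute as "not collected yet".
const withEmojiData = collection.filter(doc => 'emoji' in doc);
```

With a rigid SQL model, the same change would mean altering the table and deciding how to backfill old rows.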
If we look at the Tor side I guess we have more experience with Python code (which includes me) than Java. Thus, by using Python it might be easier for us to maintain the code in the longer run. That said, I am fine with the decisions as you made them especially if you are already familiar with using all these tools/languages. And, hey, we always encourage students to stay connected to us and get even deeper involved after the GSoC ended. So, this might then actually be an area for you... ;)
I wrote that I would use Java and Play because I'm familiar with them, but I'm really open to trying something new. For the past year I have mainly used Java and Python, so switching to Python is absolutely not a problem for me. In terms of timeline, this would mean that the website would take a little more time to reach a proper first version, but if it means in the long term that a broader part of the Tor community can participate, it is better. And for me, learning new technologies is part of the fun of development. In terms of framework, the new version of Panopticlick is using Flask, but the Django framework seems to be more complete, with stronger community support.
One thing I'd like you to think about, though, is that we have guidelines for developing services that might be running on Tor project infrastructure one day:
https://trac.torproject.org/projects/tor/wiki/org/operations/Guidelines
Not sure if the tools you had in mind above fit the requirements outlined there. If not, we should try to fix that. (Thanks to Karsten for pointing that out)
Thanks for pointing that out! I wasn't aware of it. If I read the guidelines correctly, to make a service run on the Tor infrastructure you must use trusted and stable sources; in the case of Tor, that means stable Debian packages. The use of self-provided libraries or third-party package managers is not recommended. If I decide to go the Django way, it is in the stable repository of Debian. However, if I want to use a non-relational database like Mongo, its connectors come through pip and git repositories, so we end up in the "not recommended" area (even if it is not forbidden). Django has built-in support for SQL databases, but in my opinion it would not be easy to add/remove tests with such rigid models. I don't know who decides whether a service is okay to run on the Tor infrastructure, but it seems to me that it could work.
Pierre
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Pierre Laperdrix:
[snip]
Technical choices
In my opinion, the website must be accessible and modular. It should be able to cope with a large number of connections and a large volume of data. With this in mind, and the experience gained from developing AmIUnique, I plan on using the Play framework with a MongoDB database. Developing the website in Java opens the door to many developers to make the website better and more robust after its initial launch, since Java is one of the most used programming languages in the world. On the storage and statistics side, MongoDB is a good fit because it is now a mature technology that can scale well with large amounts of data and connections. Moreover, while the SQL database used for AmIUnique proved to be really powerful, maintenance after the website launched became a tedious task, especially when modifying the underlying model of a fingerprint to collect new attributes. A more flexible and modular database seems a better choice for maintenance and for adding/removing tests.
If we look at the Tor side I guess we have more experience with Python code (which includes me) than Java. Thus, by using Python it might be easier for us to maintain the code in the longer run. That said, I am fine with the decisions as you made them especially if you are already familiar with using all these tools/languages. And, hey, we always encourage students to stay connected to us and get even deeper involved after the GSoC ended. So, this might then actually be an area for you... ;)
I wrote that I would use Java and Play because I'm familiar with them, but I'm really open to trying something new. For the past year I have mainly used Java and Python, so switching to Python is absolutely not a problem for me. In terms of timeline, this would mean that the website would take a little more time to reach a proper first version, but if it means in the long term that a broader part of the Tor community can participate, it is better. And for me, learning new technologies is part of the fun of development. In terms of framework, the new version of Panopticlick is using Flask, but the Django framework seems to be more complete, with stronger community support.
Nice. But as I said, it is mainly up to you; I brought these things up so that you take these constraints (the maintainability and the guidelines for running services on Tor Project infrastructure) into account when fine-tuning your proposal.
Georg
[snip]
Hi Pierre!
Thanks for this valuable proposal. :) Just a quick comment from someone who has experience supporting Tor users.
Georg Koppen:
- How new tests should be added: A pull request? A form where
submissions are reviewed by admins? A link to the Tor tracker?
From a Tor perspective, opening a ticket and posting the test there, or ideally having a link to a test in the ticket that is fixing the fingerprinting vector, seems like the preferred solution. I'd like to avoid the situation where tests get added to the system without us knowing about it, leaving us dealing with users that are scared because of the new results. So, yes, some review should be involved here.
It would be great if you could also include ways to guide users in understanding the test results. Off the top of my head, it would be good if the application had a way to know which Tor Browser version is being run. Then, together with the results, it would be good if users got an answer to the following questions:
* Is this the expected result?
* If not, is there any remediation available? At which cost?
  - This could be prompting users to upgrade to a new version. Ideally include support for known tools which bundle the Tor Browser, so the message could be “Upgrade to Tails 2.2” for Tails users.
  - Tell them to fiddle with the security slider, with a warning that they will lose some features.
* If there's no immediate remediation available, can they do anything?
  - Is the issue known at all? Can we then assist them to report the problem in a meaningful manner? Or point them at the existing ticket, with a warning that it's going to be tech+English.
  - Should they take extra precaution? Link to some documentation.
  - Do we need to collect more data? Let's guide them how.
  - Maybe it's a good opportunity to ask them for some money so we can hire more browser developers?
I'm pretty sure the UX team could give input on good wordings and layout. And probably on the whole thing. :)
Have you considered any internationalization?
If not all of this can be implemented over the summer, just keeping it in mind in the design stage might help to add the required features later.
Hi Lunar,
Thanks for the valuable feedback! It would be great to have the features you cite in the website. For the first version, my point of view is to focus mainly on the added value for developers, which is to add/remove tests easily and get the relevant data as easily as possible. With that, they can make decisions on what to do next inside the Tor browser. For subsequent versions, focusing on users could be really interesting if the rest proves to be solid and stable.
What you describe in your answer would be a more advanced version of the "How far are you from an acceptable fingerprint?" feature that I plan to integrate from the get-go. If the website detects that the user does not have a recommended configuration, it could give different links with information and steps on how to fix it, so that a non-tech person can understand what they will do and what they will modify. Suggestions to play with the security slider could be a great addition. I'll be honest, I don't know how much of what you propose could be done before summer's end. My timeline is a little rough, but even if it is not done by then, I can still add it after the GSoC period ends.
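A first cut of that feature could simply diff a user's attributes against a set of acceptable values and attach a plain-language hint to each deviation (a minimal sketch; the attribute names, acceptable values and hints below are all invented for illustration):

```javascript
// Acceptable values per attribute, and a remediation hint for each.
const acceptable = {
  timezone: ['UTC'],
  language: ['en-US,en;q=0.5'],
  windowWidth: [600, 800, 1000],
};
const hints = {
  timezone: 'Check that nothing overrides the spoofed timezone.',
  language: 'Keep the default locale shipped with Tor Browser.',
  windowWidth: 'Avoid resizing the browser window.',
};

// Return each attribute that deviates, together with its hint.
function deviations(fingerprint) {
  const out = [];
  for (const [attr, okValues] of Object.entries(acceptable)) {
    if (attr in fingerprint && !okValues.includes(fingerprint[attr])) {
      out.push({ attr, value: fingerprint[attr], hint: hints[attr] });
    }
  }
  return out;
}

const report = deviations({ timezone: 'CET', language: 'en-US,en;q=0.5', windowWidth: 1000 });
// Only the timezone deviates in this example.
```

The hints could later link to wiki pages or tickets, as Lunar suggests.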
For the internationalization, the framework that I plan to use (either Play or Django) supports it through templating. This means that anyone can contribute to the translation without writing a line of HTML. The main file will be in English and I'll probably do the French one at the same time. A contributor who wants to help will just have to take the English file and translate each line, without having to find scattered hardcoded strings throughout different HTML files.
Pierre
On 03/22/2016 11:21 AM, Lunar wrote:
[snip]
Pierre Laperdrix:
Thanks for the valuable feedback! It would be great to have the features you cite in the website. For the first version, my point of view is to focus mainly on the added value for developers, which is to add/remove tests easily and get the relevant data as easily as possible. With that, they can make decisions on what to do next inside the Tor browser. For subsequent versions, focusing on users could be really interesting if the rest proves to be solid and stable.
Makes sense. My main worry is that some users will be using it even though it's aimed at developers, and that we will quickly find ourselves having to repeat over and over “we know about this issue, the fix is more complicated than it seems, we're on it, but if you really worry, just switch the security slider to high”.
As long as you have that in mind, and there's a basic way to display messages for users—this could just be a link to a wiki page—I think it'll be ok to add fancy stuff later.
For the internationalization, the framework that I plan to use (either Play or Django) supports it through templating. This means that anyone can contribute to the translation without writing a line of HTML. The main file will be in English and I'll probably do the French one at the same time. A contributor who wants to help will just have to take the English file and translate each line, without having to find scattered hardcoded strings throughout different HTML files.
Great! :) As long as it's properly i18nized, localizations can come later. Although doing a first localization while i18ning might help you spot missing strings.
Hi,
here comes feedback to the remaining part of the proposal.
Pierre Laperdrix:
[snip]
Code sample
In 2014, I developed the entire AmIUnique.org website from scratch. Its aim is to collect fingerprints to study the current diversity of fingerprints on the Internet, while providing full details to users on this subject. It was the first time that I built a complete website from the design phase to its deployment online. One of the first challenges that I encountered was to build a script that would not only use state-of-the-art techniques but could simply work on the widest variety of browsers. Testing a script on a recent version of a major browser like Chrome or Firefox is an easy task, since they implement the latest HTML and JavaScript technologies, but making sure that the script runs correctly on older browsers like Internet Explorer is another story. Juggling a dozen different virtual machines was necessary to obtain a bug-free and stable version of the script. A small beta test was required to make sure that everything was good to go for what is now the foundation of the AmIUnique website. All of the source code for AmIUnique and my other projects can be found on GitHub. A second challenge that I faced was dealing with the increasing load of users so that the server could return personalized statistics to visitors in a timely manner (less than 2-3s). By having a separate entity that updates statistics in real time on top of the database, I managed to drastically reduce the server load. With the number of Tor users around the world, the website needs to handle a high load of visitors and statistics computation from the get-go, and my previous experience with that specific task will prove useful.
For the very first version of Torprinter, I plan on testing well-known and widespread fingerprinting techniques to make sure that there is no variation among Tor users. These include HTTP headers and known JavaScript objects. There should be no need for any Flash attributes since plugins are not present in the Tor browser (thus removing complex code in charge of correctly loading the Flash object).
We might think about that a bit. It seems we have a bunch of users who still go through all the hassle of getting Flash going in Tor Browser. It might be enough to detect them by just enumerating available plugins.
For this proposal, I have also developed a special page with 7 different tests that are mainly targeted at the Tor browser, to give an idea of what tests more suited to Tor users could be included. Tests n°5, n°6 and n°7 are broader and also concern the Firefox browser. You can find a working version of the script on a special webpage (you need to scroll to make the results appear): https://plaperdr.github.io/torScript.html The script can be found here: https://plaperdr.github.io/assets/tor/tor.js
Test n°1 Test the size of the current window - As reported by ticket n°14098 https://trac.torproject.org/projects/tor/ticket/14098
FWIW: that test is not working correctly. We cap the width and the height at 1000px and round the window to a multiple of 200px x 100px.
Test n°2 Test the support of emoji - As reported by ticket n°18172 https://trac.torproject.org/projects/tor/ticket/18172
Test n°3 Analysis of the "scroll" behavior of the window - As investigated by http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprint...
Test n°4 Test the size of the current fallback font by using the canvas API to render some text (no need for user permission, unlike canvas extraction) - Custom test
Test n°5 Test the difference between OSes in the maximum font size - Custom test
Test n°6 Test the difference between OSes in the Date API - As reported by ticket n°15473 https://trac.torproject.org/projects/tor/ticket/15473
Test n°7 Test the difference between OSes in the Math class - As reported by ticket n°13018 https://trac.torproject.org/projects/tor/ticket/13018
Any remarks, suggestions or ideas are very welcome!
We are currently not doing much against OS fingerprinting, so we'll see how useful tests like tests 5, 6 and 7 are. Maybe they will show us that we should prioritize those things. But on the other hand, we still suspect that there are other things out there providing more entropy.
A general thing to think about while writing the tests would be keeping them modular as well: a library could contain the functionality used by more than one test (like providing a canvas), while particular tests would make use of it for specific purposes. This would prevent code duplication and should help make the project more maintainable.
Georg
Hi,
Thanks for the rest of the feedback and for taking the time to read everything! I'll update my proposal accordingly. I have added some small comments below.
Pierre
On 03/21/2016 04:23 PM, Georg Koppen wrote:
Hi,
here comes feedback to the remaining part of the proposal.
Pierre Laperdrix:
[snip]
Code sample
In 2014, I developed the entire AmIUnique.org website from scratch. Its aim is to collect fingerprints to study the current diversity of fingerprints on the Internet, while providing full details to users on this subject. It was the first time that I built a complete website from the design phase to its deployment online. One of the first challenges that I encountered was to build a script that would not only use state-of-the-art techniques but could simply work on the widest variety of browsers. Testing a script on a recent version of a major browser like Chrome or Firefox is an easy task, since they implement the latest HTML and JavaScript technologies, but making sure that the script runs correctly on older browsers like Internet Explorer is another story. Juggling a dozen different virtual machines was necessary to obtain a bug-free and stable version of the script. A small beta test was required to make sure that everything was good to go for what is now the foundation of the AmIUnique website. All of the source code for AmIUnique and my other projects can be found on GitHub. A second challenge that I faced was dealing with the increasing load of users so that the server could return personalized statistics to visitors in a timely manner (less than 2-3s). By having a separate entity that updates statistics in real time on top of the database, I managed to drastically reduce the server load. With the number of Tor users around the world, the website needs to handle a high load of visitors and statistics computation from the get-go, and my previous experience with that specific task will prove useful.
For the very first version of Torprinter, I plan on testing well-known and widespread fingerprinting techniques to make sure that there is no variation among Tor users. These include HTTP headers and known JavaScript objects. There should be no need for any Flash attributes since plugins are not present in the Tor browser (thus removing complex code in charge of correctly loading the Flash object).
We might think about that a bit. It seems we have a bunch of users who still go through all the hassle of getting Flash going in Tor Browser. It might be enough to detect them by just enumerating available plugins.
For this proposal, I have also developed a special page with 7 different tests that are mainly targeted at the Tor browser, to give an idea of what tests more suited to Tor users could be included. Tests n°5, n°6 and n°7 are broader and also concern the Firefox browser. You can find a working version of the script on a special webpage (you need to scroll to make the results appear): https://plaperdr.github.io/torScript.html The script can be found here: https://plaperdr.github.io/assets/tor/tor.js
Test n°1 Test the size of the current window - As reported by ticket n°14098 https://trac.torproject.org/projects/tor/ticket/14098
FWIW: that test is not working correctly. We cap the width and the height at 1000px and round the window to a multiple of 200px x 100px.
I have updated the test to reflect this, I didn't have all the numbers right. It should be fixed now. I was a little bit surprised to find out that the Tor browser gives the exact size of the window with pixel precision (even if I completely understand why). On Firefox, even in windowed mode, the browser reports the full size of the screen. With a test like this, we could see if the rounding works correctly for all users and find out the percentage of users who resize their windows.
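The conformance check behind that test can be sketched with the numbers from Georg's description, i.e. width and height capped at 1000px and rounded to multiples of 200px and 100px respectively (a minimal sketch only; in the browser the inputs would come from window.innerWidth/innerHeight):

```javascript
// A reported content window size conforms if it matches the cap and the
// rounding; anything else suggests a resized window (or broken rounding).
function windowSizeConforms(width, height) {
  return width > 0 && height > 0 &&
         width <= 1000 && height <= 1000 &&
         width % 200 === 0 && height % 100 === 0;
}

// windowSizeConforms(1000, 600) -> true
// windowSizeConforms(1000, 589) -> false (height not rounded: resized?)
```

Counting non-conforming reports would give exactly the percentage of resizing users mentioned above.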
Test n°2 Test the support of emoji - As reported by ticket n°18172 https://trac.torproject.org/projects/tor/ticket/18172
Test n°3 Analysis of the "scroll" behavior of the window - As investigated by http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprint...
Test n°4 Test the size of the current fallback font by using the canvas API to render some text (no need for user permission, unlike canvas extraction) - Custom test
Test n°5 Test the difference between OSes in the maximum font size - Custom test
Test n°6 Test the difference between OSes in the Date API - As reported by ticket n°15473 https://trac.torproject.org/projects/tor/ticket/15473
Test n°7 Test the difference between OSes in the Math class - As reported by ticket n°13018 https://trac.torproject.org/projects/tor/ticket/13018
Any remarks, suggestions or ideas are very welcome!
We are currently not doing much against OS fingerprinting, so we'll see how useful tests like tests 5, 6 and 7 are. Maybe they will show us that we should prioritize those things. But on the other hand, we still suspect that there are other things out there providing more entropy.
I'm with you on the fact that other tests should provide more entropy. I'm still curious to see whether tests like the one on the Math class are just about OS fingerprinting, and whether no differences can be observed between users on the same OS.
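The Math-class test itself is short: some Math functions on extreme inputs depend on the underlying math library and can therefore differ across operating systems (the inputs below are illustrative choices, not the exact list from ticket n°13018):

```javascript
// Join the results of a few Math calls on extreme inputs; the resulting
// string is stable on one machine but may differ across OSes/libms.
function mathFingerprint() {
  return [
    Math.tan(-1e300),
    Math.sin(1e300),
    Math.cos(1e300),
    Math.exp(-100),
    Math.pow(Math.PI, -100),
  ].join(';');
}

// Two runs on the same machine must agree; runs on different OSes may not.
```

Collecting this string alongside the OS attribute would answer exactly the question above: whether same-OS users ever diverge.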
A general thing to think about while writing the tests would be keeping them modular as well: a library could contain the functionality used by more than one test (like providing a canvas), while particular tests would make use of it for specific purposes. This would prevent code duplication and should help make the project more maintainable.
This sounds like a good idea. If several tests use the same browser API, we could have utility/library functions to set them up and reduce code duplication.
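That modular structure could be as small as a shared helper object plus a test registry (a sketch only; the names are invented, not a real Torprinter API, and the canvas helper is stubbed since a real one would create a <canvas> element in the browser):

```javascript
// Shared helpers used by more than one test.
const helpers = {
  // Stub: in the browser this would create and return a real <canvas>.
  canvas() { return { width: 300, height: 150 }; },
};

// Tests register themselves and receive the helpers when run.
const tests = [];
function registerTest(name, run) {
  tests.push({ name, run });
}

registerTest('fallback-font-size', h => {
  const c = h.canvas(); // shared helper: no per-test canvas setup code
  return `canvas:${c.width}x${c.height}`;
});
registerTest('math-class', () => String(Math.exp(-100)));

// Run every registered test and collect named results.
function runAll() {
  return tests.map(t => ({ name: t.name, result: t.run(helpers) }));
}

const results = runAll();
```

Adding or removing a test then touches only its own registration, which fits the review workflow discussed earlier.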
Georg