[tor-dev] GSoC'16 proposal: the Torprinter project (a Panopticlick-like website)
gk at torproject.org
Thu Mar 17 21:06:59 UTC 2016
Thanks for this proposal. Gunes has already raised some good points and
I won't repeat them here. This is part one of my feedback as I need a
bit more time to think about the code example section.
> Hi Tor Community,
> My name is Pierre and I'm really interested in participating in a GSoC
> project this year with the Tor organization. Since I've been working on
> browser fingerprinting for the past two years, I'd love to build a
> Panopticlick-like website to improve the fingerprinting defenses of the
> Tor browser.
> I've included below my proposal in case anyone has ideas or suggestions,
> especially on the technical section or on some of the open questions
> that I have. (It should be noted that the Torprinter name is subject to
> Summary - The Torprinter project: a browser fingerprinting website to
> improve Tor fingerprinting defenses
> The capability of browser fingerprinting as a tool to track users
> online has been demonstrated by Panopticlick and other research papers
> since 2010. The Tor community is fully aware of the problem and the Tor
> browser has been modified to follow the "one fingerprint for all"
> approach. Spoofing HTTP headers, removing plugins, including bundled
> fonts, preventing canvas image extraction: these are a few examples of
> the progress made by Tor developers to protect their users against such
> threats. However, due to the constant evolution of the web and its
> underlying technologies, it has become a true challenge to always stay
> ahead of the latest fingerprinting techniques.
> I'm deeply interested in privacy and I've been studying browser
> fingerprinting for the past 2 years. I launched the AmIUnique.org
> website 18 months ago to investigate the latest fingerprinting
> techniques. Collecting data on thousands of devices is one of the keys
> to understanding and countering the fingerprinting problem.
> For this Google Summer of Code project, I propose to develop the
> Torprinter website that will run a fingerprinting test suite and collect
> data from Tor browsers to help developers design and test new defenses
> against browser fingerprinting. The website will be similar to AmIUnique
> or Panopticlick for users where they will get a complete summary with
> statistics after the test suite has been executed. It can be used to
> test new fingerprinting protection as well as making sure that
> fingerprinting-related bugs were correctly fixed with specific
> regression tests. The expected long-term impact of this project is to
> reduce the differences between Tor users and reinforce their privacy and
> anonymity online. In a second step, the website could open its doors to
> more browsers so that it could become a platform where vendors can
> implement significant changes in their browsers with regard to privacy
> and see the impact first-hand on the website. With the strong expertise
> I have acquired on the fingerprinting subject and the experience I have
> gained by developing the AmIUnique website, I believe I'm fully
> qualified to see such a project through to completion.
> Website features
> The main feature of the website is to collect a set of fingerprintable
> attributes on the client and calculate the distribution of values for
> each attribute like Panopticlick or AmIUnique. The set of tests would
> not only include known fingerprinting techniques but also ones developed
> specifically for the Tor browser.
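As a rough illustration of the per-attribute distribution described
above, here is a minimal sketch in Python. The attribute names, the
sample data and both function names are assumptions made up for this
example; nothing here is taken from Panopticlick or AmIUnique. Shannon
entropy is used as one common way to summarize a distribution, not as
the metric the proposal commits to.

```python
from collections import Counter
from math import log2


def attribute_distribution(fingerprints, attribute):
    """Return {value: share of visitors} for one attribute."""
    counts = Counter(fp.get(attribute) for fp in fingerprints)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def shannon_entropy(distribution):
    """Entropy in bits; 0 means every visitor reports the same value."""
    return sum(p * log2(1 / p) for p in distribution.values() if p > 0)


# Hypothetical fingerprints, invented for illustration only.
fingerprints = [
    {"user_agent": "UA-1", "window_size": "1000x1000"},
    {"user_agent": "UA-1", "window_size": "1000x900"},
    {"user_agent": "UA-1", "window_size": "1000x1000"},
    {"user_agent": "UA-1", "window_size": "800x600"},
]

for name in ("user_agent", "window_size"):
    dist = attribute_distribution(fingerprints, name)
    print(name, dist, shannon_entropy(dist))
```

In this toy data set the user agent is identical for everyone and
contributes no entropy, while the window size splits the visitors into
three groups.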
> The second main feature of the website would be for Tor users to check
> how close their current fingerprint is to the ideal unique fingerprint
> that most users should share. A list of actions should be added to help
> users configure their browser to reach this ideal fingerprint.
We might want to think about that ideal fingerprint idea a bit. I think
there is no such thing even for Tor Browser users as we are e.g.
rounding the content window size to a multiple of 200x100 for each user.
Thus, we have at least one fingerprintable attribute where we say "you
are good if you have one out of a bunch of possible values". The same
holds for our security slider which basically partitions the Tor Browser
users. We could revisit these design decisions and I am especially
interested in getting data that is backing/not backing our decisions
regarding them. Nevertheless, I assume we won't always be able to put
users into just one bucket per attribute due to usability issues. And
this in turn makes the idea to help users configure their browser not
> The third main feature would be an API for automated tests as detailed
> by this page :
> . This would enable automatic verification of Tor protection features
> with regard to fingerprinting. When a new version is released, the
> output of specific tests will be verified to check for any
> evolution/changes/regressions from previous versions.
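The version-to-version check described above could look roughly like
the following Python sketch. It assumes, purely for illustration, that
each automated run produces a flat mapping from test name to observed
value; the test names, the values and the diff_results helper are
hypothetical and not part of any existing Tor test API.

```python
MISSING = "<missing>"  # marker for a test absent from one of the runs


def diff_results(previous, current):
    """Return {test_name: (old, new)} for every test whose output changed."""
    changes = {}
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name, MISSING)
        new = current.get(name, MISSING)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical outputs of two consecutive releases.
previous_run = {"canvas": "blocked", "fonts": "bundled", "plugins": "none"}
current_run = {"canvas": "blocked", "fonts": "system", "webgl": "disabled"}

for name, (old, new) in diff_results(previous_run, current_run).items():
    print(f"{name}: {old} -> {new}")
```

An empty result would mean the new release behaves like the previous
one on every recorded test; anything else needs a human to decide
whether it is an intended change or a regression.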
> The fourth main feature I'd like to include is a complete stats page
> where the user can go through every attribute and filter by OS, browser
> version and more.
> The inclusion of additional features that go beyond the core
> functionalities of the site should be driven by the needs of the
> developers and the Tor community.
> Still, a lot of open questions remain that should be addressed during
> the bonding period to define precisely how each of these features should
> ultimately work.
> Some of these open questions include:
> - How closed/private/transparent should the website be about its tests
> and the results? Should every test be clearly indicated on the webpage
> with its own description? Or should some tests stay hidden to prevent
> spreading usable tests to fingerprint Tor users?
> - Should a statistics page exist? Should we give read access to the
> database to every user (like in the form of a REST API or other solutions)?
> - Where should the data be stored? How long should the data be kept? If
> tests are performed by versions, should the data from an old TBB version
> be removed? Should the data be kept a week, a month or more?
I am not sure about how long the data should be kept. It probably
depends on what kind of data we are talking about (e.g. aggregate or
not). I think, though, that data we collected with Tor Browser A should
not get deleted just because Tor Browser A+1 got released. I think, in
fact, we might want to keep that data especially if we want to give
users a guide about how to get a "better" fingerprint. But even if not
we might want to have this data to measure e.g. whether a fix for a
particular fingerprinting vector had an impact and if so, which one.
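To make the "did a fix have an impact" measurement a bit more concrete,
here is a minimal Python sketch. It assumes stored records that carry a
browser version next to the collected attribute; the field names, the
sample records and the helper function are invented for this example.
The share of users in the largest bucket is just one possible metric.

```python
from collections import Counter, defaultdict


def largest_bucket_share_by_version(records, attribute):
    """For each version, return the share of users in the most common bucket."""
    by_version = defaultdict(Counter)
    for record in records:
        by_version[record["version"]][record[attribute]] += 1
    return {
        version: max(counts.values()) / sum(counts.values())
        for version, counts in by_version.items()
    }


# Hypothetical records: "A" is the release before a fix, "A+1" the one after.
records = [
    {"version": "A", "fonts": "x"},
    {"version": "A", "fonts": "y"},
    {"version": "A", "fonts": "z"},
    {"version": "A", "fonts": "x"},
    {"version": "A+1", "fonts": "x"},
    {"version": "A+1", "fonts": "x"},
    {"version": "A+1", "fonts": "x"},
    {"version": "A+1", "fonts": "y"},
]

print(largest_bucket_share_by_version(records, "fonts"))
```

A share that moves closer to 1.0 after the fix suggests that more users
ended up in the same bucket. This comparison only works if the data
from the older version is still around.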
> - How should new tests be added: A pull request? A form where
> submissions are reviewed by admins? A link to the Tor tracker?
From a Tor perspective opening a ticket and posting the test there or
ideally having a link to a test in the ticket that is fixing the
fingerprinting vector seems like the preferred solution. I'd like to
avoid the situation where tests get added to the system without us
knowing about it, leaving us to deal with users who are scared by the
new results. So, yes, some review should be involved here.
> - Should the website only be accessible through Tor?
I don't think so. I am fine with Chrome/IE etc. users who try to see
how they fare on that test. Not closing the site down right from the
start, and communicating properly about that, might be important if we
would like to create a better test platform not only for Tor Browser
but for other vendors as you alluded to above (which is a good idea as
it encourages collaboration and a better understanding of the
fingerprinting problem in general).
> Technical choices
> In my opinion, the website must be accessible and modular. It should
> have the ability to cope with a large number of connections/data.
> With this in mind and the experience gained from developing AmIUnique, I
> plan on using the Play framework with a MongoDB database. Developing the
> website in Java opens the door to many developers to make the website
> better and more robust after its initial launch since it is one of the
> most used programming languages in the world. On the storage and
> statistics side, MongoDB is a good fit because it is now a mature
> technology that can scale well with large amounts of data and many
> connections.
> Moreover, the use of SQL databases for AmIUnique proved to be really
> powerful but the maintenance after the website was launched became a
> tedious task, especially when modifying the underlying model of a
> fingerprint to collect new attributes. The choice of a more flexible and
> modular database seems a better choice for maintenance and for
> adding/removing tests.
If we look at the Tor side I guess we have more experience with Python
code (which includes me) than Java. Thus, by using Python it might be
easier for us to maintain the code in the longer run. That said, I am
fine with the decisions as you made them especially if you are already
familiar with using all these tools/languages. And, hey, we always
encourage students to stay connected to us and get even deeper involved
after the GSoC ended. So, this might then actually be an area for you... ;)
One thing I'd like you to think about, though, is that we have
guidelines for developing services that might be running on Tor project
infrastructure one day:
Not sure if the tools you had in mind above fit the requirements
outlined there. If not, we should try to fix that. (Thanks to Karsten
for pointing that out)
> Estimated timeline
> You will find below a rough estimate of the timeline for the three
> months of the GSoC.
> Community bonding period - Discuss with the mentors and the community
> the set of features that should be included in the very first version of
> the website and clarify the open questions raised in one of the previous
> 23 May - 27 June : Development of the first version of the website with
> the core features
> Week 1 - Development of the first version of the fingerprinting
> script with the core set of attributes. Special attention will be given
> so that it is fully compatible with the most recent version of the Tor
> browser (and older ones too).
> Week 2 - Start developing the front-end and the back-end to store
> fingerprints with a page containing data on your current fingerprint
> (try adding a view to see how close/far you are from the ideal fingerprint).
> Week 3 - Start developing the statistics page with the necessary
> visualization for the users. Modification of the back-end to improve
> statistics computation to lessen the server load.
> Week 4 - Finishing the front-end development and refining the
> statistics page to get back the most relevant information.
> Adding and testing an API to support automated tests.
> Week 5 - Finishing the first version so that it is ready for deployment.
> Start developing additional features requested by the community
> (REST API? account management?)
> 27 June - Mid July :
> Deployment of the first version online for a beta-test with bug fixing.
> Finishing development of additional features requested by the
> Defining the list of new features for the second version.
> Mid July - 23rd August :
> Adding a system to make the website as flexible as possible to
> add/remove tests easily (A pull-request system? A test submission form
> where admins review tests before they are included in the test suite?)
> Developing additional features for the website.
> Making sure that the website can be opened to more browsers (work done
> at design time to support any browser will be tested here)
> Bug fixing
That looks like a good timeline estimation to me.
That's it for the first feedback,