[tor-dev] GSoC'16 proposal: the Torprinter project (a Panopticlick-like website)

Pierre Laperdrix pierre.laperdrix at irisa.fr
Fri Mar 18 10:31:45 UTC 2016

On 03/17/2016 06:02 PM, gunes acar wrote:
> Hi Pierre,
> On 2016-03-16 11:58, Pierre Laperdrix wrote:
>> Hi Gunes,
>> Thanks a lot for the feedback!
>> On 03/16/2016 03:30 PM, gunes acar wrote:
>>> Hi Pierre,
>>> Thanks for the very well thought-out proposal!
>>> I'm curious about your ideas on the "returning device problem." EFF's
>>> Panopticlick and AmIUnique.org use a combination of cookies and IP
>>> address to recognize returning users - so that their fingerprints are
>>> not "double-counted."
>>> Since these signals will not be available anymore (unless the user opts in
>>> to retain the cookie), I wonder what your ideas are to address this issue.
>> This one is a really interesting question but a tricky one because we
>> can't really rely on the cookies+IP combination with the Tor browser.
>> My answer here is simple: it all depends on the goal we set for the website.
> I think the original goals were to understand the fingerprint
> distribution and to measure the effect of introduced defenses (e.g.
> by measuring the uniqueness/entropy before vs. after the defense).
> I agree with you that guaranteeing no double-counting may not be
> possible, especially if we consider a determined attacker. A more
> realistic goal could be to filter out double-submissions from benign users.
> Let me point out an idea raised in the previous discussions:
> One option to enroll users for the tests was to have a link on the
> about:tor page similar to "Test Tor Network Settings" link. The
> fingerprint link could also include (e.g. as URL parameters) TB version,
> localization and OS type to establish ground truth for the tests.
> I wonder, if the same link can be used to signal a fresh fingerprint
> submission to the server. This may require keeping a boolean state (!)
> on the client side which may mean "already submitted a fingerprint with
> the current TB version." This state can be kept in TorButton's storage,
> away from the reach of non-chrome scripts. The fingerprinting site could
> then use this parameter to distinguish between fresh and recurrent
> submissions.
> An alternative can be to present a fresh submission link on the
> "changelog" tab, which is guaranteed to be shown once after each update
> - right when we want to collect a new test from users.
> Perhaps we should be cautious about keeping any client-side state, and
> be clear about the limitations of these approaches. But I feel like the
> way we enroll the users can be used to prevent pollution, at least by
> the well-behaving Tor users. Just wanted to point out this line of thought,
> no doubt you can come up with better alternatives.
> Best,
> Gunes

I was so focused on basic browser mechanisms and what we could do with
fingerprinting that I forgot that something can be done inside the browser.
Even though storing a boolean won't totally fix the problem of someone
polluting the database, if we want to analyze the distribution of
fingerprints, this may be a good step forward for legitimate users who
want to contribute.
Then comes the question of the "recurrent" or "not fresh" submissions.
If we only store the first "fresh" fingerprint, we may miss subsequent
fingerprints that may be more interesting for us.
So, one solution would be:
- Storage of all fresh fingerprints. This would give an idea of the
fingerprint distribution.
- Storage of recurrent fingerprints while removing duplicates. This
would give all the possible values for a specific attribute and the same
device could contribute several times.
I don't know if this seems like a good approach or if this is too
complicated. It is a hard balance to keep with privacy on one side and
relevant data on the other. If we really have to identify returning
users, we need some kind of ID somewhere but even that could be modified.
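
To make the two-store idea above more concrete, here is a rough sketch in
Python. It is only an illustration under assumptions: the class and
function names are hypothetical, a fingerprint is assumed to be a flat
dict of attribute name to string value, and the "fresh" flag is assumed
to come from the browser as discussed. Nothing here is an existing
implementation.

```python
import hashlib
import json
from collections import Counter


def fingerprint_key(fp):
    """Stable hash of a fingerprint dict (hypothetical helper)."""
    # Sorting keys makes the hash independent of attribute order.
    canonical = json.dumps(fp, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FingerprintStore:
    """Toy in-memory store: count fresh submissions, dedupe recurrent ones."""

    def __init__(self):
        self.by_key = {}               # key -> one copy of each fingerprint
        self.fresh_counts = Counter()  # key -> number of fresh submissions
        self.recurrent_keys = set()    # keys seen in recurrent submissions

    def submit(self, fp, is_fresh):
        key = fingerprint_key(fp)
        self.by_key.setdefault(key, dict(fp))
        if is_fresh:
            # Every fresh fingerprint is counted, duplicates included.
            self.fresh_counts[key] += 1
        else:
            # Recurrent fingerprints are kept once; repeats are dropped.
            self.recurrent_keys.add(key)

    def distribution(self):
        """Relative frequency of each fingerprint among fresh submissions."""
        total = sum(self.fresh_counts.values())
        if total == 0:
            return {}
        return {k: c / total for k, c in self.fresh_counts.items()}

    def values_seen(self, attribute):
        """Every value observed for one attribute, fresh or recurrent."""
        return {fp[attribute] for fp in self.by_key.values() if attribute in fp}
```

With this split, distribution() only reflects fresh submissions, while
values_seen() also benefits from devices that come back several times.
Note that the fresh flag is client-supplied, so this does nothing against
someone deliberately polluting the database.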

Also, having a ground truth given directly by the browser seems really
useful. At first, I thought about detecting the browser version through
fingerprinting, but when I looked at some of the changelogs, I saw that
some updates may not be detectable that way (for example, minor version
5.5.2 of the Tor Browser).
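
As a small illustration of what the server side could do with such a
link, here is a sketch that reads the ground truth from URL parameters.
The parameter names (version, locale, os, fresh) and the example URL are
pure assumptions on my part; the actual format of the link has not been
decided.

```python
from urllib.parse import parse_qs, urlsplit


def parse_ground_truth(url):
    """Extract browser-supplied ground truth from a submission URL.

    Parameter names are hypothetical placeholders.
    """
    query = parse_qs(urlsplit(url).query)

    def first(name):
        # parse_qs returns a list per parameter; take the first value.
        values = query.get(name)
        return values[0] if values else None

    return {
        "tb_version": first("version"),
        "locale": first("locale"),
        "os": first("os"),
        # A missing or unexpected value is treated as "not fresh".
        "fresh": first("fresh") == "1",
    }
```

For example, a link ending in ?version=5.5.2&locale=en-US&os=Linux&fresh=1
would yield the version, locale and OS as strings and fresh set to True.
Since these values come from the client, they should be treated as a hint
from well-behaving users and not as something that cannot be forged.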


>> Do we want to learn how many different values there are for a specific
>> test so that we can reduce diversity among Tor users? In that case, the
>> site would not store duplicated fingerprints or it could be
>> finer-grained and not store duplicated values for each test.
>> Or do we want to go further and know the actual distribution of values
>> among Tor users so that it may guide the development of a potential
>> defense? In this case, the site must identify returning users and it is
>> a lot harder to do here. The only method that comes to mind and that
>> would be accurate enough to work in this situation would be to put the
>> test suite behind some kind of registration system. The problem is that
>> a mandatory registration goes in the complete opposite direction of what
>> Tor is about and it would greatly limit the number of participating
>> users (or even render the site useless before it is even launched). A
>> solution in the middle would be not to store duplicated fingerprints but
>> I really don't know how much it would affect the statistics in the long
>> run. Would it be marginal, affecting something like 2-4% of collected
>> fingerprints, or would it be a lot more, going above 20%?
>> Finally, I thought about using additional means of identification like
>> canvas fingerprinting but I don't think there would be enough diversity
>> here to identify a browser.
>>> Please find other responses below.
>>> Best,
>>> Gunes
>>> On 2016-03-15 04:46, Pierre Laperdrix wrote:
>>>> Hi Tor Community,
>>>> ....
>>>> - How closed/private/transparent should the website be about its tests
>>>> and the results? Should every test be clearly indicated on the webpage
>>>> with its own description? Or should some tests stay hidden to prevent
>>>> spreading usable tests to fingerprint Tor users?
>>> I think the site should be transparent about the tests it runs. Perhaps
>>> the majority of the fingerprinting tests/code will run on the client
>>> side and can be easily captured by anyone with necessary skills (even if
>>> you obfuscate them).
>> You are right on that. It makes sense to be transparent since obfuscated
>> JS code can be deciphered by someone with the necessary skills.
>> Also, if tests are hidden, most Tor users would rightfully be wary of
>> what is exactly being executed in their browser and they would simply
>> not take the test. In that case, the impact of the website would be
>> greatly limited. Being transparent really seems to be the right way to
>> go here.
>>>> - Should a statistics page exist? Should we give read access to the
>>>> database to every user (like in the form of a REST API or other solutions)?
>>> I think aggregate statistics should be available publicly but exposing
>>> individual fingerprints publicly may not be necessary.
>> Like you said, aggregate statistics seem to be the best solution here.
>> Then, I'm wondering if it would be possible to offer the complete list
>> of values for each attribute separately from others. Then, my concern is
>> how easy it would be to correlate separate attributes to recreate
>> fingerprints, even partial ones.
>> Regards,
>> Pierre
>> _______________________________________________
>> tor-dev mailing list
>> tor-dev at lists.torproject.org
>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
