-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Dear All,
My name is Gunes Acar. I am a 2nd-year PhD student in the Computer Security and Industrial Cryptography (COSIC) group at the University of Leuven.
I work with Prof. Claudia Diaz on online tracking and browser fingerprinting. I'd like to work on the "Panopticlick" summer project (https://www.torproject.org/getinvolved/volunteer.html.en#panopticlick) and on other fingerprinting-related issues, which I have tried to outline below:
1) Collaborate with Peter@EFF to port/open-source Panopticlick: https://trac.torproject.org/projects/tor/ticket/6119#comment:4
   a) implement the necessary modifications, e.g. we won't have cookies or real IP addresses to match returning visitors.
   b) consider the security implications of storing fingerprints (e.g. what happens if someone gets access to the fingerprint database?)
2) Add the machine-readability support outlined in the Tor Automation proposals: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm...
   a) Which format(s) should we implement: JSON, YAML, XML?
3) Survey the literature for fingerprinting attacks published since Panopticlick, and implement those that may apply to TBB:
   a) Canvas & WebGL fingerprinting (Mowery et al.); make sure the patch at #6253 works
   b) JS engine fingerprinting (Mulazzani et al.)
   c) CSS & rendering engine fingerprinting (Unger et al.)
   ...
4) Check real-world fingerprinting scripts to see if they collect anything that has not been considered before, and check whether TBB's FP countermeasures work against them. (We can use data from the FPDetective study to find sites with fingerprinting scripts.)
5) Backport new "attacks" found in 3 and 4 to the EFF's Panopticlick, in case they consider an update.
6) Convert fixed FP-related bugs into regression tests. https://trac.torproject.org/projects/tor/query?keywords=~tbb-fingerprinting&...
7) Build test cases to check the severity of open fingerprinting-related tickets, e.g.: https://trac.torproject.org/projects/tor/ticket/8770 https://trac.torproject.org/projects/tor/ticket/10299
8) Work on potential fingerprinting bugs that ESR31 may bring.
9) ESR transitions seem to create a lot of FP-related issues that need to be checked manually (e.g. #9608). Consider developing a tool that iterates over the host objects of two browsers and compares them automatically (e.g. to pinpoint new objects, new methods, updated default values, etc.), similar to the "diff tool" mentioned here: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm...
10) Evaluate TBB's font limits by checking the average number of fonts used by the Top 1 Million sites. We can either collect fresh data with FPDetective or use the existing (~1 year old) data.
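Item 9's diff tool could start as simply as comparing two JSON snapshots of host-object properties. The sketch below assumes each browser has already dumped a flat {"property.path": value} map; the dumping script, the snapshot format, and the names `diff_snapshots`, `esr_old` and `esr_new` are all hypothetical. It reports added, removed and changed properties. JSON is used only as one of the machine-readable formats listed in item 2.

```python
import json
from typing import Any, Dict


def diff_snapshots(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Compare two flat {property_path: value} snapshots of browser host objects."""
    old_keys, new_keys = set(old), set(new)
    return {
        # Properties that exist only in the newer browser (e.g. new APIs).
        "added": sorted(new_keys - old_keys),
        # Properties that disappeared in the newer browser.
        "removed": sorted(old_keys - new_keys),
        # Properties present in both whose default value changed.
        "changed": {
            key: {"old": old[key], "new": new[key]}
            for key in sorted(old_keys & new_keys)
            if old[key] != new[key]
        },
    }


if __name__ == "__main__":
    # Made-up example values, not real TBB/ESR data.
    esr_old = {"navigator.plugins.length": 0, "screen.colorDepth": 24}
    esr_new = {
        "navigator.plugins.length": 0,
        "screen.colorDepth": 32,
        "navigator.exampleNewApi": "function",
    }
    print(json.dumps(diff_snapshots(esr_old, esr_new), indent=2, sort_keys=True))
```

A real tool would also need to walk prototype chains and handle values that cannot be serialized, which this sketch ignores.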
More on my background relevant to fingerprinting and TBB code base:
We recently published a paper, "FPDetective: Dusting the Web for Fingerprinters" (CCS'13), measuring the prevalence of browser fingerprinting on the Internet. As part of this study, we built instrumented browsers from the Chromium and PhantomJS source code and developed a framework to detect fingerprinting (https://github.com/fpdetective/).
I also got my hands dirty with the TBB source code, looking for vulnerabilities in the FP countermeasures. I found two ways to circumvent the existing font limits: https://trac.torproject.org/projects/tor/ticket/8270#comment:2 https://trac.torproject.org/projects/tor/ticket/5798#comment:13
Other pointers:
My website: http://www.esat.kuleuven.be/cosic/?page_id=126
FPDetective website: https://www.cosic.esat.kuleuven.be/fpdetective/
My (first & only) Tor patch: https://trac.torproject.org/projects/tor/ticket/10472
My Tor FAQ profile: http://tor.stackexchange.com/users/731/gacar
Looking forward to your comments,
Cheers,
Gunes
N.B.: I won't be applying to GSoC.
Hi,
Thanks for your interest! That is a lot to do in one summer. :) I am happy to help you with whatever you choose to work on, but personally I'd like to see the Panopticlick project get into shape as outlined in the link you referred to. As you are not applying for GSoC, what time frame do you have in mind for working on these things?
Georg
Hi Georg,
I plan to dedicate three months, full time, from early June to early September, but I'm flexible with the dates.
I admit that doing all of this might be unrealistic; maybe we can assign priorities to the different tasks.
Best, Gunes
On Mon 17 Mar 2014 03:44:50 PM CET, Georg Koppen wrote:
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Gunes Acar:
1) Collaborate with Peter@EFF to port/open-source Panopticlick: https://trac.torproject.org/projects/tor/ticket/6119#comment:4
   a) implement necessary modifications - e.g. we won't be having cookies or real IP addresses to match returning visitors.
   b) consider security implications of storing fingerprints (e.g. what happens if someone gets access to fingerprint database?)
2) Add machine-readability support outlined in Tor Automation proposals: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm...
   a) which one(s) should we implement? JSON, YAML, XML?
3) Survey the literature for fingerprinting attacks published since Panopticlick. Implement those that may apply to TBB:
   a) Canvas & WebGL fingerprinting (Mowery et al.) - make sure the patch at #6253 works
   b) JS engine fingerprinting (Mulazzani et al.)
   c) CSS & rendering engine fingerprinting, (Unger et al.)
   ...
This sounds good. We already have a fix for the first one (canvas fingerprinting), but verification can't hurt: the canvas should come back all white unless the user allows it.
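Verifying the canvas defense could be automated along these lines: a test page (hypothetical) draws on a canvas, reads the pixels back, and sends the raw RGBA bytes to a checker. The sketch assumes "all white" means every pixel is opaque white (255, 255, 255, 255); if the patch returns a different blank value, the expected byte would need adjusting. The function name `is_all_white` is illustrative only.

```python
def is_all_white(rgba: bytes) -> bool:
    """Return True if every pixel in a raw RGBA buffer is opaque white (255, 255, 255, 255)."""
    if len(rgba) == 0 or len(rgba) % 4 != 0:
        raise ValueError("expected a non-empty RGBA buffer (4 bytes per pixel)")
    # Opaque white means all four channels (R, G, B, A) are 255 for every pixel.
    return all(b == 255 for b in rgba)


if __name__ == "__main__":
    blank = bytes([255, 255, 255, 255]) * 4  # 2x2 canvas, all white
    drawn = bytes([255, 0, 0, 255]) + bytes([255, 255, 255, 255]) * 3  # one red pixel
    print(is_all_white(blank), is_all_white(drawn))  # True False
```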
We also have a couple of fixes for CSS-based fingerprinting (fonts and system colors) that are entropy-reduction efforts only. Actually measuring the amount of entropy reduction here would be useful.
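One way to quantify the entropy reduction of the font and system-color fixes is to compare the Shannon entropy of the values observed before and after a fix. A minimal sketch, assuming the observations are already collected as one hashable value per visitor; the function names and the sample values are made up for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of the observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p); written this way every term is non-negative.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_reduction(before: Iterable[Hashable], after: Iterable[Hashable]) -> float:
    """Bits of identifying information removed by a fix (positive means less entropy after)."""
    return shannon_entropy(before) - shannon_entropy(after)


if __name__ == "__main__":
    # Hypothetical system-color values reported by four visitors, before and after a fix.
    before = ["#f0f0f0", "#d4d0c8", "#ece9d8", "#ffffff"]
    after = ["#ffffff", "#ffffff", "#ffffff", "#ffffff"]
    print(entropy_reduction(before, after))  # 2.0
```

With small samples the empirical (plug-in) entropy estimate tends to underestimate the true entropy, so comparisons are most meaningful between similar, reasonably large samples.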
4) Check with realworld fingerprinting scripts to see if they collect anything that is not considered before. Check if TBB's FP countermeasures work against them. (We can use data from FPDetective study to find sites with fingerprinting scripts)
Great.
5) Backport new "attacks" found in 3 & 4 to EFF's Panopticlick in case they consider an update.
Unfortunately, the EFF has been reluctant to work with us in any way to improve or re-deploy Panopticlick for our needs, hence the frustrated tone of my other mail in this thread. It also seems that the EFF would not permit your resulting work to be open source, which I believe would violate the GSoC rules. I guess that since you are not intending to actually apply to GSoC, this is a moot point, though. It's just also a sore one for me, so I figured I'd poke it once more ;).
However, as I also said in my other mail, I actually think we may be better served by developing something independent of Panopticlick. We need per-TBB-version breakdowns of all the statistics we record, so we can measure the change in entropy as we deploy fixes and improvements to our defenses, without previous data points biasing the distribution.
Other than some helper functions to store data and calculate entropy, and one (or maybe two) simple fingerprinting tests, we should not need any of the Panopticlick code for this project. It's also likely that our DB schema will end up radically different, due to the need to segment data by browser version (which may be input by the user), and the need for many more (and more varied) tests than they have.
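Segmenting all data by browser version could translate into a schema where every observation row carries the (possibly user-supplied) TBB version, and entropy is always computed per version. A sketch using Python's built-in `sqlite3` module; the table name, column names, feature names and version strings are all hypothetical, not a proposed final schema.

```python
import math
import sqlite3
from typing import Dict

SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    id          INTEGER PRIMARY KEY,
    tbb_version TEXT NOT NULL,  -- may be user-supplied, so treat it as untrusted input
    feature     TEXT NOT NULL,  -- e.g. 'screen_resolution'
    value       TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_version_feature ON observations (tbb_version, feature);
"""


def entropy_by_version(conn: sqlite3.Connection, feature: str) -> Dict[str, float]:
    """Entropy (bits) of one feature, computed separately for each TBB version."""
    rows = conn.execute(
        "SELECT tbb_version, value, COUNT(*) FROM observations "
        "WHERE feature = ? GROUP BY tbb_version, value",
        (feature,),
    ).fetchall()
    counts: Dict[str, Dict[str, int]] = {}
    for version, value, n in rows:
        counts.setdefault(version, {})[value] = n
    result: Dict[str, float] = {}
    for version, per_value in counts.items():
        total = sum(per_value.values())
        result[version] = sum((n / total) * math.log2(total / n) for n in per_value.values())
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO observations (tbb_version, feature, value) VALUES (?, ?, ?)",
        [  # made-up rows
            ("3.5", "screen_resolution", "1366x768"),
            ("3.5", "screen_resolution", "1920x1080"),
            ("3.6", "screen_resolution", "1000x1000"),
            ("3.6", "screen_resolution", "1000x1000"),
        ],
    )
    print(entropy_by_version(conn, "screen_resolution"))
```

Keeping versions apart in every query means a fix shipped in one release is not masked by data points collected from older releases.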
6) Convert fixed FP-related bugs into regression tests. https://trac.torproject.org/projects/tor/query?keywords=~tbb-fingerprinting&...
7) Build test cases to check the severity of fingerprinting related open tickets, e.g.: https://trac.torproject.org/projects/tor/ticket/8770 https://trac.torproject.org/projects/tor/ticket/10299
8) Work on potential fingerprinting bugs that ESR31 may bring.
9) ESR transitions seem to create a lot of FP-related issues that need to be checked manually (e.g. #9608). Consider developing a tool that iterates over the host objects of two browsers to compare them automatically (e.g. to pinpoint new objects, new methods, updated default values, etc.). Similar to "diff tool" mentioned here: https://people.torproject.org/~boklm/automation/tor-automation-proposals.htm...
I am not sure this is helpful. In general, we only want to measure fingerprintability *within* a specific browser version.
To detect newly introduced APIs, it's probably best and simplest to review Mozilla's developer documentation, i.e.: https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/24
10) Evaluate the font-limits of TBB by checking the average # of fonts Top 1 Million sites use. We can either collect fresh data with FPDetective or use the existing (~1 year old) data.
Excellent.
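For item 10, a first pass might summarize how many distinct font families each crawled site uses and what share of sites stays within a given per-page limit. The input format (each site mapped to the set of font families it uses) is an assumption and not FPDetective's actual output format; the limit of 10 is the figure mentioned in this thread, and the sample data is made up.

```python
from statistics import mean, median
from typing import Dict, Set


def font_usage_summary(site_fonts: Dict[str, Set[str]], limit: int) -> Dict[str, float]:
    """Summarize per-site font counts and the share of sites within a per-page font limit."""
    counts = [len(fonts) for fonts in site_fonts.values()]
    if not counts:
        raise ValueError("no sites in input")
    within = sum(1 for c in counts if c <= limit)
    return {
        "mean": mean(counts),
        "median": median(counts),
        "share_within_limit": within / len(counts),
    }


if __name__ == "__main__":
    # Made-up crawl results, for illustration only.
    sample = {
        "example.com": {"Arial", "Georgia"},
        "example.org": {"Verdana"},
        "example.net": set(),
    }
    print(font_usage_summary(sample, limit=10))
```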
More on my background relevant to fingerprinting and TBB code base:
We recently published a paper called "FPDetective: Dusting the Web for Fingerprinters" (CCS'13) to measure the prevalence of browser fingerprinting on the Internet. As a part of this study, we built instrumented browsers from Chromium and PhantomJS source code and developed a framework to detect fingerprinting (https://github.com/fpdetective/).
I also got my hands dirty with the TBB source code to seek vulnerabilities in FP countermeasures. Two ways I found to circumvent existing font limits were as follows: https://trac.torproject.org/projects/tor/ticket/8270#comment:2 https://trac.torproject.org/projects/tor/ticket/5798#comment:13
Right. A proper fix for #5798 will address this, as documented in the bug. I would not get distracted by leveraging implementation bugs like this in your data collection, especially if we're aware of them and simply haven't had the development capacity to fix them.
Actual shortcomings of a defense in an information-theoretic sense (such as 10 fonts being too many, or our screen resolution leaking too much entropy) are better areas of focus, given the time.
However, if you also want to write any TBB patches to fix any implementation bugs (known or newly discovered), you will be my new hero. :)
Other pointers: My website: http://www.esat.kuleuven.be/cosic/?page_id=126 FPDetective website: https://www.cosic.esat.kuleuven.be/fpdetective/ My (first & only) Tor patch: https://trac.torproject.org/projects/tor/ticket/10472 My Tor FAQ profile: http://tor.stackexchange.com/users/731/gacar
Looking for your comments, Cheers, Gunes
N.B.: I won't be applying to GSoC.
Oh wow. Ok. I suppose that settles the GSoC Open Source snafu, and in a way that doesn't cause a legal disagreement with the EFF. Hurray! ;)
Please keep me updated on your progress and plans, in any case!
Thanks for all the feedback, Mike. I'll keep in touch with you and Georg on the Tor side.
Regarding the other discussion: I don't think open-sourcing Panopticlick is critical for this work. Rather, it is the Panopticlick data that may inform an information-theoretic evaluation of TBB's countermeasures (e.g. getting a sense of the font distribution to evaluate the font limits: can I pick/probe 10 fonts to identify all Tor users?).
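The font question above can be phrased as choosing the probe set whose presence/absence pattern carries the most entropy over a population. The sketch below uses a simple greedy heuristic, which is not guaranteed to find the optimal set; it assumes per-user installed-font sets are available (for example from aggregated data), and the users and font names are placeholders. If the entropy reaches log2(N) for N users, every user in the sample has a unique pattern.

```python
import math
from collections import Counter
from typing import FrozenSet, List, Sequence


def partition_entropy(users: Sequence[FrozenSet[str]], probes: Sequence[str]) -> float:
    """Entropy (bits) of the presence/absence vector that users expose for the probed fonts."""
    signatures = Counter(tuple(font in fonts for font in probes) for fonts in users)
    n = len(users)
    return sum((c / n) * math.log2(n / c) for c in signatures.values())


def greedy_probe_set(users: Sequence[FrozenSet[str]], candidates: Sequence[str], k: int) -> List[str]:
    """Greedily pick up to k fonts whose presence/absence best splits the user population."""
    chosen: List[str] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # Add the candidate that maximizes the entropy of the extended probe set.
        best = max(remaining, key=lambda f: partition_entropy(users, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Four made-up users and the fonts they have installed.
    users = [frozenset({"FontA"}), frozenset({"FontB"}), frozenset({"FontA", "FontB"}), frozenset()]
    probes = greedy_probe_set(users, ["FontA", "FontB", "FontC"], k=2)
    print(probes, partition_entropy(users, probes))  # ['FontA', 'FontB'] 2.0
```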
Actually, Peter and I have been planning to publish the aggregated Panopticlick data, and this may be the perfect time to do it. Frankly, I don't think the EFF was reluctant to collaborate with you; rather, I guess they simply had no time. That's why I'm very glad to hear from Yan, who seems to be able to dedicate more time.
I hope it would work for everybody if I dedicated some of my time (1-2 weeks) to analyzing the Panopticlick data.
Best, Gunes
On Wed 19 Mar 2014 04:14:05 AM CET, Mike Perry wrote:
Gunes Acar:
Thanks for all the feedback Mike, I'll be in touch with you and Georg on the Tor side.
For the other discussion: I don't think open-sourcing Panopticlick is critical for this work.
Sure, we can always write new code. That said, if you want to do that (I am still not sure about it), it might be a good idea to write up a proposal containing the requirements Mike mentioned in his email and how you'd like to implement them. Additionally, it might be a good idea if code could somehow be shared with tests needed for QA. Maybe the feature extraction part could be modularized in a way that both can share, say, the feature extraction part. Dunno if that works or is smart, though...
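Sharing the feature extraction part between the measurement site and the QA tests could look roughly like the registry below: both the statistics collector and a regression test call the same extractors on a raw report. Everything here (the report fields, the feature names, `extract_all`, `regressions`, the expected values) is hypothetical and only illustrates the modularization idea.

```python
from typing import Any, Callable, Dict

Report = Dict[str, Any]  # raw values posted by a (hypothetical) test page
Extractor = Callable[[Report], str]

# Shared registry: the stats collector and the QA tests both read from it.
EXTRACTORS: Dict[str, Extractor] = {}


def feature(name: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor under a feature name."""
    def register(fn: Extractor) -> Extractor:
        EXTRACTORS[name] = fn
        return fn
    return register


@feature("screen_resolution")
def screen_resolution(report: Report) -> str:
    return f"{report['screen_width']}x{report['screen_height']}"


@feature("font_count")
def font_count(report: Report) -> str:
    return str(len(report.get("fonts", [])))


def extract_all(report: Report) -> Dict[str, str]:
    """Feature values for one report; a collector would store these per TBB version."""
    return {name: fn(report) for name, fn in EXTRACTORS.items()}


def regressions(report: Report, expected: Dict[str, str]) -> Dict[str, str]:
    """QA helper: features whose extracted value differs from the expected value."""
    actual = extract_all(report)
    return {
        name: actual.get(name, "<missing>")
        for name, want in expected.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    report = {"screen_width": 1000, "screen_height": 1000, "fonts": ["Arial", "Georgia"]}
    print(extract_all(report))
    print(regressions(report, {"screen_resolution": "1000x1000", "font_count": "1"}))  # {'font_count': '2'}
```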
Actually we've been planning to publish the aggregated Panopticlick data with Peter and this may be the perfect time to do it. Frankly, I don't think EFF was reluctant to collaborate with you, rather I guess they simply had no time.
Oh, you have missed Yan Zhu's last email. I'll quote the relevant parts for your convenience:
"Peter says he has some reluctance to open source the project (not the data) because it might make it easier for some websites to track visitors without their consent."
Georg
Georg Koppen:
code could somehow be shared with tests needed for QA. Maybe the feature extraction part could be modularized in a way that both can share, say, the feature extraction part.
That should have been
"Maybe the tests could be modularized in a way that both can share, say, the feature extraction part." *sigh*
Georg