[tor-dev] Tor Friendliness Scanner

Sat Mar 16 21:29:36 UTC 2019

Hello everyone!

Thanks for the feedback! Please see my inline comments.

Santiago:

 > I wonder if you could share a link to the source code for people to give
 > you feedback/audit or implement fixes/features themselves.

I will be posting the source code once I'm a little further along.

Roger:

 > (1)
 > Looking at the DOM tree reminds me of Micah's paper from a few years
 > back:
 > "Validating Web Content with Senser"
 > https://security.cs.georgetown.edu/~msherr/pubs.php

This is very interesting! Earlier on I had considered using Merkle Trees 
for this project as well, but we are currently looking at other, more 
suitable options.

Roger:

 > and (2)
 > Be sure to check out the recent papers by the Berkeley group on this
 > area, e.g.  the "do you see what I see" paper and more recent ones:
 > https://www1.icsi.berkeley.edu/~sadia/

Yes, I have read many of these! The "Do you see what I see" paper was 
definitely one of the inspirations for this, as were some results of 
my previous work.

Gunner:

 > It may or may not be of any use, but here is a content from an etherpad
 > that a number of Tor folks worked on a while back regarding 'tor
 > friendly sites"

Thanks for sending this! The results of this project will definitely 
supplement this etherpad with things that we find are broken "in the 
wild," either as a foreseen result of the Tor Browser's design choices 
or as an unforeseen consequence of them.

grarpamp:

 > You may be interested in this coupled pair of projects that
 > may be studying a similar question from a different perspective.
 > Note their needs list which might include integrating elements
 > of your platform, OONI, etc.

Thanks for bringing these to our attention! These are certainly 
interesting projects that I think could benefit from the findings of our 
work, when we get there. We may be able to supplement the existing lists 
with more information about services that don't outright block Tor, but 
that make it difficult to use their Web service anonymously by relying 
on functionality that is dangerous to anonymity and blocked in Tor 
Browser.

Georg:

 > What are your criteria for saying "this is broken in Tor Browser" vs.
 > "this is just rendered slightly different in Tor Browser"? For instance
 > I suspect that you'd even get different ground-truths depending on the
 > major Firefox version you use (like Firefox 65 vs. Firefox 60 ESR), yet
 > you would hardly say "This is okay in Firefox 65 but broken in Firefox
 > 60 ESR". Or maybe there *are* cases where you would say so? What I am
 > saying is: mapping the creation of the DOM tree and logging JS execution
 > might be a good means for you goal (I am not sure yet) but it does not
 > seem to be sufficient to reach it.

There is some legitimate concern here, and the reason my e-mail is so 
late is that I've been considering it. The main observation is that, as 
far as I know, the Tor Browser is a modified Firefox ESR, not an 
entirely stand-alone browser. The goal, then, is to take as ground truth 
the closest version of Firefox that we can.

The Tor Browser starts as a Firefox ESR release and then has changes 
applied to it (patches, extensions, etc.). If we use the Firefox ESR 
release associated with the most current Tor Browser, then we can count 
that as "ground truth," since the comparison should isolate only the 
changes made to Firefox to turn it into the Tor Browser.
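
As a rough sketch of what such a comparison could look like (an 
illustration only, not the project's actual code): assume the rendered 
DOM from each browser has already been serialized to an HTML string. 
The names tag_paths and missing_in_tb are hypothetical, and comparing 
counts of tag paths is just one possible measure of difference.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed
# onto the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _PathCollector(HTMLParser):
    """Counts how often each root-to-element tag path occurs."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = Counter()

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the nearest matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = _PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def missing_in_tb(firefox_html, tor_browser_html):
    """Tag paths seen in the Firefox ESR DOM but absent (or less
    frequent) in the Tor Browser DOM."""
    return tag_paths(firefox_html) - tag_paths(tor_browser_html)
```

A non-empty result only flags a difference; deciding whether that 
difference counts as "broken" is a separate question.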

Georg:

 > Secondly, I am wondering how you plan to deal with the fact that
 > websites show different content if the logic behind them assumes you
 > come from a different country/region. How does that get incorporated
 > into your ground-truth, for example?

The way we intend to do this is to send our Firefox "ground-truth" 
collection through Tor, and specifically through the same exit node that 
the Tor Browser uses. This way we can isolate the variable to the 
differences between the browsers, rather than any network or other 
concerns. In addition, we are working on a method for determining 
whether content is dynamically generated (and therefore different every 
time) or broken.
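
One simple heuristic for the dynamic-versus-broken question, given 
purely as an illustration and not as the method under development: 
load the page several times in the reference browser, treat anything 
that appears in every load as stable, and only flag stable items that 
the Tor Browser load lacks. The function name classify and the idea of 
representing a load as a set of item identifiers (for example tag 
paths or resource URLs) are assumptions made for this sketch.

```python
def classify(reference_loads, tb_load):
    """Split observed items into rough categories.

    reference_loads: iterable of item collections, one per repeated
                     load in the reference browser.
    tb_load:         item collection from a single Tor Browser load.
    """
    loads = [set(load) for load in reference_loads]
    stable = set.intersection(*loads) if loads else set()
    seen = set.union(*loads) if loads else set()
    tb = set(tb_load)
    return {
        # Present in every reference load but missing in Tor Browser.
        "likely_broken": stable - tb,
        # Varies between reference loads, so a mismatch is expected.
        "likely_dynamic": seen - stable,
        # Present only in Tor Browser (e.g. fallback content).
        "tb_only": tb - seen,
    }
```

The more reference loads are collected, the less likely it is that 
rotating content is mistaken for a stable part of the page.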

I hope this addresses all concerns; if not, or if there is more 
feedback, please let me know!

Thanks,

Kevin

-- 
Kevin Gallagher
Ph.D. Candidate
Center For Cybersecurity
NYU Tandon School of Engineering
Key Fingerprint: D02B 25CB 0F7D E276 06C3  BF08 53E4 C50F 8247 4861