[ooni-dev] Mining OONI reports to find server-side Tor blocking (e.g. CloudFlare captchas)

Daniel Ramsay daniel at dretzq.org.uk
Sun Jun 21 16:04:43 UTC 2015


That's very interesting. I'm working on a project that aims to provide
real-time blocking detection using ooni-probe's http_requests test, and
the false positives caused by Tor blocking are a big problem there.

Being able to identify Tor blocking would definitely help us eliminate
those false positives. I was also meaning to ask the list: are there any
other tests you'd recommend using in combination with http_requests to
improve the accuracy of the results?

Many thanks,

Daniel.

On 19/06/15 18:03, David Fifield wrote:
> I want to search OONI reports for cases of Tor exits being blocked by
> the server (things like the CloudFlare 403 captcha). The http_requests
> test is great for that because it fetches a bunch of web pages with Tor
> and without.
> 
> The attached script is my first-draft attempt at finding block pages.
> Its output is at the end of this message. You can see it finds a lot of
> CloudFlare captchas and other blocks.
> 
> First, I ran ooniprobe to get a report-http_requests.yamloo file. Then I
> ran the script, which does this:
> 	Skip the first YAML document, because it's a header.
> 	For all other documents:
> 		Skip it if it has non-None control_failure or
> 		experiment_failure--there are a few of these.
> 
> 		Look for exactly two non-failed requests, one with is_tor:false
> 		and one with is_tor:true. Skip it if it lacks these.
> 
> 		Classify the blocked status of the is_tor:false and is_tor:true
> 		responses. 400-series and 500-series status codes are classified
> 		as blocked and all others are unblocked.
> 
> 		Print an output line if the blocked status of is_tor:false does
> 		not match the blocked status of is_tor:true.
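
The steps above can be sketched roughly as follows. This is a minimal
sketch of the description only, not the attached script. It works on
already-parsed YAML documents (dicts), and the key layout it assumes
(control_failure, experiment_failure, input, requests, request.tor.is_tor,
response.code) is an assumption about the report format. Treating a
request with no response as "failed" is also an assumption.

```python
def is_blocked(status_code):
    """400- and 500-series status codes count as blocked; all others do not."""
    return 400 <= status_code <= 599


def find_mismatches(documents):
    """Yield (url, non_tor_code, tor_code) when blocked status differs.

    `documents` is an iterable of parsed YAML documents, header first.
    The dictionary keys used here are assumed, not confirmed by the thread.
    """
    docs = iter(documents)
    next(docs, None)  # skip the first document: it is the report header
    for entry in docs:
        # Skip entries where either measurement failed outright.
        if entry.get("control_failure") is not None:
            continue
        if entry.get("experiment_failure") is not None:
            continue
        # Keep only requests that got a response (assumed meaning of "non-failed").
        ok = [r for r in entry.get("requests", []) if r.get("response")]
        if len(ok) != 2:
            continue
        codes = {}
        for r in ok:
            codes[bool(r["request"]["tor"]["is_tor"])] = r["response"]["code"]
        # Require exactly one non-Tor and one Tor request.
        if set(codes) != {False, True}:
            continue
        if is_blocked(codes[False]) != is_blocked(codes[True]):
            yield entry["input"], codes[False], codes[True]
```

Assuming PyYAML is installed, the documents could come from
yaml.safe_load_all() on the open report file. Loading a large report
that way can be slow, so this is for illustration only.
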
> 
> I have a few questions.
> 
> Is this a reasonable way to process reports? Is there a more standard
> way to do e.g. YAMLOO processing?
> 
> I know there are many reports at https://ooni.torproject.org/reports/.
> Is that all of them? I think I heard from Arturo that some reports are
> not online because of storage issues.
> 
> What's the best way for me to get the reports for processing? Just
> download all *http_requests* files from the web server?
> 
> 
> Here is the output of the script. 403-CLOUDFLARE is the famous
> "Attention Required!" captcha page. I investigated some of the others
> manually and they are mostly custom block pages or generic web server
> 403s. (There are also a couple of CloudFlare pages that have a different
> form.) Overall, almost 4% of the 1000 URLs scanned by ooniprobe served a
> block page over Tor.
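
One way the labels in the output could be produced is sketched below.
The label() helper is hypothetical, and matching on the literal string
"Attention Required!" is an assumption about how the captcha page might
be recognised. As noted above, it would miss the CloudFlare pages that
have a different form.

```python
def label(status_code, body):
    """Return a label such as '302', '403-CLOUDFLARE' or '503-OTHER'.

    Hypothetical helper: the substring check is an assumed heuristic,
    not the detection used by the original script.
    """
    if not (400 <= status_code <= 599):
        return str(status_code)  # unblocked: print the bare status code
    text = body or ""
    if status_code == 403 and "Attention Required!" in text:
        return "403-CLOUDFLARE"
    return "%d-OTHER" % status_code
```
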
> 
> I'm not sure what's up with the non-Tor 503s from Amazon. They just look
> like localized internal service error pages ("ist ein technischer Fehler
> aufgetreten", "une erreur de système interne a été décelée"). The one
> for blog.com is a generic Nginx "Bad Gateway" page.
> 
> non-Tor		Tor		domain
> 302		403-OTHER	yandex.ru
> 302		403-OTHER	craigslist.org
> 301		403-CLOUDFLARE	thepiratebay.se
> 503-OTHER	301		amazon.de
> 200		403-CLOUDFLARE	adf.ly
> 301		403-OTHER	squidoo.com
> 301		410-OTHER	myspace.com
> 303		503-OTHER	yelp.com
> 302		403-CLOUDFLARE	typepad.com
> 503-OTHER	301		amazon.fr
> 301		403-CLOUDFLARE	digitalpoint.com
> 301		403-CLOUDFLARE	extratorrent.com
> 200		403-OTHER	ezinearticles.com
> 200		403-OTHER	hubpages.com
> 200		403-OTHER	2ch.net
> 200		403-OTHER	hdfcbank.com
> 302		403-CLOUDFLARE	meetup.com
> 302		403-CLOUDFLARE	1channel.ch
> 200		403-CLOUDFLARE	multiply.com
> 301		403-CLOUDFLARE	clixsense.com
> 301		403-OTHER	zillow.com
> 301		403-CLOUDFLARE	odesk.com
> 301		403-CLOUDFLARE	elance.com
> 301		403-CLOUDFLARE	youm7.com
> 200		403-CLOUDFLARE	jquery.com
> 200		403-CLOUDFLARE	sergey-mavrodi.com
> 301		403-CLOUDFLARE	templatemonster.com
> 302		403-CLOUDFLARE	4tube.com
> 301		403-CLOUDFLARE	mp3skull.com
> 301		403-CLOUDFLARE	porntube.com
> 200		403-OTHER	tutsplus.com
> 200		403-CLOUDFLARE	bitshare.com
> 301		403-OTHER	sears.com
> 200		403-CLOUDFLARE	zwaar.net
> 502-OTHER	200		blog.com
> 302		403-CLOUDFLARE	myegy.com
> 301		400-OTHER	mercadolibre.com.ve
> 302		403-OTHER	jabong.com
> 301		403-CLOUDFLARE	free-tv-video-online.me
> 302		403-CLOUDFLARE	traidnt.net

