[ooni-dev] Mining OONI reports to find server-side Tor blocking (e.g. CloudFlare captchas)
daniel at dretzq.org.uk
Sun Jun 21 16:04:43 UTC 2015
That's very interesting. I'm working on a project that aims to provide
real-time blocking detection using ooni-probe's http_requests test, and
false positives caused by Tor blocking are a big problem there.
Being able to identify Tor blocking would definitely help us eliminate
those false positives. I was also meaning to ask the list: are there any
other tests you'd recommend using in combination with http_requests to
improve the accuracy of the results?
On 19/06/15 18:03, David Fifield wrote:
> I want to search OONI reports for cases of Tor exits being blocked by
> the server (things like the CloudFlare 403 captcha). The http_requests
> test is great for that because it fetches a bunch of web pages with Tor
> and without.
> The attached script is my first-draft attempt at finding block pages.
> Its output is at the end of this message. You can see it finds a lot of
> CloudFlare captchas and other blocks.
> First, I ran ooniprobe to get a report-http_requests.yamloo file. Then I
> ran the script, which does this:
>  * Skip the first YAML document, because it's a header.
>  * For all other documents:
>     * Skip it if it has non-None control_failure or
>       experiment_failure--there are a few of these.
>     * Look for exactly two non-failed requests, one with is_tor:false
>       and one with is_tor:true. Skip it if it lacks these.
>     * Classify the blocked status of the is_tor:false and is_tor:true
>       responses. 400-series and 500-series status codes are classified
>       as blocked and all others are unblocked.
>     * Print an output line if the blocked status of is_tor:false does
>       not match the blocked status of is_tor:true.
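
For anyone following along without the attachment, here is a minimal
sketch of the steps quoted above. It is not David's script. It assumes the
report has already been parsed into a sequence of dicts (for example with
PyYAML's yaml.safe_load_all), and it uses a simplified, hypothetical
layout in which each request dict carries "is_tor", "failure" and "code"
keys and each document carries the URL under "input". Real reports nest
these fields differently, so the key lookups would need adjusting.

```python
from itertools import islice


def is_blocked(status_code):
    """Treat 400- and 500-series status codes as blocked, all others as unblocked."""
    return 400 <= status_code <= 599


def find_mismatches(documents):
    """Yield (non_tor_code, tor_code, url) where blocked status differs.

    `documents` is an iterable of already-parsed report documents (dicts).
    The keys "requests", "is_tor", "failure", "code" and "input" are
    assumptions made for this sketch, not the actual report schema.
    """
    # Skip the first document: it is the report header.
    for doc in islice(documents, 1, None):
        # Skip entries where either side of the measurement failed outright.
        if doc.get("control_failure") is not None:
            continue
        if doc.get("experiment_failure") is not None:
            continue
        # Keep only the requests that did not fail.
        ok = [r for r in doc.get("requests", []) if r.get("failure") is None]
        direct = [r for r in ok if r.get("is_tor") is False]
        via_tor = [r for r in ok if r.get("is_tor") is True]
        # Require exactly two usable requests: one direct, one over Tor.
        if len(ok) != 2 or len(direct) != 1 or len(via_tor) != 1:
            continue
        non_tor_code = direct[0]["code"]
        tor_code = via_tor[0]["code"]
        # Report only the entries where the two sides disagree.
        if is_blocked(non_tor_code) != is_blocked(tor_code):
            yield non_tor_code, tor_code, doc.get("input")
```

Working on parsed dicts rather than on the file keeps the classification
logic testable without a real report on disk.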
> I have a few questions.
> Is this a reasonable way to process reports? Is there a more standard
> way to do e.g. YAMLOO processing?
> I know there are many reports at https://ooni.torproject.org/reports/.
> Is that all of them? I think I heard from Arturo that some reports are
> not online because of storage issues.
> What's the best way for me to get the reports for processing? Just
> download all *http_requests* files from the web server?
> Here is the output of the script. 403-CLOUDFLARE is the famous
> "Attention Required!" captcha page. I investigated some of the others
> manually and they are mostly custom block pages or generic web server
> 403s. (There are also a couple of CloudFlare pages that have a different
> form.) Overall, almost 4% of the 1000 URLs scanned by ooniprobe served a
> block page over Tor.
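
The labels in that output could be produced along these lines. This is
only a guess at the labelling rule: the substring test on "Attention
Required!" is an assumption based on the description above, and the
function name is hypothetical. The CloudFlare pages with a different form
would end up as 403-OTHER under this rule.

```python
def label(status_code, body):
    """Return an output label such as "302", "403-CLOUDFLARE" or "503-OTHER".

    Unblocked responses are labelled with the bare status code. Blocked
    responses (400- and 500-series) get a suffix that says whether the body
    looks like the CloudFlare captcha page.
    """
    if not (400 <= status_code <= 599):
        return str(status_code)
    # Assumed heuristic: the captcha page is a 403 titled "Attention Required!".
    if status_code == 403 and "Attention Required!" in body:
        return "403-CLOUDFLARE"
    return f"{status_code}-OTHER"
```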
> I'm not sure what's up with the non-Tor 503s from Amazon. They just look
> like localized internal service error pages ("ist ein technischer Fehler
> aufgetreten", "une erreur de système interne a été décelée"). The one
> for blog.com is a generic Nginx "Bad Gateway" page.
> non-Tor     Tor              domain
> 302         403-OTHER        yandex.ru
> 302         403-OTHER        craigslist.org
> 301         403-CLOUDFLARE   thepiratebay.se
> 503-OTHER   301              amazon.de
> 200         403-CLOUDFLARE   adf.ly
> 301         403-OTHER        squidoo.com
> 301         410-OTHER        myspace.com
> 303         503-OTHER        yelp.com
> 302         403-CLOUDFLARE   typepad.com
> 503-OTHER   301              amazon.fr
> 301         403-CLOUDFLARE   digitalpoint.com
> 301         403-CLOUDFLARE   extratorrent.com
> 200         403-OTHER        ezinearticles.com
> 200         403-OTHER        hubpages.com
> 200         403-OTHER        2ch.net
> 200         403-OTHER        hdfcbank.com
> 302         403-CLOUDFLARE   meetup.com
> 302         403-CLOUDFLARE   1channel.ch
> 200         403-CLOUDFLARE   multiply.com
> 301         403-CLOUDFLARE   clixsense.com
> 301         403-OTHER        zillow.com
> 301         403-CLOUDFLARE   odesk.com
> 301         403-CLOUDFLARE   elance.com
> 301         403-CLOUDFLARE   youm7.com
> 200         403-CLOUDFLARE   jquery.com
> 200         403-CLOUDFLARE   sergey-mavrodi.com
> 301         403-CLOUDFLARE   templatemonster.com
> 302         403-CLOUDFLARE   4tube.com
> 301         403-CLOUDFLARE   mp3skull.com
> 301         403-CLOUDFLARE   porntube.com
> 200         403-OTHER        tutsplus.com
> 200         403-CLOUDFLARE   bitshare.com
> 301         403-OTHER        sears.com
> 200         403-CLOUDFLARE   zwaar.net
> 502-OTHER   200              blog.com
> 302         403-CLOUDFLARE   myegy.com
> 301         400-OTHER        mercadolibre.com.ve
> 302         403-OTHER        jabong.com
> 301         403-CLOUDFLARE   free-tv-video-online.me
> 302         403-CLOUDFLARE   traidnt.net
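
To sanity-check the "almost 4%" figure, the output rows can be tallied
with something like the sketch below. The function name and the row
format (three whitespace-separated columns, optionally quoted with ">")
are my assumptions. By my count the listing has 40 rows, of which 37 are
blocked on the Tor side and 3 (amazon.de, amazon.fr, blog.com) only on the
non-Tor side, so 37 of the 1000 URLs gives 3.7%.

```python
def tor_block_rate(rows, urls_scanned):
    """Count rows whose Tor column is a 400- or 500-series status.

    Returns (blocked_over_tor, blocked_over_tor / urls_scanned).
    Each row is expected to look like "302 403-OTHER yandex.ru",
    optionally prefixed by an email quote marker.
    """
    blocked_over_tor = 0
    for row in rows:
        # Drop any leading quote marker and split into the three columns.
        fields = row.lstrip("> ").split()
        # Ignore the header row and anything that is not a data row.
        if len(fields) != 3 or fields[0] == "non-Tor":
            continue
        # The Tor column is "403-CLOUDFLARE", "503-OTHER" or a bare code.
        code = int(fields[1].split("-")[0])
        if 400 <= code <= 599:
            blocked_over_tor += 1
    return blocked_over_tor, blocked_over_tor / urls_scanned


sample = [
    "non-Tor Tor domain",
    "302 403-OTHER yandex.ru",
    "503-OTHER 301 amazon.de",
    "> 200 403-CLOUDFLARE adf.ly",
]
count, rate = tor_block_rate(sample, 1000)
```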