During the OONI survey to find instances of server-side Tor blocking, we found a few variations on CloudFlare captcha pages. They don't all say "Attention Required!". Apparently there is an option to customize the page, but few sites make use of it. Here are the regexes we used (excerpted from https://www.bamsoftware.com/git/ooni-tor-blocks.git):

    if status == 403:
        if server == "cloudflare-nginx" and re.search(r'<title>Attention Required! \| CloudFlare</title>|One more step to access', body):
            return True, "403-CLOUDFLARE"
        if server == "cloudflare-nginx" and re.search('<noscript id="cf-captcha-bookmark" class="cf-captcha-info">|<button type="submit" class="cf-captcha-submit">', body):
            # A customized captcha page.
            return True, "403-CLOUDFLARE"
        if server == "cloudflare-nginx" and re.search(r'<title>Access denied \| [^ ]* used CloudFlare to restrict access</title>', body):
            # With this one you don't get a captcha. May be controlled by the
            # site operator.
            return True, "403-CLOUDFLARE"
    if status == 503:
        if re.search('<div class="cf-browser-verification cf-im-under-attack">', body):
            return True, "503-CLOUDFLARE"

I now think the 'server == "cloudflare-nginx"' tests are unnecessary. The last two patterns above don't even give you a captcha to solve; they just deny access. You might want to limit your detection to 403 and 503 responses (or maybe exempt 200-series and 300-series responses).
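For what it's worth, here is a rough sketch of what the check might look like with the server header test dropped and detection limited to 403 and 503 responses. The patterns are the ones quoted above; the table layout and the names CLOUDFLARE_PATTERNS and classify_cloudflare_block are made up for illustration and are not taken from ooni-tor-blocks.

```python
import re

# Patterns copied from the excerpt above, grouped by HTTP status code.
# The grouping and all names here are illustrative assumptions, not the
# structure used in ooni-tor-blocks.
CLOUDFLARE_PATTERNS = {
    403: [
        # Default captcha page.
        r'<title>Attention Required! \| CloudFlare</title>|One more step to access',
        # A customized captcha page.
        r'<noscript id="cf-captcha-bookmark" class="cf-captcha-info">'
        r'|<button type="submit" class="cf-captcha-submit">',
        # Plain denial, no captcha offered.
        r'<title>Access denied \| [^ ]* used CloudFlare to restrict access</title>',
    ],
    503: [
        r'<div class="cf-browser-verification cf-im-under-attack">',
    ],
}


def classify_cloudflare_block(status, body):
    """Return (True, label) if body looks like a CloudFlare block page.

    Only 403 and 503 responses are examined; every other status code is
    reported as not blocked. The Server header is deliberately ignored.
    """
    for pattern in CLOUDFLARE_PATTERNS.get(status, []):
        if re.search(pattern, body):
            return True, "%d-CLOUDFLARE" % status
    return False, None
```

Because the lookup is keyed on the status code, a 200 or 3xx response that happens to contain one of these strings (say, a page that quotes the block page HTML) is never flagged.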
These are a couple of sites that used customized CloudFlare pages:

    https://4chan.org/ ("Verification Required")
    https://yelp.com/ ("You're not barking up the wrong tree...")

yelp.com only started using CloudFlare a little while ago. It's a funny case, because they *also* implement a hard Tor blacklist. Once you get through the CloudFlare captcha 403, you get a 503 from a different system.