Comment (by ioerror):

 Replying to [comment:23 jgrahamc]:
 > Earlier @ioerror asked if there was open data on abuse from TOR exit
 nodes. In 2014 I wrote a small program called "torhoney" that pulls the
 list of exit nodes and matches it against data from Project Honeypot about
 abuse. That code is here: https://github.com/jgrahamc/torhoney. You can
 run it and see the mapping between an exit node and its Project Honeypot
 score to get a sense for abuse from the exit nodes.
 > I ran the program today and have data on 1,057 exit nodes showing that
 Project Honeypot marks 710 of them as a source of comment spam (67%) with
 567 having a score of greater than 25 (in the Project Honeypot terminology
 meaning it delivered at least 100 spam messages) (54%). Over time these
 values have been trending upwards. I've been recording the Project
 Honeypot data for about 13 months, and the percentage of exit nodes that
 were listed as a source of comment spam was about 45% a year ago and is
 now around 65%.

 This is useful, though one thing is unclear: is this what CF uses on the
 backend? Is this data the reason that Google's captchas are so hard to
 solve?

 Furthermore - what is the expected value for a network with millions of
 users per day?

 > So, I'm interested in hearing about technical ways to resolve these
 problems. Are there ways to reduce the amount of abuse through TOR? Could
 TorBrowser implement a blinded token scheme that would preserve anonymity
 and allow a Turing Test?

 Offering a read-only version of these websites that prompts for a captcha
 on POST would be a very basic and simple way to reduce the flood of upset
 users. Ensuring that a solved captcha is accepted, rather than leaving the
 user stuck in a loop of 14 or 15 solutions, is another issue; that may be
 a bug that CF cannot fix and that Google needs to address. Another option,
 as I mentioned above, might be to stop a user before they ever reach a
 website that is going to ask them to run JavaScript and connect them to
 two very large end points (CF and Google).
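
 As a rough sketch of the read-only idea, and not a description of how CF
 actually works, the policy could look something like the Python below. The
 names Request, decide, from_flagged_ip and captcha_solved are invented for
 illustration; the only assumption is that the edge can tell whether a
 request comes from an address it would otherwise challenge.

```python
from dataclasses import dataclass

# HTTP methods that only read content (the "safe" methods in HTTP terms).
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


@dataclass
class Request:
    # Hypothetical request model, for illustration only.
    method: str                   # e.g. "GET" or "POST"
    from_flagged_ip: bool         # assumption: edge already flags this address
    captcha_solved: bool = False  # assumption: a prior solve is remembered


def decide(req: Request) -> str:
    """Return "serve" or "challenge" for a single request."""
    if not req.from_flagged_ip:
        return "serve"            # unflagged clients are never challenged
    if req.method.upper() in SAFE_METHODS:
        return "serve"            # read-only access needs no captcha
    if req.captcha_solved:
        return "serve"            # do not re-prompt after a successful solve
    return "challenge"            # writes from flagged addresses get a captcha
```

 With this policy a flagged user can read pages without any prompt, and a
 single solved captcha is enough to POST, which also avoids the repeated
 solution loop.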

 Does Google see any end user connections for those captcha requests? If so -
 it seems like the total set of users for CF would be seen by both Google
 and CF, meaning that data on all Cloudflare users prompted for the captcha
 would be available to Google. Is that incorrect?

