Google and Tor.

Mike Perry mikeperry at fscked.org
Thu Aug 26 06:11:46 UTC 2010


Thus spake Robert Ransom (rransom.8774 at gmail.com):

> On Wed, 25 Aug 2010 20:04:01 -0700
> Mike Perry <mikeperry at fscked.org> wrote:
> 
> > I also question Google's threat model on this feature. Sure, they want
> > to stop people from programmatically re-selling Google results without
> > an API key in general, but there is A) no way people will be reselling
> > Tor-level latency results, B) no way they can really expect determined
> > competitors not to do competitive analysis of results using private IP
> > ranges large enough to avoid DoS detection, C) no way that the total
> > computational cost of the queries coming from Tor can justify denying
> > so many users easy access to their site.
> 
> If Tor exit nodes were allowed to bypass Google's CAPTCHA, someone
> could put up a low-bandwidth Tor exit node and then send their own
> automated queries directly to Google from their Tor exit's IP.

Good point. However, I wasn't advocating whitelisting Tor exits. I was
advocating more intelligent treatment of all high user-count IP
addresses, and better rate-limiting mechanisms in general. It's my
understanding that a lot of NATed users also run into these captchas
during search.

To reduce scraping by suspect IPs, their servers could perform all
sorts of browser tests to ensure that the client has a full working
DOM backed by JavaScript, which is computationally costly for
scrapers to emulate.  They could also serve JavaScript code that
factors semi-large integers in the background and posts the factors
back with queries, to rate limit scrapers computationally, or at
least tip the cost ratios more in favor of just paying for an API
key.
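
As a rough sketch of the factorization idea (in Python rather than
browser JavaScript, purely for illustration): the server hands out a
product of two random primes, the client has to find the factors, and
the server checks them with a single multiplication. The function
names, the 20-bit prime size, and the use of trial division are all
assumptions made for this example, not anything Google is known to
deploy. A real deployment would also need the server to remember or
sign the challenges it issues so that old answers cannot be replayed.

```python
import random
from typing import Optional, Tuple


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for the small sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def random_prime(bits: int, rng: random.Random) -> int:
    """Pick a random odd prime with exactly `bits` bits."""
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate


def make_challenge(bits: int = 20, rng: Optional[random.Random] = None) -> int:
    """Server side: cheap to generate, since it is just two primes multiplied."""
    rng = rng or random.Random()
    return random_prime(bits, rng) * random_prime(bits, rng)


def solve_challenge(n: int) -> Tuple[int, int]:
    """Client side: the expensive part, roughly sqrt(n) trial divisions."""
    if n % 2 == 0:
        return 2, n // 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("n has no non-trivial factors")


def verify(n: int, p: int, q: int) -> bool:
    """Server side: one multiplication, rejecting the trivial 1 * n answer."""
    return 1 < p < n and 1 < q < n and p * q == n
```

The point of the asymmetry is that make_challenge() and verify() cost
the server almost nothing, while solve_challenge() costs the client
work that grows with the size of the primes, so the server could raise
the prime size for IPs it considers suspect.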

Perhaps more effectively, they could use various metrics to indirectly
estimate the number of humans behind an IP. Google provides plenty of
services and applications that aren't really usable by bots. The rate
of use of these non-search services per IP should provide a strong
indicator of human activity behind that IP.
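
A minimal sketch of how such an estimate could feed into rate
limiting, assuming the only inputs are two per-IP counters over some
time window. The IpStats fields and both constants are made up for
illustration; real values would have to come from measurement, and
none of this reflects how Google actually does it.

```python
from dataclasses import dataclass

# Assumed figures, invented for this example.
EVENTS_PER_HUMAN = 20.0   # non-search events a typical human generates per window
QUERIES_PER_HUMAN = 50.0  # search queries allowed per estimated human per window


@dataclass
class IpStats:
    search_queries: int     # search queries seen from this IP in the window
    non_search_events: int  # uses of services assumed to be impractical for bots


def estimated_humans(stats: IpStats) -> float:
    """Estimate the human population behind the IP from non-search activity."""
    return stats.non_search_events / EVENTS_PER_HUMAN


def query_budget(stats: IpStats) -> float:
    """Scale the search allowance by the estimate, with a floor of one human."""
    return max(1.0, estimated_humans(stats)) * QUERIES_PER_HUMAN


def needs_captcha(stats: IpStats) -> bool:
    """Challenge only when search volume exceeds what the estimate explains."""
    return stats.search_queries > query_budget(stats)
```

Under these assumed numbers, a busy exit or NAT gateway with 4000
queries and 2000 non-search events would stay under its budget, while
an IP sending the same 4000 queries with no other activity would be
challenged.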

Again, the impression I got was that even if they had analyzed the
captcha solve rate vs the query rate per IP, the cost/benefit of the
DoS mechanisms they apply, or the cost vs effectiveness vs user
impact of alternatives, they certainly weren't willing to discuss any
of it with us. They also seemed disinclined to meet to explore any
realistic alternatives we could jointly develop on both the Torbutton
side and the DoS side to help reduce the captchas and 403s
experienced by our users.


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs