On Sun, Feb 23, 2014 at 05:38:23PM +0530, Devang Thakkar wrote:
It's Devang here, a coding enthusiast studying at IIT Bombay. I am
looking forward to contributing to Tor for the upcoming Google Summer of Code 2014 as a prospective student. So I wanted to know if there was a provision for Web Scraping using Tor. If there is, I would like to know more about it; if there isn't, is it a feasible Summer of Code project?
Hi Devang,
Web scraping using Tor is usually regarded as a bad thing -- first because it loads down the Tor network much more than normal browsing, and second because it makes destination websites more likely to get angry with Tor. For example, when Bing starts scraping Google over Tor in order to improve their search results, Google responds by making it harder to crawl Google over Tor, which impacts normal Tor users reaching Google too.
So I think we'd be happy to have a project on how to make website scraping through Tor less damaging to destinations and thus to users, but I think we're unlikely to find a "make it easier to scrape websites through Tor" project exciting.
--Roger