[tor-dev] GSOC14 Idea

Zack Weinberg zackw at panix.com
Thu Feb 27 17:15:47 UTC 2014


On 02/27/2014 03:14 AM, Roger Dingledine wrote:
> On Sun, Feb 23, 2014 at 05:38:23PM +0530, Devang Thakkar wrote:
>>       It's Devang here, a coding enthusiast studying at IIT Bombay. I am
>> looking forward to contributing to Tor for the upcoming Google Summer of Code
>> 2014 as a prospective student. So I wanted to know if there was a provision
>> for Web Scraping using Tor. If there is, I would like to know more about it or
>> if there isn't, is it a feasible Summer of Code project?
> 
> Web scraping using Tor is usually regarded as a bad thing -- first
> because it loads down the Tor network much more than normal browsing,
> and second because it makes destination websites more likely to get angry
> with Tor. For example, when Bing starts scraping Google over Tor in order
> to improve their search results, Google responds by making it harder to
> crawl Google over Tor, which impacts normal Tor users reaching Google too.
> 
> So I think we'd be happy to have a project on how to make website scraping
> through Tor less damaging to destinations and thus to users, but I think
> we're unlikely to find a "make it easier to scrape websites through Tor"
> project exciting.

Inconveniently enough, scraping websites (and hidden services) over Tor
is exactly what a lot of the CMU Tor-related research involves.  We have
developed a few in-house tools for it (none of which are anywhere close
to turnkey).  We haven't put any serious thought into making it "less
damaging to destinations," but I think we would be interested in helping
with a project along those lines.  Offhand I dunno if what's needed is
code so much as best-practices documentation, though (what's an
appropriate level of rate limiting, that you really ought to run a
private entry node, that sort of thing...)
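
As an illustration of the rate-limiting item only, here is a minimal
Python sketch of a per-host delay gate.  The class name, the 5-second
default, and the injectable clock/sleep hooks are all assumptions made
up for this example; none of it is an existing tool or a recommended
value.

```python
import time
from urllib.parse import urlsplit


class PerHostRateLimiter:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; min_interval is a placeholder,
    not a recommended setting.
    """

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on one host
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed = {}           # host -> earliest time of next request

    def wait(self, url):
        """Block until a request to url's host is allowed; return seconds waited."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        # A host that has never been seen is allowed immediately.
        delay = max(0.0, self._next_allowed.get(host, now) - now)
        if delay > 0:
            self._sleep(delay)
        # The next request to this host must wait a full interval from now on.
        self._next_allowed[host] = now + delay + self.min_interval
        return delay
```

A scraper would call limiter.wait(url) immediately before each fetch;
the fetch itself (and how it is routed through Tor) is deliberately
left out of the sketch.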

zw
