[tor-talk] Hidden Services (about tor2web)

Fabio Pietrosanti (naif) lists at infosecurity.ch
Wed Sep 19 21:41:46 UTC 2012


Apologies for hijacking the subject/thread.

On 9/19/12 10:13 AM, tor at lists.grepular.com wrote:
> On 19/09/12 06:36, grarpamp wrote:
>
> >> People use robots.txt to indicate that they don't want their site
> >> to be added to indexes.
>
> > They use it to indicate that they don't want their site to be
> > crawled.
>
> In almost all cases (99% or higher), robots.txt is used to indicate
> that a site shouldn't be crawled, *because* they don't want it to be
> indexed. The intention is painfully clear...

This point has been added to the relevant ticket:
https://github.com/globaleaks/Tor2web-3.0/issues/19

Please add any ideas or suggestions on the topic to that ticket.

However, you should also know that it is already possible today for a
Tor Hidden Service (TorHS) to block access from Tor2web.

Tor2web sends an X-Tor2web header to tell the TorHS that the
connection comes from Tor2web.

We have added a wiki documentation page explaining how to do it:
https://github.com/globaleaks/Tor2web-3.0/wiki/Blocking-access-from-tor2web
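As an illustration only (not necessarily the method described on that
wiki page), here is a minimal Python sketch of a WSGI middleware that a
hidden service's web application could use to refuse requests carrying
the X-Tor2web header. It assumes the site runs as a WSGI app and only
checks whether the header is present, not its value; the names
block_tor2web and hello_app are hypothetical. The header is a courtesy
signal, so a proxy that does not send it will not be caught by this check.

```python
# WSGI exposes an "X-Tor2web" request header as environ["HTTP_X_TOR2WEB"].
TOR2WEB_ENVIRON_KEY = "HTTP_X_TOR2WEB"


def block_tor2web(app):
    """Wrap a WSGI app (hypothetical helper) so Tor2web-relayed requests get 403."""
    def wrapper(environ, start_response):
        if TOR2WEB_ENVIRON_KEY in environ:
            body = b"Access via Tor2web is not allowed; please use Tor directly.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # No X-Tor2web header: pass the request through unchanged.
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Stand-in for the hidden service's real application."""
    body = b"Hello from the hidden service\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_tor2web(hello_app)
```

To try it locally, the standard library's
wsgiref.simple_server.make_server("127.0.0.1", 8080, application).serve_forever()
will serve the wrapped app; a request sent with an "X-Tor2web: 1" header
should then receive 403 Forbidden.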

Regarding robots.txt: in the new Tor2web 3.0, robots.txt is "hijacked"
to prevent public search engines from crawling hidden services through
Tor2web. In addition, a list of known web spider user agents is blocked
by default.
Both blocks can be disabled in the config file:
https://github.com/globaleaks/Tor2web-3.0/wiki/Configuring-tor2web
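To make the two default blocks concrete, here is a minimal Python sketch
of the decision a Tor2web node might take before forwarding a request to
a hidden service. It is not Tor2web 3.0's code: the user-agent tokens,
the function name intercept and the flags hijack_robots_txt and
block_crawlers are hypothetical stand-ins for the real list and config
options.

```python
from typing import Optional, Tuple

# Illustrative crawler tokens only; the real Tor2web 3.0 list may differ.
BLOCKED_UA_TOKENS = ("googlebot", "bingbot", "baiduspider", "yandex")

# robots.txt served in place of the hidden service's own file:
# asks every well-behaved crawler not to fetch anything via this node.
DENY_ALL_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def intercept(
    path: str,
    user_agent: Optional[str],
    hijack_robots_txt: bool = True,  # hypothetical config flag
    block_crawlers: bool = True,     # hypothetical config flag
) -> Optional[Tuple[int, str]]:
    """Return (status, body) if the node answers itself, or None to forward."""
    if hijack_robots_txt and path == "/robots.txt":
        return 200, DENY_ALL_ROBOTS_TXT
    ua = (user_agent or "").lower()
    if block_crawlers and any(token in ua for token in BLOCKED_UA_TOKENS):
        return 403, "Crawling through this Tor2web node is not allowed.\n"
    return None
```

Serving a deny-all robots.txt only stops crawlers that honour it; the
user-agent check is what catches known spiders regardless.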

These blocks will probably be less annoying once spidering behaviour can
be configured directly by the TorHS sites themselves (for example, by
providing Tor2web-specific directives in robots.txt).
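If a hidden service could address Tor2web nodes by name in its
robots.txt, a node could evaluate those rules with Python's standard
urllib.robotparser, as in this minimal sketch. The "Tor2web" user-agent
token and the function name tor2web_may_expose are purely hypothetical;
no such convention is defined today. Standard robots.txt matching
applies: a site with no rule matching the token falls back to its
"User-agent: *" rules, and to allowing access if there are none.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical token a hidden service would use to address Tor2web nodes.
TOR2WEB_AGENT = "Tor2web"


def tor2web_may_expose(robots_txt: str, path: str) -> bool:
    """Return True if the hidden service's robots.txt lets Tor2web expose `path`."""
    parser = RobotFileParser()
    # parse() takes the file's lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(TOR2WEB_AGENT, path)


# Example: a site that lets Tor2web expose everything except /private/.
EXAMPLE_RULES = "User-agent: Tor2web\nDisallow: /private/\n"
```

A node would only consult these rules where its own config allows
hidden services to override the default blocks.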

Fabio

P.S. There's a new Tor2web domain running Tor2web 3.0:
http://eqt5g4fuenphqinx.tor2web.blutmagie.de :-)

