Hello all.  Back in June Griffin asked for this conversation to be temporarily tabled, and it's been a month!

Let us discuss robots.txt and the crawling of .onion sites.  Right now we have *three* camps among respected members of the Tor community.  They are:

(A) isis et al: robots.txt is insufficient
--- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."

(B) onionlink/ahmia/notevil/grams: we respect robots.txt
--- "Default is yes, but you can always opt-out."

(C) onionstats/memex: we ignore robots.txt
--- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)


Isis did a good job arguing for (A), claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html

This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html

I have no link arguing for (C).

I had tried to get this conversation moving before.  So, to prod the discussion forward this time, I have republished the onion2bitcoin and bitcoin2onion lists, anonymizing only the final 4 characters of each .onion address instead of the final 8.  Under (A), compiling this list is deeply heretical.  Under either (B) or (C), .onion content is public by default (so presumably running regexes over it is fine), and compiling such data is a perfectly fine thing to do.
-- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
-- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
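
To be clear about what "running regexes" means here, a sketch of the kind of extraction involved.  The page text and the address are illustrative placeholders; the pattern matches legacy base58 Bitcoin addresses only:

```python
import re

# Hypothetical snippet of scraped .onion page text (address is illustrative).
page = "Donate: 1BoatSLRHtKNngkdXEeobR76b53LETtpyT -- contact us on the forum"

# Legacy Bitcoin addresses start with 1 or 3 and use the base58 alphabet
# (no 0, O, I, or l), typically 26-35 characters total.
BTC_RE = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

print(BTC_RE.findall(page))  # ['1BoatSLRHtKNngkdXEeobR76b53LETtpyT']
```

Joining such matches with the .onion hostname they were scraped from is all it takes to build an onion-to-bitcoin mapping, which is why camps (B)/(C) see it as routine and camp (A) sees it as a consent violation.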


Let's discuss!

-V