[tor-project] Ethics Guidelines; crawling .onion
Tim Wilson-Brown - teor
teor2345 at gmail.com
Thu Jul 7 04:54:44 UTC 2016
> On 7 Jul 2016, at 14:40, Virgil Griffith <i at virgil.gr> wrote:
> Hello all. Back in June Griffin asked for this conversation to be temporarily tabled, and it's been a month!
> Let us discuss robots.txt and crawling of .onion. Right now we have *three* camps! They are:
Please define "crawling of .onion".
I don't know enough about the details of what you're doing to have a strong opinion.
How do you make your list of .onion addresses to crawl?
* by running an HSDir?
* using Tor2web request logs?
* using .onion addresses found via a search engine?
* using .onion addresses found on HTML pages on other .onion sites?
* through some other method?
How do you access and index the web content on those .onion sites?
How often do you access the site?
How many pages deep do you go on the site?
Do you follow links to other .onion sites?
How do you make sure that Tor2web users are anonymised (as far as possible) when accessing hidden services?
> So now we have *three* different positions among respected members of the Tor community.
> (A) isis et al: robots.txt is insufficient
> --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
> (B) onionlink/ahmia/notevil/grams: we respect robots.txt
> --- "Default is yes, but you can always opt-out."
> (C) onionstats/memex: we ignore robots.txt
> --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
> Isis did a good job arguing for (A) by claiming that (B) and (C) are "blatant and disgusting workaround[s] to the trust and expectations which onion service operators place in the network." https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
> This is me arguing for (B): https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
> I have no link arguing for (C).
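For readers who want the three positions stated concretely, here is a minimal Python sketch of how a crawler's fetch decision would differ under each one. It uses the standard library's urllib.robotparser. The function name may_crawl, the opted_in flag, the "example-crawler" user agent and the example.onion URL are all hypothetical names chosen for illustration. In particular, modelling (A) as a simple opt-in flag is an assumption, since the thread does not say how explicit consent would be recorded.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(policy, robots_lines, url, opted_in=False, agent="example-crawler"):
    """Return True if a crawler following `policy` would fetch `url`.

    policy "A": crawl only with explicit consent (hypothetical opt-in flag).
    policy "B": crawl by default, but honour robots.txt opt-outs.
    policy "C": crawl regardless of robots.txt.
    robots_lines is the site's robots.txt content as a list of lines.
    """
    if policy == "A":
        # Absence of a "no" is not a "yes": require an explicit opt-in.
        return opted_in
    if policy == "B":
        parser = RobotFileParser()
        parser.parse(robots_lines)  # parse already-fetched lines, no network access
        return parser.can_fetch(agent, url)
    if policy == "C":
        # robots.txt is never consulted.
        return True
    raise ValueError("unknown policy: %r" % (policy,))


if __name__ == "__main__":
    opt_out = ["User-agent: *", "Disallow: /"]
    url = "http://example.onion/index.html"
    for policy in "ABC":
        print(policy, may_crawl(policy, opt_out, url))
```

The only case where a site's robots.txt changes the outcome is (B). Under (A) an operator who publishes nothing is left alone, and under (C) an operator who publishes "Disallow: /" is crawled anyway.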
> I tried to get this conversation moving before. To push the discussion forward this time, I have republished the onion2bitcoin and bitcoin2onion lists, anonymizing only the final 4 characters of each .onion address instead of the final 8. Under (A), compiling this list is deeply heretical. Under either (B) or (C), .onion content is public by default (presumably running regexes is fine), so compiling such data is a perfectly fine thing to do.
> -- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
> -- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
Please stop releasing logs.
It could easily be seen as a provocative act.
And it's not a good way to encourage people to talk to you.
One possible consequence is that individuals or groups decide it's poor behaviour, and therefore refuse to deal with you.
Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B