[tor-project] Ethics Guidelines; crawling .onion

Virgil Griffith i at virgil.gr
Mon Jul 25 09:25:09 UTC 2016


> It feels like you're engaging in rules lawyering, trying to find a policy
> or statement that will let you do what you want to do.

Apologies---that's unintended.  I simply seek a policy or statement that
clarifies this issue one way or the other.  If the community wants to
explicitly ban .onion search engines, that is their right.  I personally
consider such a ban to be immensely unwise, but I would be satisfied with a
clarification either way.  Right now it's in a funny limbo that seemingly
no one is willing to resolve (aside from yourself; thanks, BTW).

====================================

> Please engage with people's concerns instead.

I'm happy to calmly discuss people's concerns about onion.link and Tor2web
privacy, but I insist on clarifying the relatively easy robots.txt issue
first.  Talking about Virgil-specifics, or whether Virgil is a tolerable
person, is currently a distraction, because if we conclude that robots.txt
is fully sufficient, and thus .onion content is by default "public data",
then the whether-Virgil-is-tolerable discussion changes drastically.  If
robots.txt is deemed a sufficient standard, then it's worth going forward
with a longer discussion in which I hope to clarify the judgement calls
I've made.

> When I've seen people talk about "crawling .onion sites", the issue that has
> received the most focus is the harvesting of .onion addresses by running a
> malicious HSDir. We do things to prevent this behaviour, including
> blacklisting HSDirs. This behaviour is clearly unethical, there is a
> community consensus about it, and we invest resources in preventing it.

Sure.  No complaints here.


> As for accessing .onion sites via an automated process or non-anonymous proxy
> (e.g. Tor2web), that's something we're still talking about. There are
> significant issues around client anonymity, server anonymity, and access to
> sensitive data. We might decide we want to actively prevent it. We might
> decide we don't want to put any effort into supporting it in future.



> There's also the issue of searching these sites. Perhaps some kinds of search
> are ok, but others are too powerful (like regular expressions, which many
> search sites avoid). Again, this is something we're discussing.

This is me imploring, begging, for us to have that discussion on search
engines, regexes, etc.  I've yet to find any argument for position (A),
which as far as I can tell is the position currently enshrined in the
ethics guidelines.  This is me asking for either an argument for position
(A), or a clarification that robots.txt is fine.

=====
Even though I said I didn't want to get into Tor2web until the robots.txt
issue is largely addressed, I'm going to discuss it briefly, just as an
olive branch.

> I don't know if I'd trust you to be in a position where you see client
> requests.  I'm not sure I'd even trust you to run a Guard node, and
> Tor2web admins see far more than a Guard node does.

This is interesting, because I actually consider a Guard node to have more
private information than a Tor2web node.  I claim two things:

(1) Whereas people use TBB for *things that matter* and have an expectation
of privacy, I claim that Tor2web users are interested in convenience and
have little expectation of privacy.  I see negligible difference between
what onion.link does and what Twitter does when it rewrites URLs to go
through t.co so it can record the clicks.

To put it another way, I do not consider Tor2web users to be "Tor users".

(2) Using the same logic as (1), I would argue that Tor2web sees *less*
private information than a Tor guard node.  A guard node is half of the map
to users who have explicitly said, "I wish my traffic to be unlinkable".
Violating this would obviously be an "attack on Tor users".  Offering logs
from a guard node would be, zomg, a violation of the expectation of privacy
and damaging to the network.  I am 110% on board here.  I wholly support
banning anyone from the community who sells logs from TBB users.

-----

As an aside:

> You might want to enable automatic redirects from http://onion.link to
> https://onion.link.

Already done.  I also recently enabled DNSSEC, because some European ISPs
were doing DNS poisoning and I wanted to stop that.
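
As a side note, an HTTP-to-HTTPS redirect of the kind suggested above can
be as small as the following sketch.  This is not onion.link's actual setup
(which presumably handles it at the web-server layer); it is just a
standard-library Python illustration, with the port number chosen
arbitrarily.

# Minimal sketch: answer every plain-HTTP request with a 301 to the HTTPS URL.
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectToHTTPS(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "onion.link").split(":")[0]
        self.send_response(301)
        self.send_header("Location", "https://" + host + self.path)
        self.end_headers()

if __name__ == "__main__":
    # Port 8080 so the sketch runs without root; production would listen on 80.
    HTTPServer(("", 8080), RedirectToHTTPS).serve_forever()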


> Normally I'd be concerned you use Google Analytics rather than a local
> analytics solution.

I've removed the Google Analytics.  It'll go out in the next weekly release.

===========

The other issues you cited are worth discussing, and I welcome having
those discussions.  But I want to resolve the comparatively easy robots.txt
discussion first.  I was asked to wait a month, and I did so.  Can we now
have that discussion?  Or does it have to be postponed another month?  To
kickstart the discussion, here are the three views I've heard:

> (A) isis et al: robots.txt is insufficient
> --- "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'."
>
> (B) onionlink/ahmia/notevil/grams: we respect robots.txt
> --- "Default is yes, but you can always opt-out."
>
> (C) onionstats/memex: we ignore robots.txt
> --- "Don't care even if you opt-out." (see https://onionscan.org/reports/may2016.html)
>
>
> Isis did a good job arguing for (A) by claiming that (B) and (C) are
> "blatant and disgusting workaround[s] to the trust and expectations which
> onion service operators place in the network."
> https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
>
> This is me arguing for (B):
> https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
>
> I have no link arguing for (C).

I am imploring for there to be discussion arguing for (A), (B), (C), or
some other position (D).  Thus far we've gotten an argument for (A) from
Isis and an argument for (B) from Juha.

-V

