[tor-project] Ethics Guidelines; crawling .onion

Virgil Griffith i at virgil.gr
Wed Jun 8 08:48:39 UTC 2016


Okay.  Can do.  Tabling this for a month.

-V



On Wed, Jun 8, 2016 at 3:55 PM, Griffin Boyce <griffin at cryptolab.net> wrote:

> Hey Virgil,
>
>   While I know you and I have talked about this in private recently, it
> seems like a good time to table this discussion for a couple of weeks.
> Considering everything else that's going on, this might not be the ideal
> time for everyone to contribute to the discussion.
>
> <3
> Griffin
>
>
>
>
> Virgil Griffith wrote:
>
>> Here's yet another data point indicating the policy on crawling .onion
>> needs to be clarified.  The new and popular OnionStats tool doesn't
>> even respect /robots.txt, see:
>> https://onionscan.org/reports/may2016.html
>>
>> So now we have *three* different positions among respected members of
>> the Tor community.
>>
>> (1) isis et al: robots.txt is insufficient
>> --- "Consent is not the absence of saying 'no' — it is explicitly
>> saying 'yes'."
>>
>> (2) onionlink/ahmia/notevil/grams: we respect robots.txt
>> --- "Default is yes, but you can always opt-out."
>>
>> (3) onionstats/memex: we ignore robots.txt
>> --- "Don't care even if you opt-out."
>>
>> -V
>>
>> On Wed, Jun 8, 2016 at 1:34 AM, Virgil Griffith <i at virgil.gr> wrote:
>>
>>> Hello all.
>>>
>>> I wrote on this topic earlier at:
>>>
>>> https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
>>>
>>> This is me again asking for clarification.  I chose this issue
>>> because it is the most self-contained of the various ones raised by
>>> isis et al, and it seemed wise to clarify this before opening up a
>>> new one.  If someone from Tor management writes me that social
>>> reasons prohibit search engines from being addressed at this time, I
>>> will drop it.
>>>
>>> Given the lack of prior reaction as well as ahmia.fi [1] getting
>>> funded for GSoC (ahmia has followed /robots.txt from day zero), I
>>> tentatively conclude that crawling .onion is non-controversial,
>>> i.e., "Per Tor community standards, search engines obeying
>>> robots.txt are a-okay.  Equivalently, indexing .onion content is
>>> treated the same as indexing any other part of the web."
>>>
>>> But, to motivate discussion as well as give any concerned parties an
>>> opportunity to be heard, I have republished the onion2bitcoin and
>>> bitcoin2onion lists, anonymizing only the final 4 characters of each
>>> .onion address instead of the final 8.
>>>
>>> -- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
>>> -- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
>>>
>>> -V
>>>
>>> On Tue, May 31, 2016 at 10:05 PM, Virgil Griffith <i at virgil.gr>
>>> wrote:
>>> This seems like something people would have opinions on.  Anyone?
>>>
>>> -V
>>>
>>> On Monday, 30 May 2016, Virgil Griffith <i at virgil.gr> wrote:
>>>
>>> Hello all.
>>>
>>> I am preparing a longer response to the issues Isis et al mentioned.
>>> Most are interrelated, but this one is not.  And I wanted to get
>>> clarification on it.
>>>
>>> Isis expressed a concern about making a list of bitcoin addresses
>>> from .onion, citing, "Consent is not the absence of saying 'no' —
>>> it is explicitly saying 'yes'."
>>>
>>> For what it's worth, ahmia.fi [1] actually supports regex searching
>>> right out of the box.  In fact, a single line of JSON returns every
>>> bitcoin address ahmia knows about.
>>>
>>> For example, here's an anonymized list going .onion -> BTC, which I
>>> mined from Ahmia:
>>> * http://virgil.gr/wp-content/uploads/2016/05/btc-on-dot-onion.html [6 MB]
>>>
>>> And here's the same information going BTC -> .onion:
>>> * http://virgil.gr/wp-content/uploads/2016/05/btc2domains.v2.txt [2 MB]
>>>
>>> If you want to check the results you can ask Juha for the JSON query
>>> to do this.
>>>
>>> Let's go out on a limb and assume that regexes are okay.  Is the issue
>>> then .onion search engines?  I understand Isis's preference for
>>> there to always be affirmative consent, but does that mean that until
>>> such a standard exists, all search engines, from onion.link and
>>> ahmia.fi [1] to MEMEX, NotEvil, and Grams, are violating official Tor
>>> community policy?
>>>
>>> ----
>>> Here's how I currently see this.  I put on my amateur legal hat and
>>> say, "Well, the Internet/world-wide-web is considered a public
>>> space.  Onion-sites are like the web, but with masked speakers."
>>>
>>> * https://www.hks.harvard.edu/m-rcbg/research/j.camp_acm.computer_internet.as.public.space.pdf
>>> * http://aims.muohio.edu/2011/02/01/is-the-internet-a-public-space/
>>>
>>> Ergo, I would argue that, by default, content on .onion is public
>>> the same way everything else on the web is.  If you don't want to be
>>> "indexed", for physical spaces you go indoors, or for the web you
>>> put up a login.  As an aside, the web standard is actually *kinder*
>>> than physical public spaces because on the web one can have an
>>> unobtrusive /robots.txt saying, "please don't index me", which is
>>> a great thing.
>>>
>>> Whereas some would say Tor users are "anonymous", others would
>>> instead say anything and everything on Tor is "private".  I believe
>>> this needs to be clarified.  I once proposed to Roger that he
>>> delineate the sub-types of privacy in the same way Stallman
>>> delineated his "Four Freedoms".  Roger replied that he preferred
>>> using the broad catch-all term "Privacy".  These confusions may be a
>>> cost of using a broad catch-all term.  Under a broad interpretation,
>>> Isis is correct.  However, this conclusion has a lot of unpleasant
>>> ramifications.
>>>
>>> Comments appreciated,
>>> -V
>>>
>>> P.S. Mildly related, I saw this today involving DARPA and Tor:
>>> http://thehackernews.com/2016/05/darpa-trace-hacker.html
>>>
>>> """
>>> The aim of Enhanced Attribution program is to track personas
>>> continuously and create “algorithms for developing predictive
>>> behavioral profiles.”
>>> """
>>>
>>> I hope you all are aware this flows directly from MEMEX.  Right?
>>> This, along with MEMEX, seems a much more appropriate target for
>>> outrage.  A lot of this work, which numerous community members have
>>> contributed to, gives even me pause.
>>>
>>
>>
>>
>> Links:
>> ------
>> [1] http://ahmia.fi
>>
>> _______________________________________________
>> tor-project mailing list
>> tor-project at lists.torproject.org
>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project
>>
>
> --
> There are 10 kinds of people in the world: those who understand binary,
> those who don't, and people who didn't expect a base 3 joke.
>