[tor-project] Ethics Guidelines; crawling .onion

Virgil Griffith i at virgil.gr
Tue Jun 7 17:34:59 UTC 2016


Hello all.

I wrote on this topic earlier at:

https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html

This is me again asking for clarification.  I chose this issue because it
is the most self-contained of the various ones raised by isis et al, and it
seemed wise to clarify it before opening up a new one.  If someone from
Tor management writes me that social reasons prohibit search engines from
being addressed at this time, I will drop it.

Given the lack of prior reaction, as well as ahmia.fi getting funded for
GSoC (ahmia has followed /robots.txt from day zero), I tentatively conclude
that crawling .onion is non-controversial, i.e., "Per Tor community
standards, search engines obeying robots.txt are a-okay.  Equivalently,
indexing .onion content is treated the same as indexing any other part of
the web."
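For readers less familiar with the mechanism: "obeying robots.txt" means that before fetching a page, the crawler checks the site's /robots.txt rules for its user agent. Below is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the crawler name ExampleOnionCrawler, and the .onion host are hypothetical, and this is not ahmia's actual crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by an onion service that allows
# crawling in general but asks crawlers to skip /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_index(robots_txt: str, url: str,
              user_agent: str = "ExampleOnionCrawler") -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; a real crawler would
    # first fetch http://<host>.onion/robots.txt over Tor.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    host = "http://exampleonionhost.onion"  # hypothetical address
    print(may_index(ROBOTS_TXT, host + "/index.html"))         # True
    print(may_index(ROBOTS_TXT, host + "/private/page.html"))  # False
```

An onion service that wants no indexing at all would serve `Disallow: /` instead, and a crawler following this sketch would then skip every page.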

But, to motivate discussion as well as give any concerned parties an
opportunity to be heard, I have republished the onion2bitcoin and
bitcoin2onion lists, now anonymizing only the final 4 characters of each
.onion address instead of the final 8.

-- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
-- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
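The lists don't say how the hidden characters were replaced; the Python sketch below assumes they are simply overwritten with '*'. The function names mask_onion and candidates_remaining and the sample address are hypothetical. The sketch also quantifies the change: 2016-era (v2) onion addresses are 16 base32 characters, each carrying 5 bits, so hiding 4 characters leaves 32^4 = 2^20 (about a million) candidate addresses, versus 32^8 = 2^40 (about 1.1 trillion) when 8 are hidden.

```python
ONION_SUFFIX = ".onion"
BASE32_ALPHABET_SIZE = 32  # onion labels use base32: a-z and 2-7


def mask_onion(address: str, n_hidden: int = 4, mask_char: str = "*") -> str:
    """Overwrite the final n_hidden characters of the label before '.onion'.

    Using '*' as the mask is an assumption, not the method from the posts.
    """
    if not address.endswith(ONION_SUFFIX):
        raise ValueError(f"not a .onion address: {address!r}")
    label = address[: -len(ONION_SUFFIX)]
    if not 0 <= n_hidden <= len(label):
        raise ValueError("n_hidden must be between 0 and the label length")
    kept = label[: len(label) - n_hidden]  # also correct for n_hidden == 0
    return kept + mask_char * n_hidden + ONION_SUFFIX


def candidates_remaining(n_hidden: int) -> int:
    """Addresses consistent with a masked one, ignoring any other clues."""
    return BASE32_ALPHABET_SIZE ** n_hidden


if __name__ == "__main__":
    addr = "abcdefghijklmnop.onion"  # hypothetical 16-character v2 address
    print(mask_onion(addr, 4))       # abcdefghijkl****.onion
    print(mask_onion(addr, 8))       # abcdefgh********.onion
    print(candidates_remaining(4))   # 1048576
    print(candidates_remaining(8))   # 1099511627776
```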

-V

On Tue, May 31, 2016 at 10:05 PM, Virgil Griffith <i at virgil.gr> wrote:

> This seems like something people would have opinions on.  Anyone?
>
> -V
>
>
> On Monday, 30 May 2016, Virgil Griffith <i at virgil.gr> wrote:
>
>> Hello all.
>>
>> I am preparing a longer response to the issues Isis et al mentioned.
>> Most are interrelated, but this one is not, and I wanted to get
>> clarification on it.
>>
>> Isis expressed a concern about making a list of bitcoin addresses from
>> .onion, citing, "Consent is not the absence of saying 'no' — it is
>> explicitly saying 'yes'."
>>
>> For what it's worth, ahmia.fi actually supports regex searching right
>> out of the box.  In fact, a single line of JSON spits out all the bitcoin
>> addresses ahmia knows about.
>>
>> For example, here's an anonymized list going .onion -> BTC that I mined
>> from Ahmia:
>> * http://virgil.gr/wp-content/uploads/2016/05/btc-on-dot-onion.html [6MB]
>>
>> And here's the same information going BTC -> .onion
>> * http://virgil.gr/wp-content/uploads/2016/05/btc2domains.v2.txt [2MB]
>>
>> If you want to check the results you can ask Juha for the JSON query to
>> do this.
>>
>> Let's go out on a limb and assume that regexes are okay.  Is the issue
>> then .onion search engines?  I understand Isis's preference for there to
>> always be affirmative consent, but does that mean that, until such a
>> standard exists, all search engines, from onion.link, ahmia.fi, MEMEX,
>> NotEvil, and Grams, are violating official Tor community policy?
>>
>> ----
>> Here's how I currently see this.  I put on my amateur legal hat and say,
>> "Well, the Internet/world-wide-web is considered a public space.
>> Onion-sites are like the web, but with masked speakers."
>>
>> * https://www.hks.harvard.edu/m-rcbg/research/j.camp_acm.computer_internet.as.public.space.pdf
>> * http://aims.muohio.edu/2011/02/01/is-the-internet-a-public-space/
>>
>> Ergo, I would argue that, by default, content on .onion is public in the
>> same way everything else on the web is.  If you don't want to be "indexed",
>> in physical spaces you go indoors, and on the web you put up a login.  As
>> an aside, the web standard is actually *kinder* than physical public spaces,
>> because on the web one can post an unobtrusive /robots.txt saying, "please
>> don't index me".  That is a great thing.
>>
>> Some would say Tor users are "anonymous"; others would instead say
>> anything and everything on Tor is "private".  I believe this needs to be
>> clarified.  I once proposed to Roger that he delineate the sub-types of
>> privacy in the same way Stallman delineated his "Four Freedoms".  Roger
>> replied that he preferred using the broad catch-all term "Privacy".  These
>> confusions may be a downside of using a broad catch-all term.  Interpreted
>> broadly, Isis is correct.  However, this conclusion has a lot of unpleasant
>> ramifications.
>>
>> Comments appreciated,
>> -V
>>
>>
>> P.S. Mildly related, I saw this today involving DARPA and Tor:
>> http://thehackernews.com/2016/05/darpa-trace-hacker.html
>>
>> """
>> The aim of Enhanced Attribution program is to track personas continuously
>> and create “algorithms for developing predictive behavioral profiles.”
>> """
>>
>> I hope you all are aware this flows directly from MEMEX.  Right?  This,
>> along with MEMEX, seems a much more appropriate target for outrage.  Much
>> of this work, which numerous community members have contributed to, gives
>> even me pause.
>>
>
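The quoted message above describes pulling Bitcoin addresses out of ahmia's index with a regex. Ahmia's actual JSON query isn't reproduced there (it says to ask Juha for it), so what follows is only a hypothetical local sketch in Python of the same idea: scan page text for strings shaped like legacy ('1...' or '3...') base58 Bitcoin addresses and build both the .onion -> BTC and BTC -> .onion mappings. The `pages` input, the function name `build_maps`, and the sample .onion addresses are assumptions. The pattern does not verify the base58check checksum, so its matches are candidates, not confirmed addresses.

```python
import re
from collections import defaultdict

# Candidate legacy Bitcoin addresses: start with '1' (P2PKH) or '3' (P2SH),
# 26-35 characters from the base58 alphabet (which omits 0, O, I and l).
# The base58check checksum is NOT verified, so false positives are possible.
BTC_CANDIDATE = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")


def build_maps(pages: dict[str, str]):
    """Given hypothetical {onion_address: page_text}, return (onion2btc, btc2onion)."""
    onion2btc = defaultdict(set)
    btc2onion = defaultdict(set)
    for onion, text in pages.items():
        for addr in BTC_CANDIDATE.findall(text):
            onion2btc[onion].add(addr)
            btc2onion[addr].add(onion)
    return dict(onion2btc), dict(btc2onion)


if __name__ == "__main__":
    # Hypothetical crawl results; the BTC address is the well-known
    # genesis-block address, used here only as sample data.
    pages = {
        "abcdefghijklmnop.onion": "Donate: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa thanks",
        "qrstuvwxyz234567.onion": "no coins here",
    }
    onion2btc, btc2onion = build_maps(pages)
    print(onion2btc)
    print(btc2onion)
```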