[tor-project] Ethics Guidelines; crawling .onion

Virgil Griffith i at virgil.gr
Mon May 30 08:38:29 UTC 2016

Hello all.

I am preparing a longer response to the issues Isis et al. mentioned.  Most
are interrelated, but this one is not.  And I wanted to get clarification
on it.

Isis expressed a concern about making a list of bitcoin addresses from
.onion, citing, "Consent is not the absence of saying 'no' — it is
explicitly saying 'yes'."

For what it's worth, ahmia.fi actually supports regex searching right out
of the box.  In fact, a single line of JSON spits out all known bitcoin
addresses ahmia knows about.
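
To make concrete how little machinery this takes, here is a minimal
sketch in Python.  It is not Ahmia's JSON query (ask Juha for that); it
only illustrates the general technique of regex-matching bitcoin
addresses in page text.  The pattern, the function name index_addresses,
and the sample .onion hostname are my own assumptions for illustration.
The pattern matches strings that merely look like Base58 addresses
starting with 1 or 3; it does not verify the checksum, so false
positives are possible.

```python
import re
from collections import defaultdict

# Assumed pattern: legacy Base58 addresses start with 1 or 3 and are
# 26-35 characters long. The Base58 alphabet omits 0, O, I and l.
# This matches look-alikes only; it does not verify the checksum.
BTC_ADDRESS = re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b")


def index_addresses(pages):
    """Build both mappings from a {onion_host: page_text} dict.

    Returns (onion_to_btc, btc_to_onion), each a dict of sets.
    """
    onion_to_btc = defaultdict(set)
    btc_to_onion = defaultdict(set)
    for onion, text in pages.items():
        for addr in BTC_ADDRESS.findall(text):
            onion_to_btc[onion].add(addr)
            btc_to_onion[addr].add(onion)
    return onion_to_btc, btc_to_onion


if __name__ == "__main__":
    # Hypothetical hostname; the address is the well-known genesis-block one.
    sample = {
        "exampleonionaaaa.onion":
            "Donations: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa thanks!",
    }
    forward, reverse = index_addresses(sample)
    print(dict(forward))
    print(dict(reverse))
```

The two dictionaries correspond to the two directions of the lists
below (.onion -> BTC and BTC -> .onion).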

For example, here's an anonymized list going .onion -> BTC which I mined
from Ahmia:
* http://virgil.gr/wp-content/uploads/2016/05/btc-on-dot-onion.html  [6MB]

And here's the same information going BTC -> .onion
* http://virgil.gr/wp-content/uploads/2016/05/btc2domains.v2.txt [2MB]

If you want to check the results you can ask Juha for the JSON query to do

Let's go out on a limb and assume that regexes are okay.  Is the issue then
.onion search-engines?  I understand Isis's preference for there to always
be affirmative consent but does that mean that until such a standard exists
all search engines from onion.link, ahmia.fi, MEMEX, NotEvil, and Grams are
violating official Tor community policy?

Here's how I currently see this.  I put on my amateur legal hat and say,
"Well, the Internet/world-wide-web is considered a public space.
Onion-sites are like the web, but with masked speakers."

* http://aims.muohio.edu/2011/02/01/is-the-internet-a-public-space/

Ergo, I would argue that, by default, content on .onion is public the same
way everything else on the web is.  If you don't want to be "indexed", for
physical spaces you go in-doors, or for the web you put up a login.  As an
aside, the web-standard is actually *kinder* than physical public spaces
because on the web one can have an unobtrusive /robots.txt saying, "please
don't index me".  Which is a great thing.
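
For anyone who hasn't looked at how a crawler honors that request, here
is a minimal sketch using Python's standard urllib.robotparser.  The
helper name may_index, the user-agent string "examplebot", and the
.onion hostname are hypothetical, and this is not a claim about how any
of the search engines mentioned above actually implement it.  The
robots.txt body is passed in as a string so the sketch runs without any
network access.

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site-wide opt-out: every crawler is asked to stay away from every path.
OPT_OUT = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    url = "http://exampleonionaaaa.onion/page"  # hypothetical site
    print(may_index(OPT_OUT, "examplebot", url))
    print(may_index("", "examplebot", url))
```

With the opt-out file the check refuses the fetch; with an empty
robots.txt it allows it, which matches the "public by default" reading
above.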

Whereas some would say Tor users are "anonymous", others would instead say
anything and everything on Tor is "private".  I believe this needs to be
clarified.  I once proposed to Roger that he delineate the sub-types of
privacy in the same way Stallman delineated his "Four Freedoms".  Roger
replied that he preferred using the broad catch-all term "Privacy".  These
confusions may be a consequence of using a broad catch-all term.  Interpreting
broadly, Isis is correct.  However, this conclusion has a lot of unpleasant

Comments appreciated,

P.S. Mildly related, I saw this today involving DARPA and Tor.

The aim of the Enhanced Attribution program is to track personas continuously
and create “algorithms for developing predictive behavioral profiles.”

I hope you all are aware this flows directly from MEMEX.  Right?  This, and
MEMEX, seem a much more appropriate target for outrage.  A lot of this
work that numerous community members have worked on gives even me pause.
