Wowsa. Hello everyone. Several issues brought up. Let us go one at a time. I apologize for the length. Forgive any typos.
Tor2web similarly should be killed with fire as being a blatant and disgusting workaround to the trust and expectations which onion service operators place in the network.
(1) When Aaron Swartz and I created Tor2web, we envisioned an anonymous publishing platform for the World Wide Web. The quip was, "What good is an anonymously printed book if only your most technical, paranoid, patient friends can read it?" Our goal was for whistleblowers/etc. to publish safely behind Tor, and for readers to link to and share the anonymously-published content over services as mundane as Facebook. Ergo Tor2web was born. Moving to the present, Shari explicitly prioritized making Tor usage more "mainstream". As Tor2web is many people's first exposure to onionsites and Tor, burning Tor2web would be counter-productive to Shari's stated goals as I currently understand them.
(2) If a user requests a page from a never-before-seen .onion domain, and the response is HTTP 200, and that .onion domain doesn't have a disallow in its /robots.txt, the root of that domain is appended to https://onion.link/sitemap.xml for public search engines to crawl. I think it's a bit silly to forbid this, but after seeing this reaction I removed the sitemap until there's a policy on this.
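For concreteness, the rule in (2) can be sketched roughly as below. This is an illustrative Python sketch, not OnionLink's actual code: the function name, the `seen_hosts` set, the in-memory `sitemap` list, and the `https://<name>.onion.link/` URL form are all assumptions made for the example. It also assumes "doesn't have a disallow" means "robots.txt does not forbid fetching the root path".

```python
from urllib.robotparser import RobotFileParser


def maybe_add_to_sitemap(onion_host, status_code, robots_txt, seen_hosts, sitemap):
    """Append the root of onion_host to sitemap if the rule in (2) allows it.

    onion_host  -- e.g. "example.onion" (hypothetical input format)
    status_code -- HTTP status of the user's request
    robots_txt  -- body of the site's /robots.txt ("" if it has none)
    seen_hosts  -- set of hosts already processed (assumed bookkeeping)
    sitemap     -- list standing in for the real sitemap.xml
    Returns True if an entry was added.
    """
    if onion_host in seen_hosts:
        return False  # only never-before-seen domains are considered
    seen_hosts.add(onion_host)

    if status_code != 200:
        return False  # only successful responses qualify

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("*", "/"):
        return False  # the site opted out via robots.txt

    # Assumed URL form for the gateway; only the root is listed.
    sitemap.append(f"https://{onion_host}.link/")
    return True
```

A site that serves `User-agent: *` followed by `Disallow: /` is never listed under this sketch; a site with no robots.txt at all is.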
Simply because a user, given an onion service address, naïvely decides to use one of your Tor2Web nodes, it is unacceptable that your Tor2Web node crawls said onion service simply because it didn't "opt out" of your historically malicious desires to harvest data on Tor users and operators. Consent is not the absence of saying "no" — it is explicitly saying "yes".
(3) I understand Isis's concern about search engines being opt-out, i.e., "Consent is not the absence of saying 'no' — it is explicitly saying 'yes'." When it comes to sexual consent I wholly support this standard, and this same point was raised during the 90s in the creation of robots.txt. Without taking any side on robots.txt, the winning argument back then was roughly, "Search engines are very useful. We understand people like privacy, so we want a way for them to exclude themselves, and to make it incredibly taboo to violate this exclusion. However, search engines are useful, so useful that it is worth making inclusion the default." Isis disagrees with this precedent, and there exist others who support it. I support the community coming to a consensus on this issue, and if it's widely agreed that the previous robots.txt precedent was a mistake, I am down for adjusting.
FWIW, Aaron Swartz was the one who chose the somewhat-odd subdomain structure of tor2web URLs. He chose this structure for the *explicit purpose* of making /robots.txt "just work". So we can put Aaron down in the column for "supports the robots.txt precedent". I find it peculiar that the position of the person to whom Tor 0.2.4.x was dedicated, on one of his signature projects, is considered so out-of-the-norm as to attract an analogy to rape.
Perhaps, more explicitly, what we'd like to eliminate is people like you, Virgil. You've admitted publicly, in person, to several of our developers that you harvested HSDir data and then further attempted (unsuccessfully) to sell said data on users to INTERPOL and the Singaporean government.
(4) There is substantial confusion on this. Let us clear the air.
(4.1) For me, Tor's speed and sustainable growth are front-and-center. For example, I wrote a Tor tech report on exactly this topic.
https://research.torproject.org/techreports/tor-growth-2014-10-04.pdf
We all know that .onion sites routinely disappear, and OnionLink has a lot of users who click repeatedly, attempting to access long-gone .onion domains. I wanted two things: (a) tell users when a .onion domain no longer exists (so they'll stop refreshing); (b) given the substantial traffic OnionLink generates, minimize the burden we place on HSDirs. To achieve this, whenever there was an error, we used Donnache's Python script to see whether the .onion domain existed in the DHT. If the domain didn't exist ("NXDOMAIN"), we cached that answer so we didn't burden the HSDirs with duplicate lookups for nonexistent domains. I felt, and feel, that doing this was being a courteous citizen and the right thing to do, but my attempt at courteous behavior generated so much vitriol that OnionLink no longer caches non-existent domains, and correspondingly now burdens HSDirs more. I hope one day it will be politically acceptable to cache NXDOMAIN responses so we have a faster, more scalable Tor network.
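To make the caching idea concrete, here is a minimal Python sketch of a negative ("NXDOMAIN") cache. It is not the code OnionLink ran: the class and function names are hypothetical, `descriptor_exists` is a stand-in for the DHT lookup script mentioned above, and the expiry time (TTL) is an assumption I added so that a domain which comes back online is eventually rechecked.

```python
import time


class NegativeCache:
    """Remembers .onion names that were recently found not to exist."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # assumed expiry; not stated in the post
        self._clock = clock
        self._expires = {}       # onion name -> time the entry expires

    def mark_missing(self, onion):
        self._expires[onion] = self._clock() + self._ttl

    def is_known_missing(self, onion):
        expiry = self._expires.get(onion)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[onion]  # stale entry: force a fresh lookup
            return False
        return True


def onion_exists(onion, cache, descriptor_exists):
    """Return whether onion exists, skipping the lookup on a cache hit.

    descriptor_exists is a hypothetical callable that queries the HSDirs.
    """
    if cache.is_known_missing(onion):
        return False              # answered locally; no HSDir traffic
    if descriptor_exists(onion):
        return True
    cache.mark_missing(onion)     # remember the negative answer
    return False
```

With this in place, a user hammering refresh on a dead domain costs the HSDirs one lookup per TTL window rather than one per click.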
(4.2) OnionLink is just too popular. As-is, OnionLink processes ~600 hits/sec and is projected to cross 1000 hits/sec before November. This is beyond my modest researcher's budget. And making OnionLink sustainable is an ongoing effort.
First, I tried the Bitcoin donations but no one donated.
Second, I tried to make onion.link a paid service---see our Google Toolbar experiment: https://chrome.google.com/webstore/detail/onionlink-onion-plugin/pgdmopepkim... But under the paid service the traffic was so low that onion.link wasn't fulfilling its mission of serving the casual audiences Aaron and I intended.
This left me with the choice between displaying ads or selling minimized logs. There's a natural knee-jerk of *logs are bad*, and I thought it too. But after carefully weighing each option, I felt, and continue to feel, that selling minimized logs is the lesser evil. Here's why:
With ads, which some Tor2web sites use (e.g., http://onion.nu/), the ad-networks gain access to the raw IP#s, which, for exactly the reasons Isis cited, should be zealously guarded. With minimized log files, onion.link greatly mitigates the risk of bad actors acquiring personally-identifying-information.
In my third attempt at sustainability, as Isis also mentioned, the market for logfiles without personally identifying information turned out to be exceedingly small. This is unfortunate, because it forces onion.link into the option we're currently evaluating---ads.
We fought the good fight for greater privacy, but in the fourth attempt at sustainability, we are now begrudgingly experimenting with ads (something like the Forbes "thought of the day".) The leaking of IP addresses to an ad-network makes me uneasy, but when choosing between anonymous-publishing-platform-with-ads vs shutting-down, I choose platform-with-ads. If a market develops for minimized logs, I hope to return to better protecting user privacy by selling minimized logs and preventing ad-networks from seeing raw IP#s.
We do not tolerate people within our community cooperating with any parties, including law enforcement and government agencies, to deanonymise real world users of the Tor network. Full stop.
Wait what!? People believe I conspired with LEAs and governments to de-anonymize Tor users? OH! I thought people were upset that I thought "Fuck the police" was an unwise PR-strategy for mainstreaming Tor (I still think this). Many previously unexplained behaviors suddenly make a lot more sense.
That's a very black brush you got there. Careful whom you paint with that! Jeez.
*still a little be-wildered*
Okay... first reaction... Tor Project members assisting anyone (LEA or otherwise) in deanonymizing users is a palpable conflict of interest. Conflict of interest is terrible for user trust. Additionally, even the appearance of conflict-of-interest damages user-trust. Ergo, yes, I wholly support this rule. Thumbs up. +1. Anyone conspiring to subvert Tor's security should be banned.
As to Isis's suggestion that I have conspired to de-anonymize Tor users, or was an accomplice in doing so: it is mistaken, it is against my values, and moreover it lacks any supporting evidence. The closest thing I do to this spurious charge is sell minimized logs (which, ironically, aims to protect user privacy from ad-networks). So here, let us concretize this---I emailed a day's worth of premium onion.link logs [249MB] to tor-assistants@ . I am totally fine going on record saying that this data is less damaging to privacy than Google Adsense or something similar.
Okay... I think that answers your concerns. Anything else?
-Virgil