> you might want to remove the client IP address (X-Forwarded-For) from HTTP headers
Agreed! And yes we already remove x-forwarded-for.
I recall that at the very beginning we had a Python proxy library automatically adding X-Forwarded-For, but once we realized it was doing that we corrected it. FWIW, it was actually Aaron who wrote that code ;)
AFAIK Tor2web hasn't leaked any privacy-invading headers for some time. If any are discovered, they will be fixed ASAP.
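For illustration, here is a minimal sketch of that kind of header stripping before a request is forwarded to a hidden service. This is not the actual Tor2web code; the header list and function name are my own:

```python
# Hypothetical sketch (not the real Tor2web code): drop request headers
# that would identify the client before forwarding to the onion service.
PRIVACY_HEADERS = {"x-forwarded-for", "x-real-ip", "forwarded", "via"}

def sanitize_headers(headers):
    """Return a copy of `headers` with client-identifying fields removed."""
    return {k: v for k, v in headers.items()
            if k.lower() not in PRIVACY_HEADERS}
```

For example, `sanitize_headers({"Host": "example.onion", "X-Forwarded-For": "203.0.113.7"})` keeps only the `Host` field.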
> Is the opt-out permanent, or does your server re-check every time it connects?
> I can imagine there being issues with either model - one involves storing a list, the other, regular connections.
I don't know. This is Google's/Bing's department. Do we have anyone on the list familiar enough with either? If I were to guess the Googley/Bingy way of doing this, I'd imagine they store the list, and when crawling the site again they'd issue a HEAD request to see whether /robots.txt has changed; if it has, they'd overwrite the stored list.
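To make that guessed re-check model concrete, here is a sketch. It is pure speculation about how a crawler might decide, using standard HTTP validators, and the function name is mine:

```python
def should_refetch(stored_validator, head_headers):
    """Hypothetical crawler logic: after a HEAD request to /robots.txt,
    re-fetch and rebuild the stored rule list only when the response's
    validator (ETag, falling back to Last-Modified) differs from the
    validator stored at the previous crawl."""
    current = head_headers.get("ETag") or head_headers.get("Last-Modified")
    return current != stored_validator
```

Either way, the trade-off stands: the crawler keeps a stored list, and the re-check costs one lightweight connection per crawl.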
> I am disappointed that we have a Tor2web design where Tor2web needs to connect to a hidden service first, then check if it has given permission for Tor2web to connect to it.
/robots.txt isn't permission to *connect*, it's permission to crawl/index. I'm aware of no standard, within or outside of Tor, for saying whether node A has permission to connect to node B. If such a standard exists, even an unofficial one, I'm down for spending some weekends implementing it.
> I am also disappointed that this only works for HTTP onions on the default port 80.
I agree completely. But if the issue is operator privacy, isn't it even *better* that Tor2web only works on port 80? As an aside, there is tor2tcp at:
https://cryptoparty.at/tor2tcp
> I am also concerned about threat models where a single unwanted connection, or a number of unwanted connections, are security factors.
> For example:
> Imagine there is an (unknown) attack which can determine 1 bit of the 1024-bit RSA key per hidden service connection.
> (Some known attacks on broken crypto systems are like this, as are some side-channels.)
> Or imagine there is an attack which can determine 1 bit of the IPv4 address per connection.
> Is there an alternative to position (A) that supports threat models like this?
I don't have a good solution to this. As stated above, I'm aware of no protocol for saying "Please don't connect to me." The security person in me is a little skeptical of how useful it would be: if someone wanted to make many connections to learn a private key, I presume she wouldn't be obeying such requests. However, if someone doesn't want to be connected to, I would happily abide by such a standard once it exists.
> there is also the possibility of exerting social pressure to prevent people from running servers that continually connect to tor hidden services.
The closest things I know of for social pressure are:
(1) Liberal caching headers in the HTTP response:
```
Cache-Control: max-age=604800  # can be cached by the browser and any intermediary caches for up to 1 week
```
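On the consuming side, honoring such a header is cheap. A sketch of the freshness check a cache might apply (the helper name is mine, and this ignores the other Cache-Control directives for brevity):

```python
import re
import time

def is_fresh(cache_control, stored_at, now=None):
    """Treat a cached copy as fresh until max-age seconds have elapsed
    since it was stored; no max-age directive means not fresh here."""
    m = re.search(r"max-age=(\d+)", cache_control or "")
    if not m:
        return False
    now = time.time() if now is None else now
    return (now - stored_at) < int(m.group(1))
```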
(2) In /robots.txt putting long crawl-delays:
```
User-Agent: *
Crawl-delay: 86400  # wait 1 day between each fetch
```
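A well-behaved crawler can honor both the disallow rules and the Crawl-delay with Python's standard library alone. A small demonstration, with an extra Disallow line added and an invented .onion hostname:

```python
import urllib.robotparser

# Parse rules like the snippet above (Disallow added for illustration;
# the .onion hostname is made up).
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-Agent: *",
          "Crawl-delay: 86400",
          "Disallow: /private/"])

assert rp.can_fetch("tor2web", "http://example.onion/") is True
assert rp.can_fetch("tor2web", "http://example.onion/private/x") is False
assert rp.crawl_delay("tor2web") == 86400
```

Of course, like the "please don't connect" case, this is only social pressure: nothing forces a crawler to consult the parser before fetching.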
> I believe that a technical solution to this threat model is hidden service client authentication (and the next-generation hidden service protocol, when available).
Agreed.
-V