[tor-project] Ethics Guidelines; crawling .onion

Virgil Griffith i at virgil.gr
Thu Jul 7 06:28:57 UTC 2016


> you might want to remove the client IP address (X-Forwarded-For) from
> HTTP headers

Agreed!  And yes we already remove x-forwarded-for.
https://github.com/globaleaks/Tor2web/blob/master/tor2web/t2w.py#L701

I recall that at the very, very beginning we had a Python proxy library
automatically adding x-forwarded-for, but once we realized it was doing
that we corrected it.  FWIW, it was actually Aaron who wrote that code ;)

AFAIK Tor2web hasn't leaked any privacy-invading headers for some time.  If
any are discovered, they will be fixed ASAP.
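
For anyone wanting to do the same in their own proxy, the idea is roughly
the following.  This is a minimal sketch, not the Tor2web code linked above;
the function name and the exact set of headers are my assumptions, and a
real proxy may need a longer list.

```python
# Hypothetical sketch; NOT the actual Tor2web implementation.
# Header names that can reveal the client's address or the proxy chain.
IDENTIFYING_HEADERS = frozenset({
    "x-forwarded-for",
    "x-real-ip",
    "forwarded",
    "via",
})


def strip_identifying_headers(headers):
    """Return a copy of `headers` without client-identifying entries.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }


if __name__ == "__main__":
    incoming = {
        "Host": "example.onion",
        "X-Forwarded-For": "203.0.113.7",
        "User-Agent": "Mozilla/5.0",
    }
    print(strip_identifying_headers(incoming))
```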


> Is the opt-out permanent, or does your server re-check every time it
> connects?
> I can imagine there being issues with either model - one involves storing
> a list, the other, regular connections.

I don't know.  This is Google/Bing's department.  Do we have someone on
the list familiar enough with either?  If I were to guess the
Googley/Bingy way of doing this, I'd imagine them storing the list, and
then, when crawling the site again, doing a HEAD request to see whether
/robots.txt has changed.  If it has changed, they'd overwrite their stored
list.
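
To make that guess concrete, here is a rough sketch of the "store the list,
re-check with HEAD" model.  It is only my guess at the shape of it, not how
Google or Bing actually work.  The RobotsCache class is hypothetical, using
ETag/Last-Modified as the "has it changed?" signal is an assumption, and the
network fetch itself is left out: the caller passes in the HEAD response
headers and the robots.txt body.

```python
from urllib.robotparser import RobotFileParser


def _validator(head_headers):
    """Build a change-detection key from HEAD response headers (assumed)."""
    lowered = {name.lower(): value for name, value in head_headers.items()}
    etag = lowered.get("etag")
    last_modified = lowered.get("last-modified")
    if etag is None and last_modified is None:
        return None  # the server gave us nothing to compare against
    return (etag, last_modified)


class RobotsCache:
    """Hypothetical store of parsed robots.txt rules, one entry per host."""

    def __init__(self):
        self._entries = {}  # host -> (validator, RobotFileParser)

    def needs_refresh(self, host, head_headers):
        """True if robots.txt must be fetched again for this host."""
        cached = self._entries.get(host)
        validator = _validator(head_headers)
        # Refetch when nothing is stored, when the server sent no
        # validator, or when the validator differs from the stored one.
        return cached is None or validator is None or cached[0] != validator

    def store(self, host, head_headers, robots_body):
        """Overwrite the stored rules with a freshly fetched robots.txt."""
        parser = RobotFileParser()
        parser.parse(robots_body.splitlines())
        self._entries[host] = (_validator(head_headers), parser)

    def may_crawl(self, host, user_agent, path):
        """Check the stored rules; raises KeyError if none are stored."""
        return self._entries[host][1].can_fetch(user_agent, path)
```

Note that this still has the trade-off raised above: the stored list exists,
and the HEAD request is itself a connection to the service.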


> I am disappointed that we have a Tor2web design where Tor2web needs to
> connect to a hidden service first, then check if it has given permission
> for Tor2web to connect to it.

/robots.txt isn't permission to "connect to"; it's permission to
crawl/index.  I'm aware of no standard within or outside of Tor to say
whether node A has permission to connect to node B.  If such a standard, or
even an unofficial convention, exists, I'm down for spending some weekends
implementing it.

> I am also disappointed that this only works for HTTP onions on the
> default port 80.

I agree completely.  But if the issue is operator privacy, isn't it even
*better* that tor2web only works for port 80?  As an aside, there is
tor2tcp at: https://cryptoparty.at/tor2tcp


> I am also concerned about threat models where a single unwanted
> connection, or a number of unwanted connections, are security factors.
> For example:
> Imagine there is an (unknown) attack which can determine 1 bit of the
> 1024-bit RSA key per hidden service connection.
> (Some known attacks on broken crypto systems are like this, as are some
> side-channels.)
> Or imagine there is an attack which can determine 1 bit of the IPv4
> address per connection.
> Is there an alternative to position (A) that supports threat models like
> this?

I don't have a good solution to this.  As stated above, I'm aware of no
protocol for saying "Please don't connect to me."  The security person in
me is a little skeptical of how useful it would be: if someone wanted to
make many connections to learn a private key, I presume she wouldn't obey
such requests.  However, if such a standard existed and someone didn't want
to be connected to, I would happily abide by it.

> there is also the possibility of exerting social pressure to prevent
> people from running servers that continually connect to tor hidden services.

The closest things I know of for social pressure are:

(1) Liberal caching headers in the HTTP response:

```
Cache-Control: max-age=604800   # can be cached by the browser and any intermediary caches for up to 1 week
```

(2) In /robots.txt putting long crawl-delays:

```
User-Agent: *
Crawl-delay: 86400   #wait 1 day between each fetch.
```
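
On the crawler side, honoring both hints could look roughly like this.  It
is a sketch under my own assumptions, not anything a real crawler is known
to do: the function names are hypothetical, and taking the longer of the two
waits is just one plausible policy.  It leans on Python's standard
urllib.robotparser, whose crawl_delay() needs Python 3.6 or later.

```python
import re
from urllib.robotparser import RobotFileParser


def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control header value, or None."""
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def crawl_delay_seconds(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() drops everything after '#', so inline comments are fine.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def seconds_until_next_fetch(robots_txt, cache_control, user_agent):
    """Wait for the longer of Crawl-delay and max-age (0 if neither is set).

    Assumed policy: the crawler treats both values as minimum waits.
    """
    delay = crawl_delay_seconds(robots_txt, user_agent) or 0
    max_age = max_age_seconds(cache_control) or 0
    return max(delay, max_age)


if __name__ == "__main__":
    robots = "User-Agent: *\nCrawl-delay: 86400   #wait 1 day between each fetch.\n"
    print(seconds_until_next_fetch(robots, "max-age=604800", "examplebot"))
```

Both values are only requests; a crawler that ignores robots.txt will ignore
these too.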

> I believe that a technical solution to this threat model is hidden
> service client authentication (and the next-generation hidden service
> protocol, when available).

Agreed.

-V

On Thu, Jul 7, 2016 at 1:44 PM, Tim Wilson-Brown - teor <teor2345 at gmail.com>
wrote:

>
> > On 7 Jul 2016, at 15:24, Virgil Griffith <i at virgil.gr> wrote:
> >
> > > How do you make sure that Tor2web users are anonymised (as possible)
> > > when accessing hidden services?
> >
> > I make a good faith effort not to wantonly reveal personally identifying
> > information.  But in short, it's hard.  I urge people to think of tor2web
> > nodes as closer to Twitter where they record what links you click.  I
> > wholly support having the "where is Tor2web in regards to user privacy"
> > discussion (hopefully could even make some improvements to it!), but it is
> > orthogonal to the "robots.txt on .onion" discussion.  Let's address the
> > robots.txt issue and then we can return to Tor2web user-privacy.
>
> Well, as a separate issue, you might want to remove the client IP address
> (X-Forwarded-For) from HTTP headers your caching proxies send to hidden
> services. And work out if any of the other headers are sensitive.
>
> > On 7 Jul 2016, at 14:40, Virgil Griffith <i at virgil.gr> wrote:
> >
> > So now we have *three* different positions among respected members of
> > the Tor community.
> >
> > (A) isis et al: robots.txt is insufficient
> > --- "Consent is not the absence of saying 'no' — it is explicitly saying
> > 'yes'."
> >
> > (B) onionlink/ahmia/notevil/grams: we respect robots.txt
> > --- "Default is yes, but you can always opt-out."
>
> Is the opt-out permanent, or does your server re-check every time it
> connects?
> I can imagine there being issues with either model - one involves storing
> a list, the other, regular connections.
>
> > (C) onionstats/memex: we ignore robots.txt
> > --- "Don't care even if you opt-out." (see
> > https://onionscan.org/reports/may2016.html)
> >
> >
> > Isis did a good job arguing for (A) by claiming that representing (B)
> > and (C) are "blatant and disgusting workaround[s] to the trust and
> > expectations which onion service operators place in the network."
> > https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html
> >
> > This is me arguing for (B):
> > https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
> >
> > I have no link arguing for (C).
>
> I am disappointed that we have a Tor2web design where Tor2web needs to
> connect to a hidden service first, then check if it has given permission
> for Tor2web to connect to it. I am also disappointed that this only works
> for HTTP onions on the default port 80.
>
> I would like to see a much better design for this.
>
> I am also concerned about threat models where a single unwanted
> connection, or a number of unwanted connections, are security factors.
> For example:
> Imagine there is an (unknown) attack which can determine 1 bit of the
> 1024-bit RSA key per hidden service connection.
> (Some known attacks on broken crypto systems are like this, as are some
> side-channels.)
> Or imagine there is an attack which can determine 1 bit of the IPv4
> address per connection.
>
> For security, a hidden service operator decides to only allow 10
> connections before rolling over their hidden service to a new key and
> server.
>
> There are at least 10 connections to known .onion addresses every week,
> because there are at least 10 Tor2web or memex or onionstats instances on
> the web.
> Therefore, every week, the operator must roll over their hidden service,
> and arrange to notify users of the new address in a secure fashion.
> Alternately, they must keep the address secret, even from the HSDir hash
> ring, which is not possible.
>
> Is there an alternative to position (A) that supports threat models like
> this?
>
> I believe that a technical solution to this threat model is hidden service
> client authentication (and the next-generation hidden service protocol,
> when available).
> However, there is also the possibility of exerting social pressure to
> prevent people from running servers that continually connect to tor hidden
> services.
>
> Tim
>
> Tim Wilson-Brown (teor)
>
> teor2345 at gmail dot com
> PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
> ricochet:ekmygaiu4rzgsk6n
>