[tor-relays] Network Scan through Tor Exit Node (Port 80)

Jacob Appelbaum jacob at appelbaum.net
Sun Apr 3 03:08:16 UTC 2011


On 03/29/2011 02:10 AM, Scott Bennett wrote:
>      On Thu, 10 Mar 2011 10:27:50 -0800 Chris Palmer <chris at eff.org> wrote:
>> On 03/10/2011 09:10 AM, mick wrote:
>>
>>> Using Tor to scan the internet is a good way to see how the internet
>>> looks from different perspectives at once, which can be quite valuable."
>>>
>>> Which says to me that you are using Tor to do this research.
>>
>> No, it says to you that using Tor to scan the internet is a good way to
>> see how the internet looks from different perspectives at once, which
>> can be quite valuable.
>>
>>> So which is it? 
>>
>> The Observatory work was not done through Tor.
> 
>      Good.

I think we need a scan of the SSLiverse through Tor.

>>
>> However, using Tor to scan the internet is a good way to see how the
>> internet looks from different perspectives at once, which can be quite
>> valuable, so I vociferously defend the idea of doing so.
>>
>      Ah.  "The ends justify the means."  How enlightened. :-(
> 

I don't think Chris is adopting "The ends justify the means" as his
ethical model. Tor relays with exit policies that allow exiting to *:443
intend to allow exiting to *:443. You have to go to quite a bit of
effort to become a relay, so I'll trust that this counts as consent for
exiting the Tor network on port 443. I hope we don't disagree about this?

Now, if your ethical issue concerns the fact that they then connect to
a computer without asking permission, I'd argue that this is
reasonable. Even if I was mistaken about your desire to have me connect
to your system, I believe that placing a computer system on the public
internet comes with a little bit of expectation setting.

What do I mean? I mean to say - if you configure PF to block me, I'm
still burning some amount of CPU time on your machine. Is that an
unethical action on my part? Asking your computer a question and having
your computer reply (with, say, a RST) is a normal part of the
networking protocols. The internet is inherently chatty, and some amount
of that chatter is the cost of connecting to the public internet.
Obviously 1,000,000 connections at once isn't polite, but does polite
really mean zero connections? Is one connection really so impolite or
unethical?
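To make the "ask a question, get a RST" point concrete, here is a
minimal Python sketch of a single polite probe - one TCP handshake
attempt, nothing more. The function name and return strings are mine
for illustration; this is not the Observatory's code:

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt a single TCP handshake and report how the host answered.

    A completed connect means SYN / SYN-ACK / ACK succeeded; a refused
    connection means the host answered with a RST. Both replies are
    normal, specified protocol behaviour on the host's part.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # the handshake completed
    except ConnectionRefusedError:
        return "closed (RST)"  # the host explicitly said no
    except (socket.timeout, OSError):
        return "filtered/no answer"  # e.g. a firewall dropped the SYN
```

Whatever the answer, the cost to the remote machine is one handshake's
worth of CPU time - the chatter discussed above.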

Now - if we assume that it's reasonable to send a single SYN and then
complete the handshake when the policy allows, what is the next
boundary? I'd argue that common HTTPS ports probably run HTTPS software
- to better understand that software, you'll need to negotiate some or
all of that protocol. Is sending a TLS ClientHello a reasonable and
ethical next step? I'd say so. It also seems to make sense that when the
server replies, you might log the ServerHello. You might even log all of
the data that the server intended you to have. Is that impolite or
unethical? Is there something wrong with what has been done by this
point in the protocol? I don't think so.
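The whole sequence described above - ClientHello, ServerHello, logging
the data the server intended you to have - fits in a few lines of
Python. This is a sketch of the general technique, not the actual
Observatory client, and the function name is mine. Note that collection
is observation, not validation, so certificate verification is
deliberately switched off:

```python
import socket
import ssl


def grab_certificate(host, port=443, timeout=5.0):
    """Complete a TLS handshake with host and return its certificate.

    Completing the handshake sends a ClientHello and receives the
    ServerHello; the certificate kept here is data the server offered
    to any client that asked.
    """
    ctx = ssl.create_default_context()
    # we want to record self-signed and expired certificates too,
    # so do not reject anything during the handshake
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # the DER-encoded certificate the server presented
            return tls.getpeercert(binary_form=True)
```

After the `with` blocks exit, the client tears the connection down as
the protocol describes - which is the next step discussed below.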

Now the client will reasonably tear down the connection as described in
the relevant protocols. Is that wrong? I don't think so - the protocols
specifically indicate how systems should signal their intentions. You're
free to tell me to stop connecting and I'm free to connect - that's how
these things generally work.

Now - as it happens, the EFF SSL Observatory client does not actually
implement the entire set of SSL/TLS protocols - just as some software
does not implement TLS 1.1 or 1.2 - is it somehow wrong to run a client
that doesn't completely implement every spec it might encounter? That
seems doubtful.

Google seems to have this data from crawling the web and simply caching
it as it crawls everything - they get the data from lots of sources
such as other URLs, toolbars, etc. Google recently published the
Google Certificate Catalog:
http://googleonlinesecurity.blogspot.com/2011/04/improving-ssl-certificate-security.html

So is Google's method the only ethical way to collect this certificate
data? Or is there no method for collecting this data without users
manually submitting each certificate they encounter by hand?

Even if we pretend that neither the EFF method nor the Google method
were to be used - do you think that the EFF model actually burns more
CPU time on each system?

Do you accept that there is some amount of CPU time you're going to have
to burn when you connect to the internet as a server? If so, what's the
limit or the edge of reasonable CPU time that a single client may cause
for a public server?

In any case, the idea of using Tor for perspective routing is not
particularly new - Geoffrey Goodell's work on the topic is well over
half a decade old at this point. It makes a lot of sense to use Tor as a
perspective-routing system of sorts and there's nothing wrong with that
at all.
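For anyone curious what "using Tor as a perspective-routing system"
looks like mechanically: a client just speaks SOCKS5 to the local Tor
daemon, which then exits from some relay's vantage point. Below is a
hand-rolled SOCKS5 CONNECT sketch in Python (stdlib only); it assumes
Tor's conventional SocksPort of 9050, and the function name is mine:

```python
import socket
import struct


def socks5_connect(dest_host, dest_port,
                   proxy_host="127.0.0.1", proxy_port=9050):
    """Open a TCP connection to dest_host:dest_port via a SOCKS5 proxy.

    9050 is Tor's conventional SocksPort. The hostname is handed to
    the proxy unresolved (address type 3), so DNS resolution also
    happens from the exit's perspective, not the client's.
    """
    s = socket.create_connection((proxy_host, proxy_port), timeout=10.0)
    # greeting: version 5, one auth method offered, "no authentication"
    s.sendall(b"\x05\x01\x00")
    ver, method = s.recv(2)
    if (ver, method) != (5, 0):
        s.close()
        raise OSError("proxy refused anonymous SOCKS5")
    # CONNECT request: version 5, cmd 1, reserved, address type 3 (name)
    name = dest_host.encode("idna")
    s.sendall(b"\x05\x01\x00\x03" + bytes([len(name)]) + name
              + struct.pack(">H", dest_port))
    reply = s.recv(4)
    if len(reply) < 4 or reply[1] != 0:
        s.close()
        raise OSError("SOCKS5 connect failed")
    # drain the bound address that follows the 4-byte reply header
    atyp = reply[3]
    if atyp == 1:
        s.recv(6)            # IPv4 address + port
    elif atyp == 4:
        s.recv(18)           # IPv6 address + port
    elif atyp == 3:
        n = s.recv(1)[0]
        s.recv(n + 2)        # domain name + port
    return s  # now a plain TCP stream to dest_host:dest_port
```

Point the returned socket at different circuits (e.g. via Tor's
NEWNYM signal) and you get the "many perspectives at once" property
described above.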

All the best,
Jacob

