Thus spake Nick Mathewson (nickm@alum.mit.edu):
On Thu, Oct 11, 2012 at 5:38 AM, Mike Perry mikeperry@torproject.org wrote:
Design Overview
The system will have three parts: an internal hard-coded IP address mapping (127.84.111.114:80), a hard-coded mapaddress to a DNS name (selftest.torproject.org:80), and a DirPortFrontPage-style simple HTTP server that serves an XML document for both addresses.
The use of XML and the use of HTTP here are both reasons for some unhappiness. Both of them pull in a fair amount of complexity that I'd prefer not to need. (Yes, Tor already has a sort of an HTTP implementation, but at least clients aren't currently required to run what amounts to a local HTTP server.)
I seriously wonder whether the benefits of HTTP (easier to access from within a locked-down web browser environment) aren't actually the _defects_ of HTTP here: it's easier to poke it from a web page.
I understand that your design takes some steps to prevent browser-based attacks on this, but I'm not currently sure how to become sure that it solves them all. Right now, I'm nervous.
This is a reasonable fear. I think the major risks with the proposal revolve around the need to prevent the nonces from being used as tracking beacons...
I did my best to protect against this, but we probably could use a few web-heads reviewing it, too.
Upon receipt of a request to the IP address mapping, the system will create a new 128-bit randomly generated nonce and provide it in the XML document.
Requests to http://selftest.torproject.org/ must include a valid, recent nonce as the GET URL path. Upon receipt of a valid nonce, it is removed from the list of valid nonces. Nonces are only valid for 60 seconds or until SIGNAL NEWNYM, whichever comes first.
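A minimal sketch of the nonce lifecycle described above, assuming an in-memory table keyed by the nonce. The class and method names (NonceStore, issue, redeem, newnym) are hypothetical and are not taken from Tor's code; only the 128-bit size, the 60-second lifetime, the single-use rule, and the NEWNYM flush come from the proposal text.

```python
import secrets
import time

NONCE_LIFETIME = 60.0  # seconds, as stated in the proposal


class NonceStore:
    """Hypothetical single-use nonce table; an illustration, not Tor's code."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the lifetime can be tested
        self._issued = {}            # nonce (hex string) -> time of issue

    def issue(self):
        """Create a fresh 128-bit nonce for a request to the IP mapping."""
        nonce = secrets.token_hex(16)  # 16 random bytes = 128 bits
        self._issued[nonce] = self._clock()
        return nonce

    def redeem(self, nonce):
        """Check a nonce presented to the hostname mapping.

        The nonce is removed on any lookup, so it can never be replayed.
        """
        issued_at = self._issued.pop(nonce, None)
        if issued_at is None:
            return False
        return self._clock() - issued_at <= NONCE_LIFETIME

    def newnym(self):
        """Drop every pending nonce, as on SIGNAL NEWNYM."""
        self._issued.clear()
```

Expiry is checked lazily at redemption time here; a real implementation might also sweep stale entries on a timer.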
So, I'm not totally sure what the nonce field is for. The idea as I understand it is that when you connect to the IPv4 address, you get a nonce, and later when you connect to the hostname, you provide that nonce, and Tor tells you "yes" if you gave it the same nonce.
What does that protect against? My first thought is that you're trying to prevent the case where a malicious local DNS server maps "selftest.torproject.org" to some IP address in their control, and then just runs a server at that IP address to say "yes I'm Tor". But that doesn't make sense, since you could just make one of those that said "yes I'm Tor" no matter what you say for the nonce.
*Headdesk*. Doh. Yes, the DNS test needs to be given a transform of the nonce (SHA1? SHA1+salt?), and needs to spit the original back out again in the response for validation by the client.
But yes, that is exactly what we're trying to protect against.
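The transform idea can be sketched as follows, assuming plain SHA-1 over the hex nonce (the message leaves open whether a salt is added). The client sends only the digest to the hostname; a genuine Tor, which issued the nonce, can map the digest back to the original, while an impostor that sees only the digest cannot. All names here (transform, SelfTestServer, client_check) are hypothetical.

```python
import hashlib
import hmac
import secrets


def transform(nonce):
    """Assumed transform: unsalted SHA-1 of the nonce, hex-encoded."""
    return hashlib.sha1(nonce.encode("ascii")).hexdigest()


class SelfTestServer:
    """Hypothetical stand-in for the local Tor side of the exchange."""

    def __init__(self):
        self._pending = {}  # transform(nonce) -> nonce

    def issue(self):
        # Step 1: the request to the IP address mapping returns a nonce.
        nonce = secrets.token_hex(16)
        self._pending[transform(nonce)] = nonce
        return nonce

    def answer(self, digest):
        # Step 2: the request to the hostname carries only the digest;
        # the reply is the original nonce, or None if the digest is unknown.
        return self._pending.pop(digest, None)


def client_check(original_nonce, response):
    """Client-side validation: the reply must equal the nonce we were given."""
    if response is None:
        return False
    return hmac.compare_digest(original_nonce, response)
```

A fake server that answers "yes I'm Tor" to everything fails this check, because it would have to invert the hash to produce the original nonce.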
Also, how useful is the followup DNS check? If it's checking that DNS leaks aren't happening... You're going to need torbrowser or something of equivalent complexity for this to work at all; isn't it easier then for torbrowser to make sure that it set up SOCKS?
Hrmm. I was under the impression most apps have URL fetch capabilities. Pidgin appears to. Thunderbird definitely does. Both have XML deps already (as does any XMPP chat app).
But yes, the plan was for this to be used by custom software we wrote.
The list of pending nonces should not be allowed to grow beyond 10 entries.
This means that any webpage could flush out the list of pending nonces. Does that matter?
Hrmm. Maybe. I was balancing this with other issues:
1. Without any limit, web pages could oom the tor client.
2. A website that managed to access this service could track a user for a long period of time by getting a pile of nonces to use, all known to be bound to that user.
We could rely only on a shorter default timeout instead, though.
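To make the trade-off concrete, here is a small sketch of a capped nonce list. The proposal does not say what happens when an eleventh nonce is requested; this sketch assumes the oldest entry is evicted, which is the behavior that would let any web page flush out the list. The class name BoundedNonceList is hypothetical.

```python
from collections import OrderedDict
import secrets

MAX_PENDING = 10  # limit from the proposal


class BoundedNonceList:
    """Hypothetical capped nonce list that evicts the oldest entry when full."""

    def __init__(self, limit=MAX_PENDING):
        self._limit = limit
        self._pending = OrderedDict()  # insertion order doubles as age order

    def issue(self):
        nonce = secrets.token_hex(16)
        self._pending[nonce] = True
        # Assumed policy: drop the oldest nonces once over the limit.
        while len(self._pending) > self._limit:
            self._pending.popitem(last=False)
        return nonce

    def is_pending(self, nonce):
        return nonce in self._pending

    def __len__(self):
        return len(self._pending)
```

Under this policy memory stays bounded, but ten extra requests are enough to invalidate a nonce that a legitimate client is still holding. Refusing new nonces when full would avoid the flush at the cost of letting a page block legitimate requests instead.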
The timeout period and nonce limit should be configurable in torrc.
Design: XML document format for http://127.84.111.114
[...]
Security Considerations
XML was chosen over JSON due to the risks of the identifier leaking in a way that could enable websites to track the user[1].
Well, that's a nuclear-powered-flyswatter!
If I read that page right, the problem with using JSON is that it can be parsed and executed as Javascript, and the advantage of XML is that it's unlikely to be syntactically correct javascript, then maybe instead we should
Assuming "write our own format." finishes this paragraph.
If that's the issue, I'd strongly suggest that instead of going with a more complex data format, we could add a layer of encoding over the json, or use an even simpler format.
I wanted to avoid requiring our clients write parsers, and everything I could think of already parses XML.
But if you think hand-parsing is less dangerous than relying on an XML lib, we can do line-based key=value instead.
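For a sense of how small the hand-written alternative would be, here is a sketch of a strict line-based key=value parser. The format details are assumptions (one pair per line, split on the first "=", duplicates rejected), and the function name parse_key_values is hypothetical; the thread does not define the actual fields.

```python
def parse_key_values(text):
    """Parse 'key=value' lines into a dict, rejecting anything malformed."""
    fields = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected key=value")
        if key in fields:
            raise ValueError(f"line {lineno}: duplicate key {key!r}")
        fields[key] = value
    return fields
```

Failing closed on unexpected input keeps the parser easy to audit, which is the main argument for hand-parsing over pulling in an XML library.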
Because there are many exceptions and circumvention techniques to the same-origin policy, we have also opted for strict controls on dns-nonce lifetimes and usage, as well as validation of the Host header and SOCKS4A request hostnames.
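The Host header validation mentioned above might look roughly like this. It is a sketch under assumptions: only the two addresses named in the design overview are accepted, and only on port 80. The names ALLOWED_HOSTS and host_header_ok are hypothetical.

```python
# The two names from the design overview; anything else is rejected.
ALLOWED_HOSTS = {"127.84.111.114", "selftest.torproject.org"}


def host_header_ok(header_value):
    """Return True only for an exact match of an allowed host on port 80."""
    host, sep, port = header_value.strip().lower().partition(":")
    if sep and port != "80":
        return False  # an explicit port must be 80
    return host in ALLOWED_HOSTS  # exact match, no suffix or prefix matching
```

Exact matching matters here: a DNS rebinding attempt typically arrives with the attacker's own hostname in the Host header, so any name outside the allowed set is refused.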
Of course, this all comes down to the fact that we're using http. Can we spell out why we need HTTP for this?
See https://trac.torproject.org/projects/tor/ticket/6546#comment:18 and the following comment.
Do you want that in the proposal, you mean?