(This email got way out of hand from a basic 'I'll bounce an idea here'; here's hoping I haven't made some huge oversight.)
I've been thinking about the https frontend idea since I first read about the basic problem when I started looking into Tor dev, but I never took the time to read the actual proposal. Once I had a rough idea of how to solve the core problem, I finally gave it a read, and it turns out the proposal already has 90% of what I could think of. Still, I'm glad I spent some time thinking about it from a (I hope) fresh perspective.
So anyway, on point. I think Designs #2 and #3 are the best ideas in Proposal 203 (probably leaning more toward #2); they're basically the same concept anyway. I came to the same conclusion that we definitely need a shared key distributed per bridge address for this to work in any fashion, and ideally these keys could be rotated frequently. I also totally agree with the server being a key implementation detail; ideally we want something drop-in that could go alongside an existing website. As for content, I think mock corporate login pages are a neat idea, while mock private forums are not.
Regarding authentication and distinguishability, I don't agree with trying to distinguish Tor clients from non-Tor clients based on anything the client initially sends, as any sort of computation that isn't webserver-y could open us up to a timing attack or otherwise make us distinguishable. I have some specific ideas around how we can implement this to address the issues/concerns outlined in the current proposal.
I think the best course of action is to use a webserver's core functionalities to our advantage. I have not given much consideration to the client implementation. But here are some thoughts on how we could potentially achieve our goals:
- Shared secrets are shared with users whenever bridge IPs are exchanged; it is necessary for these to be large random values and not user-like passwords (as one of the Authorize proposals also mentions). This exchange would ideally give a domain name for the bridge so we're not trying to connect to an IP, but to reduce user error the domain and key should be concatenated and base64'd so it's a single copy/paste for the user, without them trying to navigate to a url thinking it's a Tor-enabled link or something.
- The user's Tor client (assuming they added the bridge) connects to the server over https (TLS) to the root domain. It should also download all the resources attached to the main page, emulating a web browser fetching the initial document.
- The server should reply with its normal root page. This page can be dynamically generated, there is no requirement for it to be static; the only requirement is that one of the linked documents (css, js, img) be served with a header that allows decent caching (>1hr). The far-future file could be the document itself, but it doesn't have to be.
- (This part is probably way too trashy to the server's performance, I'm winging it as I think of it.)
- For all files included in the main document, whichever has the furthest-future cache header, we'll call that file F.
- If we have precomputed the required values (see below) for F, then we are ok to move to the next step (see next bullet point); otherwise, serve all of the files with cache headers under one hour.
- If F doesn't have the precomputations ready, this is the time to spin off a subprocess to start calculating stuff (at lowish CPU priority, probably).
- The subprocess should calculate an intensive function (e.g. scrypt) of hash(contents of F xored with the shared key) for (X...Y) iterations, inclusive. X and Y should be chosen so that X is on the magnitude of seconds to compute while Y is a couple of thousand iterations above it. Store a 'map' of numberOfIterations => { result, hmac(result + tls cert identifier) }. The hmac should be keyed with the shared secret. The tls cert identifier should probably be its public key or signature? It should store these results in a fast cache (hopefully in memory). (A rough sketch of this step follows this list.)
- So we have our file F, and a precomputed value Z which was the function applied Y times and has an hmac H. We set a cookie on the client: base64("Y || random padding || H")
- The server should remember which IPs were given this Y value. This cookie should pretty much look like any session cookie that comes out of rails, drupal, asp, anyone who's doing cookie sessions correctly. Once the cookie is added to the headers, just serve the document as usual. Essentially this should all be possible in an apache/nginx module as the page content shouldn't matter.
- Here's a core idea: the server has a handler set up for each of the Z values; hex encoded is probably best (longer!), e.g. /FFFEF421516AB3B2E42... (There is a code sketch of this handler bookkeeping after the summary below.)
- The webserver should be set up to accept secure websocket upgrades to these urls and route the connection to the local Tor socket.
- If the iteration value for the given url is not the same as the one given to the ip trying the path, or the iteration value doesn't match Y, the connection should be dropped/rejected. (This can be legitimate.)
- If the connection is accepted, the current Y value should be decremented. If Y < X for the current F then we should rotate our keys. (This is a bit of a question: we could manipulate one of the files, but that interferes with the website and could cause distinguishability.)
- Basically after Y-X Tor clients (not related to how many https users are served), we should be rotating our keys in case the keys leaked, or changing handlers to stop old handlers being used.
- When rotating keys we should be sure to not accept requests on the old handlers, by either removing them (404) or by 403ing them, whatever. The decrementing of Y is to try to make replay attacks less feasible, although that would mean TLS was broken if they were able to get the initial value, but fuck, who knows with BREACH & CRIME et cetera.
- (Best read the rest before reading this part: to reduce key churn, or allow long-term guard-like functionality, the old handlers could be saved and remain unique to a single ip; by sending cookies from client to server that are a unique id accepted from that ip, the server could know to use an old shared key or something, so the client wouldn't blacklist them. Or the client could know not to blacklist previously successful bridges by remembering their TLS cert or something. I haven't really thought much about this, but it's probably manageable.)
- The idea here is that the webserver (apache/nginx) is working EXACTLY as a normal webserver should, unless someone hits these exact urls, which they should have a negligible chance of doing unless they have the current shared secret. There might be a timing attack here, but in that case we can just add a million other handlers that all lead to a 403? (But either way, if someone's spamming thousands of requests then you should be able to ip block, but rotating keys should help reduce the feasibility of timing attacks or brute forcing?)
- So, how does the client figure out the url to use for wss://? Using the cache headers, the client should be able to determine which file is F. If all files are served with a cache header under one hour, then we wait a time period T. Realistically, if the Tor client knows this is a bridge, the only reason this wait should happen is if precomputing is happening, so it should just choose another bridge to use... or wait minutes and notify the user that it's for good reason.
- Assuming we get a valid F, we look at our cookies. For all cookies, if they're base64, convert to binary, then try treating the first K bytes (we should have an upper bound for Y; let's say it's probably an 8-byte unsigned long) as a number I. We replicate the computations that the server would have done to get our Zc(lient).
- Using this Zc, and the cert provided by the server, we can compute our local Hc. If Hc doesn't match the last (length of hmac used) bytes in the cookie, then try the next cookie.
- If no cookie matches, then we either have an old key or we're being MITMd (the computation was ok but the cert didn't match). In these cases, we should fake some user navigation for a couple of pages then close the connection and blacklist the bridge (run for the hills and don't blow the bridge!).
- If we get a match, then we know Zc, so we upgrade the connection to wss://domain/Zc, which should be a valid secure websocket connection (usable as TCP) unless another ip was already accepted on this iteration value, in which case the server should reject us. If we get rejected at this stage, we know the server had good reason (trying to stop replays), so we just retry from the start and cross our digital fingers. (If bridges are sufficiently private then this should be a non-issue, as it will likely only happen with two Tor clients connecting within the same second or so.)
- At this point there should be an encrypted TCP tunnel between the Tor client and the bridge's apache/nginx, and an unencrypted connection between the webserver and the bridge's Tor socket. We should be able to just talk the Tor protocol now and get on with things.
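As a rough sketch of the precomputation and cookie step above (Python, standard library only): the scrypt cost parameters, the use of SHA-256 for the hash and HMAC, the key-cycling XOR over F, the 8-byte encoding of Y and the 16 bytes of padding are all placeholder choices for illustration, not something the scheme fixes.

    import base64
    import hashlib
    import hmac
    import os

    def precompute_chain(f_contents: bytes, shared_key: bytes,
                         cert_fingerprint: bytes, x: int, y: int) -> dict:
        """For each iteration count in [x, y], record the result of applying an
        expensive function that many times to hash(F xor shared key), plus an
        HMAC binding the result to the TLS cert identifier."""
        # hash(contents of F xored with the shared key); the key is cycled over F
        keyed = bytes(b ^ shared_key[i % len(shared_key)]
                      for i, b in enumerate(f_contents))
        value = hashlib.sha256(keyed).digest()

        table = {}
        for i in range(1, y + 1):
            # One slow step per iteration (scrypt stands in for the intensive function)
            value = hashlib.scrypt(value, salt=shared_key, n=2**14, r=8, p=1, dklen=32)
            if i >= x:
                tag = hmac.new(shared_key, value + cert_fingerprint,
                               hashlib.sha256).digest()
                table[i] = (value, tag)   # numberOfIterations => { result, hmac }
        return table

    def make_cookie(y: int, tag: bytes) -> str:
        """base64("Y || random padding || H"), with Y as an 8-byte unsigned value."""
        padding = os.urandom(16)
        return base64.b64encode(y.to_bytes(8, "big") + padding + tag).decode()

The server would keep the returned table in its fast cache and hand out the highest iteration count first, as described above.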
So to summarise,
- Using general web tools to negotiate: secret paths, headers and cookies
- Proof-of-work-ish system using the shared key to establish a unique url
- Checking for a MITM and allowing key rotation by using our shared key with an hmac to determine:
  - If the provided certificate matches what the server thought it gave us
  - If our result is correct with our current key
- Assuming it's sound, I think the server side could be implemented as an apache module that could be a relatively easy drop-in.
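To make the secret-handler bookkeeping summarised above concrete (checking the wss:// path, remembering which IP was given which Y, decrementing Y on an accepted upgrade, and rotating once Y drops below X), here is a rough sketch in plain Python. The class and its names are hypothetical and independent of any real apache/nginx module API; keying the state on client IP simply mirrors the list above.

    import hmac

    class SecretHandlerState:
        """Hypothetical in-memory state for one shared key and one file F."""

        def __init__(self, table: dict, x: int):
            self.table = table            # iterations -> (Z, H), e.g. from precompute_chain()
            self.x = x                    # floor below which we rotate keys
            self.current_y = max(table)   # iteration value currently being handed out
            self.issued = {}              # client IP -> iteration value it was given

        def issue(self, client_ip: str) -> int:
            """Record which iteration value this IP gets when the cookie is set."""
            self.issued[client_ip] = self.current_y
            return self.current_y

        def path_for(self, iterations: int) -> str:
            z, _tag = self.table[iterations]
            return "/" + z.hex().upper()  # e.g. /FFFEF421516AB3B2E42...

        def accept_upgrade(self, client_ip: str, path: str) -> bool:
            """Should this secure-websocket upgrade be routed to the local Tor socket?"""
            expected = self.issued.get(client_ip)
            if expected is None or expected != self.current_y:
                return False              # unknown IP, stale value, or someone beat us to it
            if not hmac.compare_digest(path.encode(), self.path_for(expected).encode()):
                return False              # not the handler path we assigned to this client
            self.current_y -= 1           # burn this iteration value against replays
            if self.current_y < self.x:
                self.rotate_keys()
            return True

        def rotate_keys(self) -> None:
            # Not sketched: pick a new shared key (or change F), redo the
            # precomputation, and stop answering on the old handler paths.
            pass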
Concerns:
- Distinguishability of the client https & websockets implementation
- Content for servers
- Everything above, as I'm sure there's an obvious critical flaw I'm overlooking!
- Amount of work it would take on the client side :(
Rym
On 2013-09-12 09:25, Kevin Butler wrote:
[generic 203 proposal (and similar http scheme) comments]
- HTTPS requires certificates; self-signed ones can easily be blocked, as they are self-signed and thus likely not important. If the certs are all 'similar' (same CA, formatting, etc.) they can be blocked based on that. Because of the cert, you need a hostname too, and that gives another possibility for blocking.
- Exact fingerprints of both the client (if going that route) and server cert should be checked. There are too many entities with their own Root CA, thus the chained link cannot be trusted, though it should be checked. (Generating a matching fingerprint for each hostname still takes a while and cannot easily be done quickly at connect time.)
[..]
Regarding authentication and distinguishability, I don't agree with trying to distinguish Tor clients from non-Tor clients based on anything the client initially sends, as any sort of computation that isn't webserver-y could open us up to a timing attack or otherwise make us distinguishable.
Correct.
[..]
I think the best course of action is to use a webserver's core functionalities to our advantage. I have not given much consideration to the client implementation.
The client side can likely be done similarly to, or using, some work I am working on, which we can hopefully finalize and put out in the open soon.
Server side, indeed: a module of sorts is the best way to go; you cannot become a real webserver unless you are one. Still, you need to take care of the headers set, responses given, response times, etc.
But here are some thoughts on how we could potentially achieve our goals:
- Shared secrets are shared with users whenever bridge IPs are exchanged; it is necessary for these to be large random values and not user-like passwords (as one of the Authorize proposals also mentions). This exchange would ideally give a domain name for the bridge so we're not trying to connect to an IP, but to reduce user error the domain and key should be concatenated and base64'd so it's a single copy/paste for the user, without them trying to navigate to a url thinking it's a Tor-enabled link or something.
That looks sound indeed.
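For what it's worth, a tiny sketch of what that single copy/paste token could look like; the one-byte length prefix and the exact layout here are invented purely for illustration, nothing in the proposal fixes them.

    import base64

    def encode_bridge_token(domain: str, shared_key: bytes) -> str:
        """Pack the bridge's domain and shared key into one opaque string."""
        blob = bytes([len(domain)]) + domain.encode("ascii") + shared_key
        return base64.b64encode(blob).decode()

    def decode_bridge_token(token: str) -> tuple[str, bytes]:
        """Recover (domain, shared_key) from the pasted token."""
        blob = base64.b64decode(token)
        n = blob[0]
        return blob[1:1 + n].decode("ascii"), blob[1 + n:]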
- The user's Tor client (assuming they added the bridge) connects to the server over https (TLS) to the root domain. It should also download all the resources attached to the main page, emulating a web browser fetching the initial document.
And that is where the trick lies: you basically would have to ask a real browser to do so, as timing, how many items are fetched and how, the User-Agent and everything else are clear signatures of that browser.
As such, don't ever emulate. The above project would fit this quite well (though we avoid any use of HTTPS due to the cert concerns above).
[..some good stuff..]
- So we have our file F, and a precomputed value Z which was the function applied Y times and has an hmac H. We set a cookie on the client: base64("Y || random padding || H")
  o The server should remember which IPs were given this Y value.
Due to the way that HTTP/HTTPS works today, limiting/fixing on IP is near impossible. There are lots and lots of people who are sitting behind distributed proxies and/or otherwise changing addresses. (AFTR is getting more widespread too).
Also note that some adversaries can do in-line hijacking of connections, and thus effectively start their own connection from the same IP, or replay the connection etc... as such IP-checking is mostly out...
This cookie should pretty much look like any session cookie that comes out of rails, drupal, asp, anyone who's doing cookie sessions correctly. Once the cookie is added to the headers, just serve the document as usual. Essentially this should all be possible in an apache/nginx module as the page content shouldn't matter.
While you can likely do it as a module, you will likely need to store these details outside due to differences in the threading/forking models of apache modules (likely the same for nginx; I did not invest time in making that module for our thing yet, though with an externalized part that is easy to do at some point).
[..]
o When rotating keys we should be sure to not accept requests on the old handlers, by either removing them (404) or by 403ing them, whatever.
Better is to always return the same response but ignore any further processing.
Note that you cannot know about pre-play or re-play attacks. With SSL these become a bit less problematic fortunately. But if MITMd they still exist.
[..]
o The idea here is that the webserver (apache/nginx) is working EXACTLY as a normal webserver should, unless someone hits these exact urls, which they should have a negligible chance of doing unless they have the current shared secret. There might be a timing attack here, but in that case we can just add a million other handlers that all lead to a 403? (But either way, if someone's spamming thousands of requests then you should be able to ip block, but rotating keys should help reduce the feasibility of timing attacks or brute forcing?)
The moment you do a ratelimit you are denying possibly legit clients. The only thing an adversary has to do is create $ratelimit amount of requests, presto.
- So, how does the client figure out the url to use for wss://? Using the cache headers, the client should be able to determine which file is F.
I think this is a cool idea (using cache times), though it can be hard to get this right: some websites set nearly unlimited expiration times on very static content. Thus you always need to be above that; how do you ensure that?
Also, it kind of assumes that you are running this on an existing website with HTTPS support...
[..]
o If no cookie matches, then we either have an old key or we're being MITMd (the computation was ok but the cert didn't match). In these cases, we should fake some user navigation for a couple of pages then close the connection and blacklist the bridge (run for the hills and don't blow the bridge!).
:)
Greets, Jeroen (now back to that one project....)
Hey Jeroen,
Thanks for your feedback, please see inline.
On 12 September 2013 09:03, Jeroen Massar jeroen@massar.ch wrote:
On 2013-09-12 09:25, Kevin Butler wrote:
[generic 203 proposal (and similar http scheme) comments]
HTTPS requires certificates; self-signed ones can easily be blocked, as they are self-signed and thus likely not important. If the certs are all 'similar' (same CA, formatting, etc.) they can be blocked based on that. Because of the cert, you need a hostname too, and that gives another possibility for blocking.
Exact fingerprints of both the client (if going that route) and server cert should be checked. There are too many entities with their own Root CA, thus the chained link cannot be trusted, though it should be checked. (Generating a matching fingerprint for each hostname still takes a while and cannot easily be done quickly at connect time.)
I should have made my assumptions clearer. I am assuming the CA is compromised in this idea. I have assumed it is easy to make a counterfeit and valid cert from the root, but it is hard (read: infeasible) to generate one with the same fingerprint as the cert the server actually has.
This is the key point that I think helps against a MITM: if the fingerprint of the cert we received doesn't match what the server sent us in the hmac'd value, then we assume MITM and do nothing.
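A sketch of that client-side check, reusing the hypothetical cookie layout from the server-side sketch earlier in the thread (8-byte Y at the front, variable padding, 32-byte SHA-256 HMAC over Z plus the cert fingerprint at the end); recompute_z stands in for the client redoing the server's iteration chain for the given count, and SHA-256 is just one possible choice of 'tls cert identifier'.

    import base64
    import hashlib
    import hmac

    HMAC_LEN = 32   # length of a SHA-256 HMAC tag (placeholder choice)

    def cert_fingerprint(der_cert: bytes) -> bytes:
        """Fingerprint of the cert the client actually received, e.g. the DER
        blob from ssl_sock.getpeercert(binary_form=True)."""
        return hashlib.sha256(der_cert).digest()

    def verify_cookie(cookie_value: str, shared_key: bytes,
                      received_cert_fp: bytes, recompute_z) -> bytes | None:
        """Return Zc if the cookie checks out against the cert we received,
        otherwise None (old key, not our cookie, or a MITM swapped the cert)."""
        try:
            raw = base64.b64decode(cookie_value, validate=True)
        except Exception:
            return None                        # not base64, so not one of ours
        if len(raw) < 8 + HMAC_LEN:
            return None
        iterations = int.from_bytes(raw[:8], "big")   # the Y value
        h_server = raw[-HMAC_LEN:]                    # HMAC at the tail of the cookie
        zc = recompute_z(iterations)                  # client-side replay of the chain
        hc = hmac.new(shared_key, zc + received_cert_fp, hashlib.sha256).digest()
        return zc if hmac.compare_digest(hc, h_server) else None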
[..]
I think the best course of action is to use a webserver's core functionalities to our advantage. I have not given much consideration to the client implementation.
The client side can likely be done similarly to, or using, some work I am working on, which we can hopefully finalize and put out in the open soon.
Server side, indeed: a module of sorts is the best way to go; you cannot become a real webserver unless you are one. Still, you need to take care of the headers set, responses given, response times, etc.
I'm interested in the work you've mentioned, hope you get it finalized soon :)
- The user's Tor client (assuming they added the bridge) connects to the server over https (TLS) to the root domain. It should also download all the resources attached to the main page, emulating a web browser fetching the initial document.
And that is where the trick lies: you basically would have to ask a real browser to do so, as timing, how many items are fetched and how, the User-Agent and everything else are clear signatures of that browser.
As such, don't ever emulate. The above project would fit this quite well (though we avoid any use of HTTPS due to the cert concerns above).
I was hoping we could do some cool client integration with selenium or
firefox or something, but it's really out of scope of what I was thinking about.
[..some good stuff..]
- So we have our file F, and a precomputed value Z which was the function applied Y times and has an hmac H. We set a cookie on the client: base64("Y || random padding || H")
  o The server should remember which IPs were given this Y value.
Due to the way that HTTP/HTTPS works today, limiting/fixing on IP is near impossible. There are lots and lots of people who are sitting behind distributed proxies and/or otherwise changing addresses. (AFTR is getting more widespread too).
Also note that some adversaries can do in-line hijacking of connections, and thus effectively start their own connection from the same IP, or replay the connection etc... as such IP-checking is mostly out...
Yes, I was being generic here; it seems like I deleted my additional comments on this. It's relatively trivial to add more data into the cookie to associate it with an accepted Y value.
This cookie should pretty much look like any session
cookie that comes out of rails, drupal, asp, anyone who's doing cookie sessions correctly. Once the cookie is added to the headers, just serve the document as usual. Essentially this should all be possible in an apache/nginx module as the page content shouldn't matter.
While you can likely do it as a module, you will likely need to store these details outside due to differences in the threading/forking models of apache modules (likely the same for nginx; I did not invest time in making that module for our thing yet, though with an externalized part that is easy to do at some point).
I'm hoping someone with more domain knowledge on this can comment here :) But yeah, I'm sure it's implementable.
[..]
o When rotating keys we should be sure to not accept requests on the old handlers, by either removing them (404) or by 403ing them, whatever.
Better is to always return the same response but ignore any further processing.
Note that you cannot know about pre-play or re-play attacks. With SSL these become a bit less problematic fortunately. But if MITMd they still exist.
Yes, we would obviously need to choose a single response option; I was just giving options. I'm hoping the MITM detection would prevent the client from ever making an action that could be replayed. But yes, we're mainly relying on determining whether we're talking to the right server with the right cert, and relying on TLS.
[..]
o The idea here is that the webserver (apache/nginx) is working EXACTLY as a normal webserver should, unless someone hits these exact urls, which they should have a negligible chance of doing unless they have the current shared secret. There might be a timing attack here, but in that case we can just add a million other handlers that all lead to a 403? (But either way, if someone's spamming thousands of requests then you should be able to ip block, but rotating keys should help reduce the feasibility of timing attacks or brute forcing?)
The moment you do a ratelimit you are denying possibly legit clients. The only thing an adversary has to do is create $ratelimit amount of requests, presto.
Hadn't considered that, good point. We could rely on probabilities, but I would prefer some kind of hellban ability once a censor's ip has been determined (act normal, just don't let their actions ever do anything).
- So, how does the client figure out the url to use for wss://? Using the cache headers, the client should be able to determine which file is F.
I think this is a cool idea (using cache times), though it can be hard to get this right: some websites set nearly unlimited expiration times on very static content. Thus you always need to be above that; how do you ensure that?
I guess I should have outlined that more clearly. F is determined by whichever file of the normally served document has the longest cache time: if they set it to 50 years, we use that one; if they set two to an equal time, then the client and server will just use the first one that appears in the document. We are not to generate our own files for the computation process, as that would make our servers identifiable. Plus, remember we have the ability to change headers, so if they're setting everything to some invalid infinity option, we just change it to 10 years on the fly. I don't see this being a blocker.
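A small sketch of that selection rule, assuming the cache headers have already been parsed down to a max-age in seconds for each resource, listed in document order:

    def pick_f(resources):
        """resources: (url, max_age_seconds) pairs in the order they appear in
        the served document. Pick the one with the longest cache lifetime; ties
        keep the first occurrence, matching the rule above. Returns None when
        nothing is cacheable for more than an hour (the wait-and-retry case)."""
        best_url, best_age = None, 3600
        for url, max_age in resources:
            if max_age > best_age:        # strictly greater, so ties keep the earlier file
                best_url, best_age = url, max_age
        return best_url

    # Example: the stylesheet wins because it has the furthest-future cache header.
    assert pick_f([("/logo.png", 86400), ("/style.css", 31536000),
                   ("/app.js", 31536000)]) == "/style.css"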
Also, it kind of assumes that you are running this on an existing website with HTTPS support...
Yes, the website will need to support https, but these days you're being negligent to your users anyway if you're not allowing them https.
Does that clear any of your concerns at all?
On 2013-09-12 22:00, Kevin Butler wrote: [..]
I should have made my assumptions clearer. I am assuming the CA is compromised in this idea. I have assumed it is easy to make a counterfeit and valid cert from the root, but it is hard (read: infeasible) to generate one with the same fingerprint as the cert the server actually has.
This is the key point that I think helps against a MITM: if the fingerprint of the cert we received doesn't match what the server sent us in the hmac'd value, then we assume MITM and do nothing.
That should take care of that indeed.
[..]
> * The user's Tor client (assuming they added the bridge) connects to
> the server over https (TLS) to the root domain. It should also
> download all the resources attached to the main page, emulating a
> web browser fetching the initial document.
And that is where the trick lies: you basically would have to ask a real browser to do so, as timing, how many items are fetched and how, the User-Agent and everything else are clear signatures of that browser. As such, don't ever emulate. The above project would fit this quite well (though we avoid any use of HTTPS due to the cert concerns above).
I was hoping we could do some cool client integration with selenium or firefox or something, but it's really out of scope of what I was thinking about.
Or a very minimal plugin for the browser that talks to a daemon that does most of the heavy lifting. That way there is no need for selenium or anything else that might differ from a real browser, and plugins can exist for a variety of browsers (chrome/chromium is what we have at the moment); when a new one comes out, people can just upgrade, as it is not that tightly bound to it.
[..]
> This cookie should pretty much look like any session
> cookie that comes out of rails, drupal, asp, anyone who's doing
> cookie sessions correctly. Once the cookie is added to the
> headers, just serve the document as usual. Essentially this
> should all be possible in an apache/nginx module as the page
> content shouldn't matter.
While you can likely do it as a module, you will likely need to store these details outside due to differences in the threading/forking models of apache modules (likely the same for nginx; I did not invest time in making that module for our thing yet, though with an externalized part that is easy to do at some point).
I'm hoping someone with more domain knowledge on this can comment here :) But yeah, I'm sure it's implementable.
The know-how is there; we have a module on the server side as well, we just haven't had the time to get everything working in that setup. If that works, though, nginx will be done too. (Although at the moment the way is to have nginx on the front, let it proxy to Apache, and have the module there.)
The finishing part and the 'getting it out there' will hopefully be soon, but likely around the end-of-October timeframe... depending on a lot of factors though.
[..]
The moment you do a ratelimit you are denying possibly legit clients. The only thing an adversary has to do is create $ratelimit amount of requests, presto.
Hadn't considered that, good point. We could rely on probabilities, but I would prefer some kind of hellban ability once a censor's ip has been determined (act normal, just don't let their actions ever do anything).
As some just use the IP of the client, blocking the 'censor' is the same as blocking the client. IP-based is not the way to go, unfortunately.
[..]
> * So, how does the client figure out the url to use for wss://? Using
> the cache headers, the client should be able to determine which file
> is F.
I think this is a cool idea (using cache times), though it can be hard to get this right: some websites set nearly unlimited expiration times on very static content. Thus you always need to be above that; how do you ensure that?
I guess I should have outlined that more clearly. F is determined by whichever file of the normally served document has the longest cache time: if they set it to 50 years, we use that one; if they set two to an equal time, then the client and server will just use the first one that appears in the document. We are not to generate our own files for the computation process, as that would make our servers identifiable. Plus, remember we have the ability to change headers, so if they're setting everything to some invalid infinity option, we just change it to 10 years on the fly. I don't see this being a blocker.
Very good points, thanks for the elaboration.
Also, it kind of assumes that you are running this on an existing website with HTTPS support...
Yes, the website will need to support https, but these days you're being negligent to your users anyway if you're not allowing them https.
With SNI it is getting easier to just have multiple single-host certs on the same webserver, but otherwise one has to resort to a wildcard cert, and those typically cost some dear money every year.
CACert.org unfortunately is not a standard root CA yet, and using CACert means your audience is not seeing the lock either; thus, if a censor wants to block, they likely won't hurt too many folks.
Note that scanning sites for SSL certs and thus seeing the hostname for that site allows the censor to do a lot of things: blocking on properties of the cert, checking if the forward DNS lookup for that cert matches the host it is served on.
IMHO certs in general give off too many details about a site, making scanning possible and easier to do, and making the site easier to block.
Does that clear any of your concerns at all?
Definitely.
Greets, Jeroen