Hello,
I'd like to comment on this topic, as I see a potential for improvements to stay below the radar and avoid all kinds of (minor) detections.
Perhaps contrary to how others reply, I comment inline here. Forgive me for that; my reply is lengthy, and each comment typically refers to the block of text before it.
On 9/3/12 6:02 PM, Nick Mathewson wrote:
Filename: 195-TLS-normalization-for-024.txt Title: TLS certificate normalization for Tor 0.2.4.x
[...]
We propose improvements to Tor's current TLS certificates to reduce the distinguishability of Tor traffic.
[...]
1.2. Allow externally generated certificates
It should be possible for a Tor relay operator to generate and provide their own certificate and secret key. This will allow a relay or bridge operator to use a certificate signed by any member of the "SSL mafia,"[*] to generate their own self-signed certificate, and so on.
For compatibility, we need to require that the key be an RSA secret key, of at least 1024 bits, generated with e=65537.
I would suggest adding a statement that the certificate be signed using SHA1 or, if you can work this in, a SHA2 hash. My motivation here is less about avoiding MD5 and more that the majority of CAs use SHA1 now and will soon migrate to SHA2. Using MD5 is a yellow flag in some situations.
As a proposed interface, let's require that the certificate be stored in ${DataDir}/tls_cert/tls_certificate.crt , that the secret key be stored in ${DataDir}/tls_cert/private_tls_key.key , and that they be used instead of generating our own certificate whenever the new boolean option "ProvidedTLSCert" is set to true.
(Alternative interface: Allow the cert and key cert to be stored wherever, and have the user provide their respective locations with TLSCertificateFile and TLSCertificateKeyFile options.)
Would it be possible to use an approach similar to SSLSniff's? As it doesn't really matter what kind of certificate is issued here, you could generate something fitting on the fly. What counts as fitting is motivated further down in my reply, as it is part of my comments too.
1.3. Longer certificate lifetimes
Typically, certificates are valid for a year, so let's use that as our default lifetime. [TODO: investigate whether "a year" for most CAs and self-signed certs have their validity dates running for a calendar year ending at the second of issue, one calendar year ending at midnight, or 86400*(365.5 +/- .5) seconds, or what.]
It depends on the CA and the resource provider. Each has its own influences and motivations that result in a certain validity period. Sometimes a change in the validity period hints at a policy change at a CA or a change at the resource and service provider.
Let's take exhibit A over here: Tuesday, 28 February 2012 2:00:10 PM to Thursday, 28 February 2013 2:10:10 PM
And exhibit B: Thursday, 17 November 2011 1:00:00 AM to Saturday, 14 July 2012 1:59:59 AM
Exhibit B is valid for somewhat longer than six months (about eight) and is used by a lot of people.
Most of the time I see certificates get issued regardless of leap-year details, so that would be per calendar year: simply date to date, plus or minus one day depending on how cool the CA is. Or even plus or minus one month. In that case you have one year of usage and one month of planned migration opportunity to a new certificate. This is very luxurious and might be rare in the commercial world, but it happens too.
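To make the date-to-date idea concrete, here is a minimal sketch in Python. The function names and the `grace_months` and `slack` parameters are my own illustration, not part of the proposal; it only shows one way to compute an end date that ignores leap-year details and optionally adds a migration month or a few minutes of slack, as in exhibit A.

```python
import calendar
from datetime import datetime, timedelta


def add_months(moment: datetime, months: int) -> datetime:
    """Shift a timestamp by whole calendar months, clamping the day if needed."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def not_after(not_before: datetime, grace_months: int = 0,
              slack: timedelta = timedelta(0)) -> datetime:
    """One calendar year, date to date, plus optional grace months and slack."""
    return add_months(not_before, 12 + grace_months) + slack


if __name__ == "__main__":
    start = datetime(2012, 2, 28, 14, 0, 10)
    # Same shape as exhibit A: one year, date to date, plus ten minutes.
    print(not_after(start, slack=timedelta(minutes=10)))
```

A certificate issued on 29 February simply ends on 28 February of the next year, which matches the "regardless of leap year details" behavior I described.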
There are two ways to approach this. We could continue our current certificate management approach where we frequently generate new certificates (albeit with longer lifetimes), or we could make a cert, store it to disk, and use it for all or most of its declared lifetime.
Using shorter-lived certificates with long nominal lifetimes doesn't seem to buy us much. It would let us rotate link keys more frequently, but we're already getting forward secrecy from our use of diffie-hellman key agreement. Further, it would make our behavior look less like regular TLS behavior, where certificates are typically used for most of their nominal lifetime. Therefore, let's store and use certs and link keys for the full year.
I would try to mimic the normal certificate lifecycle. That means a connection will typically not present a certificate whose validity starts today; the start date will most likely lie in the past. The certificate is typically valid for more than 6 months (otherwise the operational costs get too high). It should be used more or less persistently: a client connecting to a service will likely get the same certificate in return when it reconnects within the day. This mimics the behavior of a real server, which would present the same certificate for a similar connection request. I'm ignoring the existence of TLS 1.1+ here, as I'm focusing on patterns and on lowering the opportunities for pattern matching.
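As a rough sketch of that lifecycle in Python: the two helpers below are hypothetical, and the 90-day maximum backdating and the 30-day renewal margin are assumptions I picked for illustration, not measured values. The first picks a start date in the past, the second decides whether the stored certificate should keep being presented.

```python
import random
from datetime import datetime, timedelta


def backdated_not_before(now, max_age_days=90, rng=random):
    """Pick an issue time in the past so a fresh cert never starts 'today'."""
    age = timedelta(days=rng.randint(1, max_age_days),
                    seconds=rng.randint(0, 86399))
    return now - age


def should_reuse(now, not_before, not_after, renew_margin=timedelta(days=30)):
    """Keep presenting the stored cert until shortly before it expires."""
    return not_before <= now < not_after - renew_margin


if __name__ == "__main__":
    now = datetime(2012, 9, 4, 12, 0, 0)
    start = backdated_not_before(now)
    end = start + timedelta(days=365)
    print(start, end, should_reuse(now, start, end))
```

The point of `should_reuse` is the persistence: a client reconnecting within the day gets the same certificate back.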
Another opportunity here is to use the public key rekeying feature provided by the CAs. When something seems fishy and you don't really know whether the private key was kept safe, you can go for rekeying to avoid a big fuss. If not used too much, you could use this to Tor's advantage and cycle the keys while keeping the same certificate. Keeping the same certificate is the persistence I was focusing on previously.
1.4. Self-signed certificates with better DNs
When we generate our own certificates, we currently set no DN fields other than the commonName. This behavior isn't terribly common: users of self-signed certs usually/often set other fields too. [TODO: find out frequency.]
Do you use Subject Alt Names? If not, I would yellow flag your certificate. If you mimic an HTTPS connection and don't comply with RFC2818 (http://tools.ietf.org/html/rfc2818, section 3.1), I would use that as a (partial) signature.
Unfortunately, it appears that no particular other set of fields or way of filling them out _is_ universal for self-signed certificates, or even particularly common. The most common schema seem to be for things most censors wouldn't mind blocking, like embedded devices. Even the default openssl schema, though common, doesn't appear to represent a terribly large fraction of self-signed websites. [TODO: get numbers here.]
So the best we can do here is probably to reproduce the process that results in self-signed certificates originally: let the bridge and relay operators pick the DN fields themselves. This is an annoying interface issue, and wants a better solution.
If you want to do sneaky self-signed certificates, you could make the Subject DN and the Issuer DN different. It would be one less hint or identifier to track. Typically the amount of traffic to services hosting self-signed certificates is lower than to the big sites, so I think you could afford the load of doing this in-depth inspection if you wanted to.
Why can't the tool construct a varying number of RDNs (somewhere between 3 and 7) in the Subject and Issuer DNs? The bridge or relay operators should be able to add a personal touch for human entropy's sake. It also avoids producing similar uber-short Subject DNs, which are an easy pattern to track. I'm amazed this wasn't observed/used before.
Perhaps it would even make sense to build in a signing policy for each bridge or relay; I would pitch it as a tiny CPS. This means that a bridge or relay will scope the namespace of the Subject DNs that it constructs, including the Subject DN of its own CA. Normal CAs (should) typically follow a fixed pattern. If the pattern is chosen believably enough, you could use this predictability to stay below the radar on that end.
Example:
"/C=NL/O=acme inc/OU=trust me"
"/C=NL/O=acme inc/OU=trusted auth/OU=vaccuum/CN=host-15.dyndns.org"
Perhaps this is more engineering work than it will pay off, but OK, I'm pitching the idea for completeness' sake.
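To show that this need not be much work, here is a minimal sketch in Python of such a DN builder. Everything in it is hypothetical: the `build_dn` name, the pool of OU values and the defaults are my own placeholders, and a real tool would take the values from the operator. It keeps a fixed namespace prefix (the tiny CPS), pads the DN to a random length between 3 and 7 RDNs, and lets the Issuer DN differ from the Subject DN.

```python
import random

# Hypothetical pool of filler values; an operator would supply real ones.
OPTIONAL_RDNS = [("OU", "operations"), ("OU", "hosting"), ("OU", "web"),
                 ("OU", "it"), ("OU", "services")]


def build_dn(country, organization, common_name=None, extra=(), rng=random,
             min_rdns=3, max_rdns=7):
    """Return an OpenSSL-style one-line DN, aiming for min_rdns..max_rdns RDNs."""
    rdns = [("C", country), ("O", organization)]   # fixed namespace prefix
    rdns.extend(extra)                             # operator's personal touch
    tail = [("CN", common_name)] if common_name else []
    target = rng.randint(min_rdns, max_rdns)
    pool = [rdn for rdn in OPTIONAL_RDNS if rdn not in rdns]
    rng.shuffle(pool)
    while len(rdns) + len(tail) < target and pool:
        rdns.append(pool.pop())
    return "".join(f"/{key}={value}" for key, value in rdns + tail)


if __name__ == "__main__":
    issuer = build_dn("NL", "acme inc", extra=[("OU", "trust me")])
    subject = build_dn("NL", "acme inc", common_name="host-15.dyndns.org")
    print(issuer)
    print(subject)
```

Both DNs share the "/C=NL/O=acme inc" prefix, as in my example above, while the number and choice of the remaining elements vary per relay.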
1.5. Better commonName values
Our current certificates set the commonName to a randomly generated field like www.rmf4h4h.net. This is also a weird behavior: nearly all TLS certs used for web purposes will have a hostname that resolves to their IP.
The simplest way to get a plausible commonName here would be to do a reverse lookup on our IP and try to find a good hostname. It's not clear whether this would actually work out in practice, or whether we'd just get dynamic-IP-pool hostnames everywhere blocked when they appear in certificates.
Alternatively, if we are told a hostname in our Torrc (possibly in the Address field), we could try to use that.
Why are we bothering with the Subject DN's CN field in this part? It's been legacy since May 2000. I would suggest leaving it out and making it unusable. Perhaps I'm too modern here and would show up on some radar by doing this...
According to RFC2818: thou shalt check the Subject Alt Names first. If there really is no Subject Alt Names block in the cert, then take the most specific CN field. There is an exception rule which states that you could use a certificate issued for a different host if the client is expecting this. These details don't matter here, but apparently Tor has been hanging on to the exception rule. I would yellow flag this 'to safeguard my users'.
If I sound weird in this, have a look at libcurl's way of doing its checks in the code. I personally favor the OpenSSL connectors for readability (weirdly enough).
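For those who don't want to dig through libcurl, here is a simplified sketch in Python of the check order I mean. It is not libcurl's or OpenSSL's code. It assumes the certificate is given as a dict in the shape returned by Python's `ssl.SSLSocket.getpeercert()`, it takes the last CN in the subject as the most specific one, and its wildcard handling (only a whole left-most `*` label) is deliberately stricter than what RFC2818 allows.

```python
def names_to_check(cert):
    """Return (source, names): dNSName SAN entries first, CN only as fallback."""
    dns_names = [value for kind, value in cert.get("subjectAltName", ())
                 if kind == "DNS"]
    if dns_names:
        return "subjectAltName", dns_names
    cns = [value for rdn in cert.get("subject", ())
           for key, value in rdn if key == "commonName"]
    return "commonName", cns[-1:]   # assumption: last CN is the most specific


def matches(pattern, hostname):
    """Case-insensitive match; '*' only counts as the whole left-most label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def hostname_ok(cert, hostname):
    """True if the hostname matches one of the names that must be checked."""
    _, names = names_to_check(cert)
    return any(matches(name, hostname) for name in names)


if __name__ == "__main__":
    cert = {"subject": ((("commonName", "www.rmf4h4h.net"),),),
            "subjectAltName": (("DNS", "host-15.dyndns.org"),)}
    print(names_to_check(cert), hostname_ok(cert, "host-15.dyndns.org"))
```

Note that once a Subject Alt Names block with DNS entries is present, the CN is ignored completely, which is why a random CN alone does not make the connection verifiable.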
As stated before, I would love to see that the connection could be a truly verifiable, correct connection on this level, compliant with RFC2818 and others. If DNS names don't make sense, you could think of adding IP addresses to the SubjectAltNames or even the CN fields. It's something you don't see in the commercial CA business that often, but it happens and could be used to Tor's advantage.
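As an illustration of mixing DNS names and IP addresses, here is a small Python sketch that builds a subjectAltName value in the `DNS:` / `IP:` notation used in OpenSSL configuration files. The `san_value` helper is hypothetical, and how Tor would feed this into certificate generation is left open.

```python
import ipaddress


def san_value(names):
    """Build an OpenSSL-config style subjectAltName value from hostnames and IPs."""
    entries = []
    for name in names:
        try:
            # Anything that parses as an IPv4/IPv6 address becomes an IP entry.
            entries.append(f"IP:{ipaddress.ip_address(name)}")
        except ValueError:
            entries.append(f"DNS:{name}")
    return ", ".join(entries)


if __name__ == "__main__":
    print("subjectAltName = " + san_value(["host-15.dyndns.org", "192.0.2.15"]))
```

A relay without a sensible hostname could then carry only its IP address in the SubjectAltNames and still be verifiable.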