Greetings all,
I happened to download the tor-0.2.2.25-alpha.tar.gz source yesterday and I noticed something. Apologies in advance if this has already been discussed and resolved, I did a cursory web search and didn't see anything.
There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:
~/tor-0.2.2.25-alpha$ grep -r memcmp . | grep -i digest | wc -l 137
~/tor-0.2.2.25-alpha$ grep -r memcmp . | grep -i key | wc -l 14
The built-in memcmp typically runs in time proportional to the common prefix of the two memory buffers being compared. In networked applications, this is often a significant source of timing information. Sometimes these are severe enough to result in disclosure of secret key material. A good resource for remote timing attacks is Nate Lawson's blog:
http://rdist.root.org/2010/07/19/exploiting-remote-timing-attacks/
Some cases look particularly concerning. For example, in the following code 'handshake_reply' appears to be supplied by the remote party and it is compared with something called "key material".
src/or/onion.c:331: if (memcmp(key_material, handshake_reply+DH_KEY_LEN, DIGEST_LEN)) { /* H(K) does *not* match. Something fishy. */ log_warn(LD_PROTOCOL,"Digest DOES NOT MATCH on onion handshake. " "Bug or attack."); goto err; }
In the worst-case, the attacker could simply supply different values of handshake_reply, observe how long each takes to compare, and figure out the value of key_material statistically byte-by-byte.
Perhaps this key material is never used again, or there are other reasons this timing informaion is not useful. But in general, it's very difficult to determine whether or not such a timing info disclosure represents a vulnerability or how difficult it would be to exploit. Expert programmers seem to consistently underestimate the severtiy of these weaknesses, perhaps because they are so subtle. I've just started learning about Tor so it's not possible for me to review every instance.
I think the quickest and easiest solution would be to replace every usage of memcmp and str*cmp in the source with an alternate version that guarantees a constant time comparison.
I'd expect this could be done with negligible performance impact, especially for short comparisons like 20-byte hash values.
Regards,
- Marsh
Hi Marsh,
Thanks for writing to or-dev, welcome to the list!
On 05/06/2011 04:13 PM, Marsh Ray wrote:
Greetings all,
I happened to download the tor-0.2.2.25-alpha.tar.gz source yesterday and I noticed something. Apologies in advance if this has already been discussed and resolved, I did a cursory web search and didn't see anything.
After some of our discussions and some thinking, I opened this bug: https://trac.torproject.org/projects/tor/ticket/3122
There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:
~/tor-0.2.2.25-alpha$ grep -r memcmp . | grep -i digest | wc -l 137
~/tor-0.2.2.25-alpha$ grep -r memcmp . | grep -i key | wc -l 14
The built-in memcmp typically runs in time proportional to the common prefix of the two memory buffers being compared. In networked applications, this is often a significant source of timing information. Sometimes these are severe enough to result in disclosure of secret key material. A good resource for remote timing attacks is Nate Lawson's blog:
http://rdist.root.org/2010/07/19/exploiting-remote-timing-attacks/
Nate's post is great and anyone reading this email should read the above url for context. I'd also suggest this one: http://rdist.root.org/2010/01/07/timing-independent-array-comparison/
Some cases look particularly concerning. For example, in the following code 'handshake_reply' appears to be supplied by the remote party and it is compared with something called "key material".
src/or/onion.c:331: if (memcmp(key_material, handshake_reply+DH_KEY_LEN, DIGEST_LEN)) { /* H(K) does *not* match. Something fishy. */ log_warn(LD_PROTOCOL,"Digest DOES NOT MATCH on onion handshake. " "Bug or attack."); goto err; }
In the worst-case, the attacker could simply supply different values of handshake_reply, observe how long each takes to compare, and figure out the value of key_material statistically byte-by-byte.
Perhaps this key material is never used again, or there are other reasons this timing informaion is not useful. But in general, it's very difficult to determine whether or not such a timing info disclosure represents a vulnerability or how difficult it would be to exploit. Expert programmers seem to consistently underestimate the severtiy of these weaknesses, perhaps because they are so subtle. I've just started learning about Tor so it's not possible for me to review every instance.
I generally agree - I think it's best to use a constant time comparison unless you're *certain* that you want something else.
I think the quickest and easiest solution would be to replace every usage of memcmp and str*cmp in the source with an alternate version that guarantees a constant time comparison.
I think it's probably a good idea to write a new set of functions, say safe_memcmp and safe_strcmp, that are properly implemented. We can #define over memcmp but I fear that it's better to specifically eradicate each one by hand and really think things over on a case by case basis.
I'd expect this could be done with negligible performance impact, especially for short comparisons like 20-byte hash values.
I think so too.
All the best, Jake
On May 6, 2011, at 5:08 PM, Jacob Appelbaum wrote:
We can #define over memcmp but I fear that it's better to specifically eradicate each one by hand and really think things over on a case by case basis.
FWIW, I agree.
Nice work, Marsh!
On Fri, May 6, 2011 at 7:13 PM, Marsh Ray marsh@extendedsubset.com wrote:
Greetings all,
Hi, Marsh!
I replied on https://trac.torproject.org/projects/tor/ticket/3122#comment:4 . The particular case that you mention is (I think) safe (see discussion there), but the problem in general is worrisome and we should indeed replace (nearly) all of our memcmps with data-independent variants.
(Pedantic nit-pick: we should be saying "data-independent," not "constant-time." We want a memcmp(a,b,c) that takes the same number of cycles for a given value of c no matter what a and b are. That's data-independence. A constant-time version would be one that took the same number of cycles no matter what c is.)
On 05/06/2011 08:00 PM, Nick Mathewson wrote:
. The particular case that you mention is (I think) safe (see discussion there),
I figured it probably was, but illustrated some red flags.
but the problem in general is worrisome and we should indeed replace (nearly) all of our memcmps with data-independent variants.
Maybe some of the str*cmps too? I grep 681 of them.
We should also look for other cases where any data or padding might be checked, decompressed, or otherwise operated on without being as obvious as calling memcmp. Lots of error conditions can disclose timing information.
(Pedantic nit-pick: we should be saying "data-independent," not "constant-time." We want a memcmp(a,b,c) that takes the same number of cycles for a given value of c no matter what a and b are. That's data-independence. A constant-time version would be one that took the same number of cycles no matter what c is.)
That's a good point. In most of the code I glanced at, the length was fixed at compile-time. I suppose a proper "constant-time" function would have to take as much time as a 2GB comparison (on 32) :-).
int mem_neq(const void *m1, const void *m2, size_t n) { const uint8_t *b1 = m1, *b2 = m2; uint8_t diff = 0; while (n--) diff |= *b1++ ^ *b2++; return diff != 0; } #define mem_eq(m1, m2, n) (!mem_neq((m1), (m2),(n)))
Looks good to me.
What if n is 0? Is 'equals' or 'neq' a more conservative default ?
Would it make sense to die in a well-defined way if m1 or m2 is NULL?
Also, if the MSB of n is set it's an invalid condition, the kind that could result from a conversion from a signed value.
- Marsh
On Fri, May 6, 2011 at 10:40 PM, Marsh Ray marsh@extendedsubset.com wrote: [...]
but the problem in general is worrisome and we should indeed replace (nearly) all of our memcmps with data-independent variants.
Maybe some of the str*cmps too? I grep 681 of them.
Yeah; most of these are not parsing anything secret, but we should audit all of them to look for worrisome cases. It's even less clear what data-independence means for strcmp: possibly it means that the run-time should depend only on the length of the shorter string, or the longer string, or on one of the arguments arbitrarily designated as the "non-secret" string, or such. All of these could be reasonable under some circumstances, but we should figure out what we actually mean.
We should also look for other cases where any data or padding might be checked, decompressed, or otherwise operated on without being as obvious as calling memcmp. Lots of error conditions can disclose timing information.
Yeah. This is going to be a reasonably big job.
(Pedantic nit-pick: we should be saying "data-independent," not "constant-time." We want a memcmp(a,b,c) that takes the same number of cycles for a given value of c no matter what a and b are. That's data-independence. A constant-time version would be one that took the same number of cycles no matter what c is.)
That's a good point. In most of the code I glanced at, the length was fixed at compile-time. I suppose a proper "constant-time" function would have to take as much time as a 2GB comparison (on 32) :-).
int mem_neq(const void *m1, const void *m2, size_t n) { const uint8_t *b1 = m1, *b2 = m2; uint8_t diff = 0; while (n--) diff |= *b1++ ^ *b2++; return diff != 0; } #define mem_eq(m1, m2, n) (!mem_neq((m1), (m2),(n)))
Looks good to me.
What if n is 0? Is 'equals' or 'neq' a more conservative default ?
If n is 0, then "equals" is the answer: all empty strings are equal, right? :)
Would it make sense to die in a well-defined way if m1 or m2 is NULL?
Adding a tor_assert(m1 && m2) would be fine.
Also, if the MSB of n is set it's an invalid condition, the kind that could result from a conversion from a signed value.
Adding a tor_assert(n < SIZE_T_CEILING) is our usual way of handling this.
Also, as I said on the bug, doing a memcmp in constant time is harder than doing eq/neq. I *think* that all of the cases where we care about memcmp returning a tristate -1/0/1 result, we don't need data-independence... but in case we *do* need one, we'll have to do some malarkey like
int memcmp(const void *m1, const void *m2, size_t n) { /*XXX I don't know if this is even right; I haven't tested it at all */ const uint8_t *b1 = m1, *b2 = m2; int retval = 0;
while (n--) { const uint8_t v1 = b1[n], v2 = b2[n]; int diff = (int)v1 - (int)v2; retval = (v1 == v2) * retval + diff; }
return retval; }
which frankly makes me sad. I bet there's a better way to go.
On Fri, 6 May 2011 23:14:58 -0400 Nick Mathewson nickm@freehaven.net wrote:
On Fri, May 6, 2011 at 10:40 PM, Marsh Ray marsh@extendedsubset.com wrote: [...]
but the problem in general is worrisome and we should indeed replace (nearly) all of our memcmps with data-independent variants.
Maybe some of the str*cmps too? I grep 681 of them.
Yeah; most of these are not parsing anything secret, but we should audit all of them to look for worrisome cases. It's even less clear what data-independence means for strcmp: possibly it means that the run-time should depend only on the length of the shorter string, or the longer string, or on one of the arguments arbitrarily designated as the "non-secret" string, or such. All of these could be reasonable under some circumstances, but we should figure out what we actually mean.
We should also look for other cases where any data or padding might be checked, decompressed, or otherwise operated on without being as obvious as calling memcmp. Lots of error conditions can disclose timing information.
Yeah. This is going to be a reasonably big job.
(Pedantic nit-pick: we should be saying "data-independent," not "constant-time." We want a memcmp(a,b,c) that takes the same number of cycles for a given value of c no matter what a and b are. That's data-independence. A constant-time version would be one that took the same number of cycles no matter what c is.)
That's a good point. In most of the code I glanced at, the length was fixed at compile-time. I suppose a proper "constant-time" function would have to take as much time as a 2GB comparison (on 32) :-).
int mem_neq(const void *m1, const void *m2, size_t n) { const uint8_t *b1 = m1, *b2 = m2; uint8_t diff = 0; while (n--) diff |= *b1++ ^ *b2++; return diff != 0; } #define mem_eq(m1, m2, n) (!mem_neq((m1), (m2),(n)))
Looks good to me.
What if n is 0? Is 'equals' or 'neq' a more conservative default ?
If n is 0, then "equals" is the answer: all empty strings are equal, right? :)
Would it make sense to die in a well-defined way if m1 or m2 is NULL?
Adding a tor_assert(m1 && m2) would be fine.
Also, if the MSB of n is set it's an invalid condition, the kind that could result from a conversion from a signed value.
Adding a tor_assert(n < SIZE_T_CEILING) is our usual way of handling this.
Also, as I said on the bug, doing a memcmp in constant time is harder than doing eq/neq. I *think* that all of the cases where we care about memcmp returning a tristate -1/0/1 result, we don't need data-independence... but in case we *do* need one, we'll have to do some malarkey like
int memcmp(const void *m1, const void *m2, size_t n) { /*XXX I don't know if this is even right; I haven't tested it at all */ const uint8_t *b1 = m1, *b2 = m2; int retval = 0;
while (n--) { const uint8_t v1 = b1[n], v2 = b2[n]; int diff = (int)v1 - (int)v2; retval = (v1 == v2) * retval + diff; }
return retval; }
GCC is likely to turn (v1 == v2) into a backdoor. Also, we would need to make sure sign extension is constant-time; it *probably* is on IA-32 and AMD64, but we may need to disassemble the compiler's output to verify that on ARM.
Other than that, it looks correct. We *can* fix the dependence on == and make the multiply unnecessary at the same time, though.
I've attached my optimized constant-time comparison functions for 16-byte and 32-byte values to this message. They're packaged in the format for a submission to SUPERCOP and/or NaCl, but for some reason I never actually submitted them.
Robert Ransom
On May 6, 2011, at 8:53 PM, Robert Ransom wrote:
int memcmp(const void *m1, const void *m2, size_t n) { /*XXX I don't know if this is even right; I haven't tested it at all */ const uint8_t *b1 = m1, *b2 = m2; int retval = 0;
while (n--) { const uint8_t v1 = b1[n], v2 = b2[n]; int diff = (int)v1 - (int)v2; retval = (v1 == v2) * retval + diff; }
return retval; }
GCC is likely to turn (v1 == v2) into a backdoor.
Can you explain what you mean?
On Fri, 6 May 2011 22:11:06 -0700 Chris Palmer chris@eff.org wrote:
On May 6, 2011, at 8:53 PM, Robert Ransom wrote:
GCC is likely to turn (v1 == v2) into a backdoor.
Can you explain what you mean?
I would expect GCC (and most other C compilers) to use a non-constant-time implementation of (v1 == v2).
Robert Ransom
On May 6, 2011, at 10:25 PM, Robert Ransom wrote:
I would expect GCC (and most other C compilers) to use a non-constant-time implementation of (v1 == v2).
Are there machines that implement uint8_t comparison in a data-dependent way? What's an example?
On Fri, 6 May 2011 23:16:14 -0700 Chris Palmer chris@eff.org wrote:
On May 6, 2011, at 10:25 PM, Robert Ransom wrote:
I would expect GCC (and most other C compilers) to use a non-constant-time implementation of (v1 == v2).
Are there machines that implement uint8_t comparison in a data-dependent way? What's an example?
That comparison expression can be implemented in non-constant time on IA-32 processors:
; ECX = v1; EDX = v2; result in EAX XOR EAX, EAX CMP ECX, EDX JE done INC EAX done:
I think I've seen GCC emit something similar to that within the last few years, and I assume that some compilers still emit code containing a conditional branch for that expression. In general, we don't want to assume that conditional expressions are safe to use, even if a compiler *could* implement them in a safe way (e.g. by compiling Nick's function into something resembling mine).
Robert Ransom
On Fri, 6 May 2011 23:14:58 -0400 Nick Mathewson nickm@freehaven.net wrote:
Also, as I said on the bug, doing a memcmp in constant time is harder than doing eq/neq. I *think* that all of the cases where we care about memcmp returning a tristate -1/0/1 result, we don't need data-independence... but in case we *do* need one, we'll have to do some malarkey like
int memcmp(const void *m1, const void *m2, size_t n) { /*XXX I don't know if this is even right; I haven't tested it at all */ const uint8_t *b1 = m1, *b2 = m2; int retval = 0;
while (n--) { const uint8_t v1 = b1[n], v2 = b2[n]; int diff = (int)v1 - (int)v2; retval = (v1 == v2) * retval + diff; }
return retval; }
which frankly makes me sad. I bet there's a better way to go.
See attached. This one is also untested (and I didn't even put the “#include <stdint.h>” in the file), but it *should* work.
My technique for calculating equal_p came from my uint32-based crypto_verify function in my previous message, which was in turn based partly on DJB's crypto_verify functions and partly on a disassembly of what GCC compiled DJB's functions to on a Fedora 12 AMD64 box. But I couldn't tell that the technique was correct, so this time I added comments to it.
Robert Ransom
On Sat, May 7, 2011 at 12:47 AM, Robert Ransom rransom.8774@gmail.com wrote:
On Fri, 6 May 2011 23:14:58 -0400 Nick Mathewson nickm@freehaven.net wrote:
Also, as I said on the bug, doing a memcmp in constant time is harder than doing eq/neq. I *think* that all of the cases where we care about memcmp returning a tristate -1/0/1 result, we don't need data-independence... but in case we *do* need one, we'll have to do some malarkey like
int memcmp(const void *m1, const void *m2, size_t n) { /*XXX I don't know if this is even right; I haven't tested it at all */ const uint8_t *b1 = m1, *b2 = m2; int retval = 0;
while (n--) { const uint8_t v1 = b1[n], v2 = b2[n]; int diff = (int)v1 - (int)v2; retval = (v1 == v2) * retval + diff; }
return retval; }
which frankly makes me sad. I bet there's a better way to go.
See attached. This one is also untested (and I didn't even put the “#include <stdint.h>” in the file), but it *should* work.
My technique for calculating equal_p came from my uint32-based crypto_verify function in my previous message, which was in turn based partly on DJB's crypto_verify functions and partly on a disassembly of what GCC compiled DJB's functions to on a Fedora 12 AMD64 box. But I couldn't tell that the technique was correct, so this time I added comments to it.
Clever! It does look it *should* work. Somewhere along the line we should test the heck out of it and more sure it does.
(Also, Tor *does* assume that the arithmetic is two's complement: we check for it in configure.in and die horribly in torint.h if it isn't.)
Also, I suggest that we move as much of this as possible over to the bugtracker: discussing this in two places is giving me whiplash. I've posted a suggested plan of attack there.
Now alas I should sleep.
yrs,
On Sat, 7 May 2011 00:57:12 -0400 Nick Mathewson nickm@freehaven.net wrote:
On Sat, May 7, 2011 at 12:47 AM, Robert Ransom rransom.8774@gmail.com wrote:
On Fri, 6 May 2011 23:14:58 -0400 Nick Mathewson nickm@freehaven.net wrote:
Also, as I said on the bug, doing a memcmp in constant time is harder than doing eq/neq. I *think* that all of the cases where we care about memcmp returning a tristate -1/0/1 result, we don't need data-independence... but in case we *do* need one, we'll have to do some malarkey like
int memcmp(const void *m1, const void *m2, size_t n) { /*XXX I don't know if this is even right; I haven't tested it at all */ const uint8_t *b1 = m1, *b2 = m2; int retval = 0;
while (n--) { const uint8_t v1 = b1[n], v2 = b2[n]; int diff = (int)v1 - (int)v2; retval = (v1 == v2) * retval + diff; }
return retval; }
which frankly makes me sad. I bet there's a better way to go.
See attached. This one is also untested (and I didn't even put the “#include <stdint.h>” in the file), but it *should* work.
My technique for calculating equal_p came from my uint32-based crypto_verify function in my previous message, which was in turn based partly on DJB's crypto_verify functions and partly on a disassembly of what GCC compiled DJB's functions to on a Fedora 12 AMD64 box. But I couldn't tell that the technique was correct, so this time I added comments to it.
Clever! It does look it *should* work. Somewhere along the line we should test the heck out of it and more sure it does.
It worked for me once I fixed it so it would compile. See http://repo.or.cz/w/tor-utils/rransom.git/shortlog/refs/heads/tor-safe-memcmp.
Robert Ransom
On 05/06/2011 09:40 PM, Marsh Ray wrote:
On 05/06/2011 08:00 PM, Nick Mathewson wrote:
int mem_neq(const void *m1, const void *m2, size_t n) { const uint8_t *b1 = m1, *b2 = m2; uint8_t diff = 0; while (n--) diff |= *b1++ ^ *b2++; return diff != 0; } #define mem_eq(m1, m2, n) (!mem_neq((m1), (m2),(n)))
Looks good to me.
Spoke too soon.
A little birdie (goes by the name of Skywing) pointed out to me in chat that certain compilers are smart enough to realize that the externally-visible result of the function can only go from 0 to 1 one time. This might enable them to break out of the loop early once that happened.
A workaround may be the 'volatile sledgehammer':
int mem_neq(const void *m1, const void *m2, size_t n) { const uint8_t *b1 = m1, *b2 = m2; uint8_t diff = 0; volatile uint8_t * pdiff = &diff; while (n--) *pdiff |= *b1++ ^ *b2++; return *pdiff != 0; }
or perhaps
int mem_neq(const void *m1, const void *m2, size_t n) { const uint8_t *b1 = m1, *b2 = m2; uint8_t diff = 0; while (n--) { diff |= *(volatile const uint8_t *)b1 ^ *(volatile const uint8_t *)b2; b1++; b2++; } return diff != 0; }
Of course, we could always just compute SHA-256 hashes of each side and then compare those, right? :-)
- Marsh
On May 6, 2011, at 10:35 PM, Marsh Ray wrote:
Of course, we could always just compute SHA-256 hashes of each side and then compare those, right? :-)
Yes, Brad Hill suggested that (in a Java/C# context). Nate Lawson didn't like it on performance grounds, but I don't recall hearing any correctness-related complaints.
http://www.isecpartners.com/blog/2011/2/18/double-hmac-verification.html
You could use the volatile sledgehammer, and then use a unit test to make sure that it remains working over time. And/or you could put it in its own file, and compile it with -O0, in case that helps.