AES performance results
nickm at freehaven.net
Wed Feb 28 23:20:16 UTC 2007
On Mon, Feb 26, 2007 at 05:05:23PM -0800, Adam Langley wrote:
> On 2/26/07, Nick Mathewson <nickm at freehaven.net> wrote:
> >METHODOLOGY: I wrote a stupid benchmark function in aes.c to encrypt a
> >million cell-sized chunks using our aes_crypt function, and timed it
> >with the unix "time" command. I did this twice for each
> >(computer,code) pair, I took the median of three runs.
> You have to be very careful of cache issues with micro-benchmarks like
> that. I think that you're ok because the cache profile of an AES
> function is probably pretty much fixed (it walks the input and the
> output and the tables are of fixed size I'm guessing). But if the
> faster impl uses different sized tables etc (or more code, looking at
> FULL_UNROLL) you might find that, when running with the rest of the
> Tor code, the results are rather different.
Right; I'm pretty confident of the 40% improvement from switching to
OpenSSL's assembly implementation where available, but less confident
of other improvements.
A couple more developments on this front, BTW:
* I tried OpenSSL 0.9.8e on an x86_64 machine, and found out that
either the i586 assembly code isn't used on x86_64, or it is used
but offers no speed benefit over 0.9.7f.
* It looks like OpenSSL 0.9.9 (or whatever they're calling the next
one) will probably add assembly implementations for ARM, x86_64,
and sparc. Neat!
* We pay a price because our AES_CTR implementation has to work on
unaligned data. I did an experiment using 508-byte cell payloads
instead of 509-byte cell payloads, and xoring uint32_ts rather than
chars: it knocked about 10% off my benchmark time. This is probably
something to look at when we redesign the cell format.
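
To make the last point concrete, here is a rough sketch of the two XOR
strategies being compared. This is not the code from aes.c; the names
xor_bytes, xor_words, and xor_variants_agree are made up for
illustration, and the sketch assumes the keystream has already been
generated into a buffer. The word-wise version goes through memcpy so
that it stays correct on unaligned buffers, and it falls back to bytes
for any tail (a 509-byte payload leaves one tail byte, a 508-byte
payload leaves none).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time XOR of a keystream into a buffer.
 * Works for any length and any alignment. */
static void xor_bytes(uint8_t *dst, const uint8_t *keystream, size_t len)
{
  size_t i;
  for (i = 0; i < len; ++i)
    dst[i] ^= keystream[i];
}

/* Word-at-a-time XOR. Each 4-byte group is loaded and stored with
 * memcpy, which is well-defined even when the pointers are not
 * 4-byte aligned. Leftover bytes (len % 4) are handled one by one. */
static void xor_words(uint8_t *dst, const uint8_t *keystream, size_t len)
{
  size_t i = 0;
  for (; i + 4 <= len; i += 4) {
    uint32_t a, b;
    memcpy(&a, dst + i, 4);
    memcpy(&b, keystream + i, 4);
    a ^= b;
    memcpy(dst + i, &a, 4);
  }
  for (; i < len; ++i)
    dst[i] ^= keystream[i];
}

/* Hypothetical self-check: returns 1 if both variants produce the same
 * output for a buffer of the given length (at most 512 bytes). */
static int xor_variants_agree(size_t len)
{
  uint8_t a[512], b[512], ks[512];
  size_t i;
  if (len > sizeof(a))
    return 0;
  for (i = 0; i < len; ++i) {
    a[i] = b[i] = (uint8_t)(i * 7 + 3);
    ks[i] = (uint8_t)(i * 13 + 1);
  }
  xor_bytes(a, ks, len);
  xor_words(b, ks, len);
  return memcmp(a, b, len) == 0;
}
```

Whether the word-wise loop actually wins depends on the compiler and
the platform, so it would need to be timed the same way as the rest of
the benchmark before drawing conclusions.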