On Thu, 13 Oct 2016 15:05:05 +0200 Rob van der Hoeven robvanderhoeven@ziggo.nl wrote:
This is what I was looking for. Running the benchmark on two very different systems was revealing: on my Pentium G620 the ntor server-side time was ~300 µs, while an Allwinner A20 system completed the server-side code in ~10600 µs.
One of the things on my TODO list is to use NEON for the X25519 scalar multiplication on ARM targets that support it, since it gives a decent performance increase, at least on 32-bit ARM.
One day I will also get an AArch64 target and figure out optimization there, since it's the way of the future.
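For what it's worth, picking a NEON code path only on targets that actually have it could be gated on a check along these lines. This is just a rough sketch, not code from any existing implementation: `have_neon` is a made-up helper name, the 32-bit ARM branch assumes Linux with `getauxval()` and the kernel's `HWCAP_NEON` bit, and the AArch64 branch simply assumes Advanced SIMD is present.

```c
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON (32-bit ARM Linux only) */
#endif

/* Hypothetical helper: returns 1 if a NEON scalar mult path may be used,
 * 0 otherwise. */
static int have_neon(void)
{
#if defined(__aarch64__)
    /* Assumption: Advanced SIMD is available on the AArch64 target. */
    return 1;
#elif defined(__linux__) && defined(__arm__)
    /* Ask the kernel which CPU features it reported for this process. */
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    /* Not an ARM target (or no known detection method): use portable code. */
    return 0;
#endif
}
```

The caller would check `have_neon()` once at startup and fall back to the portable scalar multiplication when it returns 0.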
Regards,