On Wed, Sep 11, 2013 at 05:13:04PM +0200, Jeroen Massar wrote:
Are boxes that are doing these speeds running at a CPU or a network cap? Or maybe better asked, they do run at 100% usage of their cores or do they just use two/three cores to the max?
There are three main sinks of CPU usage in a well-configured large Tor relay:
1. Doing AES and SHA. This scales with the network bandwidth used.
2. Doing Montgomery multiplication for circuit creation requests.
3. Bookkeeping.
(4. Kernel TCP overhead, etc.)
Until the August botnet hit, #1 was the primary user of CPU on most relays. A single Xeon core can do about 150 MB/sec of AES, or with AES-NI around 700 MB/sec.
With the vastly increased circuit creation load currently in progress, #2 and #3 have become a larger problem. The bookkeeping, in particular, has grown significantly. On noisetor right now, 17% of all CPU cycles are being spent in a single bookkeeping routine, circuit_unlink_all_from_channel, according to "perf top".
https://trac.torproject.org/projects/tor/ticket/9683
This increased circuit-create-and-destroy load leaves less CPU available for useful AES work, so many Tor relays are currently showing increased CPU usage with decreased bandwidth usage.
Even with AES-NI, and even without the botnet adding CPU load that carries no extra throughput, you'll have trouble getting a single Xeon core to push much more than 300 Mbps. In the current state, with so much extra bignum work and bookkeeping, a single daemon will have even more trouble pushing much bandwidth.
Best practice for maximum bandwidth is to run one Tor daemon per physical core, each on a distinct IP address. Plan for each daemon to push about 15 MByte/sec. Each can do more, around 20 or 30 MByte/sec, but planning for the lower figure leaves some headroom.
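As a rough illustration of the one-daemon-per-core layout, here is a minimal Python sketch that renders one torrc per daemon, each bound to its own address. The helper names, the relayNN nicknames, the /var/lib/tor-instances data root, port 443, and the 192.0.2.x documentation addresses are all assumptions for the example, not anything noisetor actually uses. Nickname, Address, ORPort, OutboundBindAddress, DataDirectory and SocksPort are standard torrc options.

```python
# Sketch: generate one torrc per physical core, each on a distinct IP.
# All names, paths and addresses below are hypothetical placeholders.

def render_torrc(index, address, data_root="/var/lib/tor-instances"):
    """Return torrc text for one relay daemon bound to `address`."""
    name = f"relay{index:02d}"
    lines = [
        f"Nickname {name}",
        f"Address {address}",
        f"ORPort {address}:443",            # listen only on this daemon's IP
        f"OutboundBindAddress {address}",   # originate traffic from the same IP
        f"DataDirectory {data_root}/{name}",
        "SocksPort 0",                      # relay only, no local client port
    ]
    return "\n".join(lines) + "\n"


def render_all(addresses):
    """Map an instance name to its torrc text, one entry per address."""
    return {
        f"relay{i:02d}": render_torrc(i, addr)
        for i, addr in enumerate(addresses)
    }


if __name__ == "__main__":
    # Twelve daemons for a 12-core box, using documentation-range addresses.
    addrs = [f"192.0.2.{10 + i}" for i in range(12)]
    for name, text in render_all(addrs).items():
        print(f"# --- {name}/torrc ---")
        print(text)
```

Each daemon would then be started with its own file (tor -f path/to/torrc). Relays run by the same operator should also declare each other via MyFamily, which is left out here for brevity.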
Your boxes, with 12 cores and 70 GB of RAM, are quite a bit overpowered for running 500 Mbps of Tor. If you ran a Tor daemon per core, you'd be able to push around 2 Gbps of Tor traffic, easily.
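The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the function name is made up for the example, and it assumes 1 MByte/sec = 8 Mbps with perfectly linear scaling across cores.

```python
# Sketch: aggregate relay throughput for N daemons at a given per-daemon rate.

def aggregate_mbps(cores, mbyte_per_sec_per_daemon):
    """Total throughput in Mbps, assuming one daemon per core."""
    return cores * mbyte_per_sec_per_daemon * 8


if __name__ == "__main__":
    for rate in (15, 20, 30):
        total = aggregate_mbps(12, rate)
        print(f"12 cores at {rate} MByte/sec each: {total} Mbps")
    # 12 cores at 15 MByte/sec each: 1440 Mbps
    # 12 cores at 20 MByte/sec each: 1920 Mbps
    # 12 cores at 30 MByte/sec each: 2880 Mbps
```

So the conservative 15 MByte/sec planning figure gives about 1.4 Gbps on 12 cores, and the "around 2 Gbps" estimate corresponds to roughly 20 MByte/sec per daemon.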
-andy