[tor-relays] hardware

Andy Isaacson adi at hexapodia.org
Mon Jul 15 07:18:00 UTC 2013


On Fri, Jul 12, 2013 at 06:45:58PM -0400, grarpamp wrote:
> > AMD doesn't seem to make any server CPUs that are useful for this
> > application, unfortunately.
> 
> Really, how so? Many AMD CPU's have AES-NI. Even the
> A10-6800K (4 x 4.1GHz) would be decent.

That's not a server CPU.  It doesn't seem to support ECC, and it doesn't
go in boards that are well designed for server applications (with things
like serial console BIOS support and 1U form factor).

> That plus an a85x mainboard (1Gbit)

That cheap desktop board has a Realtek NIC.  Realtek NICs are
spectacularly bad for server use cases.  We were not able to push 400
Mbps of Tor traffic on a Realtek, possibly due to the r8169 (iirc)
chip/driver lacking interrupt coalescing features.  Upgrading to an
Intel e1000e fixed the problem.  Broadcom tg3 also works fine.  The
newer Broadcom (nx or something?) chips should also be fine.
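
To see which driver each NIC on a Linux box is actually bound to, a
minimal Python sketch along these lines should do.  It assumes the usual
sysfs layout, where /sys/class/net/<iface>/device/driver is a symlink to
the bound driver; the function name nic_drivers is made up for
illustration.

```python
import os


def nic_drivers(sysfs_net="/sys/class/net"):
    """Map each network interface to the kernel driver bound to it.

    Assumes the standard Linux sysfs layout; sysfs_net is a parameter
    only so the function can be pointed at a different root.
    """
    drivers = {}
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        # Virtual interfaces such as lo have no device/driver symlink.
        if os.path.islink(link):
            # The symlink target ends in the driver name, e.g. .../e1000e
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers
```

Calling nic_drivers() returns a dict of interface name to driver name
(for example e1000e, tg3 or r8169).  Running `ethtool -c <iface>` then
shows whatever interrupt coalescing settings that driver exposes.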

> and 8GB ddr3-2133 is $300.

It's very silly to not specify ECC RAM for a server.

> Add some case+ps.

The kind of ISPs that offer competitive pricing on bandwidth tend to
prefer commercially integrated servers, preferably sourced from vendors
they're familiar with.  That way when your server crashes and needs a
reboot at 2AM, the tech in the data center doesn't have to puzzle out
the buttons and connectors on some utterly random box that you found on
a street corner.

> Be careful, Intel likes to promote HT instead of full cores.

That's a really funny claim, since exactly the opposite seems true from
my point of view.  Intel clearly specifies how many cores and how many
HTs are provided on each CPU, and a single thread can use nearly all of
the resources on a core.  HT is useful, on Sandy Bridge, for providing
fine-grained parallelism to let the CPU get useful work done during
cache-miss stalls and similar, but HT is not necessary to get full ALU
utilization for in-cache codes.  AMD Bulldozer, OTOH, claims to have 8
cores, but they come in "bundles" of 2, and the "8 core" Bulldozer has
approximately the same number of ALUs and other CPU resources as the 4
core Intel chips.  As a result, each individual Bulldozer "core" (really
more like a HT on Sandy Bridge) is fairly slow in terms of
operations/clock, and AMD's resource scheduler doesn't seem to be very
good at dynamic resource allocation.

The end result of this threading nonsense is that on an Intel CPU you
can get 90% of the CPU throughput doing useful work with 4 threads,
while on a Bulldozer you need 8 threads to get 90% throughput.  For Tor,
where you run roughly one daemon per thread you want to keep busy, that
means 8 daemons rather than 4, a significantly higher annoyance.

Making matters worse, Bulldozer has a power penalty of at least 20%, if
not more like 30%, versus Sandy Bridge, measuring actual work done per
watt on CPU intensive workloads.
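
As a rough worked example of what that penalty means, the Python sketch
below converts a power penalty into relative work per watt.  It assumes
"penalty" means the same work costs (1 + penalty) times the power; that
reading, and the function name relative_work_per_watt, are assumptions
for illustration rather than anything measured here.

```python
def relative_work_per_watt(power_penalty):
    """Work per watt relative to the baseline CPU.

    Assumes the same amount of work costs (1 + power_penalty) times
    the baseline power, so work per watt scales by the reciprocal.
    """
    return 1.0 / (1.0 + power_penalty)


# The 20% and 30% figures are the ones quoted in the text above.
for penalty in (0.20, 0.30):
    ratio = relative_work_per_watt(penalty)
    print(f"{penalty:.0%} more power -> {ratio:.0%} of baseline work per watt")
```

Under that assumption, a 20% to 30% penalty works out to roughly 83%
down to 77% of the baseline's work per watt.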

That's why I said AMD unfortunately doesn't seem to have a competitive
server CPU these days.  It's possible that Piledriver improves the
situation, but the analysis I saw did not make me optimistic that it
would be competitive with Ivy Bridge.

-andy

