Getting max bandwidth out of a relay

11 Sep 2013

      TLDR: what do people do to get the max throughput through their boxes?

Hi,

This might be more tor-dev related (due to the Tor internals, e.g. why it
does not use multiple CPU cores effectively), but it is likely a bit
more appropriate here, as there are people on this list who are able to
get a lot of performance out of their boxes.

I've been playing a bit with setting up a few relays and letting them
push as much traffic as possible, following (amongst others) the items at:
  https://www.torservers.net/wiki/setup/server

That means making sure Tor is using AES-NI (turning it off and then on
again made a bit of a difference, but primarily in CPU load), and doing
some TCP stack and other kernel tweaks.

I am running current-git Tor on them, self-compiled, with no special
configure options except for the install path (any tips there?).

The boxes are 2-CPU, 6-core E5645 @ 2.40GHz with HT, thus 24 cores
visible. Tor is using about 170% CPU (thus effectively 2 cores) on
average along with 3G of mem; the box has 70G of mem, so that is not a
problem.

A little snapshot from 'arm' from one of the boxes:

Bandwidth (limit: 3.9 Gb/s, burst: 3.9 Gb/s, measured: 353.9 Kb/s)
Download (45.7 Mb/sec   - avg: 27.2 Mb/sec, total: 302.2 GB)
Upload (52.5 Mb/sec   - avg: 27.9 Mb/sec, total: 308.4 GB):

Down/Up varies up to 70 Mbit/s. The box has full GE, and between them the
boxes can easily push a single-stream 900 Mbit/s flow (tested with
iperf/wget/scp) next to the running Tor process. Thus there seems to be
some significant issue in the Tor portion of things (though tuning might
affect it, as there are more flows etc). There is no connection tracking
on the box, as that would just slow things down.
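
To put those numbers side by side, here is a small Python sketch of the arithmetic. The constant and function names are only illustrative; the figures are the ones quoted above, and treating 170% CPU against 24 visible cores as a single ratio is a simplification.

```python
# Figures quoted in this mail; the names are illustrative only.
LINK_MBIT = 1000.0        # full gigabit Ethernet
IPERF_MBIT = 900.0        # single-stream flow measured between the boxes
TOR_PEAK_MBIT = 70.0      # peak down/up seen through Tor
TOR_CPU_PERCENT = 170.0   # average Tor CPU usage as reported
VISIBLE_CORES = 24        # 2 CPUs x 6 cores x 2 (HT)


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


tor_vs_iperf = percent(TOR_PEAK_MBIT, IPERF_MBIT)
tor_vs_link = percent(TOR_PEAK_MBIT, LINK_MBIT)
cpu_vs_box = percent(TOR_CPU_PERCENT, VISIBLE_CORES * 100.0)

print(f"Tor peak vs iperf flow: {tor_vs_iperf:.1f}%")
print(f"Tor peak vs link speed: {tor_vs_link:.1f}%")
print(f"Tor CPU vs whole box:   {cpu_vs_box:.1f}%")
```

Both the network and the CPU figures come out well below a tenth of what the box can do, which is why the bottleneck looks like it sits in Tor itself rather than in the hardware.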

See also the munged torrc below, in case there are options to be set there.

What else is there to tune, except for maybe running multiple Tor nodes
on the same box? That would require multiple IPs, right, as I understand
one can only run 2 nodes on the same IP.

Would there maybe be a way to run multiple Tor processes with the same
key/identity but with a TCP load-balancer in front of it which
distributes the incoming connections to the processes? The only thing
then is that only one of them should report their details to the
authorities and the others should not publish; would that be possible or
would it mess up for instance performance stats?

Greets,
 Jeroen

--

torrc used:
-----
NickName <nick>
ContactInfo <contact>
MyFamily $<othernode>

ControlPort 9051
HashedControlPassword 16:<pass>
CookieAuthentication 1

DirPort <ip>:<port>
DirPortFrontPage /usr/local/tor/etc/tor/tordirport.html

ORPort <ip>:<port>

RelayBandwidthRate 600 MB
RelayBandwidthBurst 606 MB

SocksListenAddress 127.0.0.1
SocksPort 1080

ExitPolicy reject *:*

#Log debug file /usr/local/tor/var/log/tor/debug.log
Log notice file /usr/local/tor/var/log/tor/notices.log
DataDirectory /usr/local/tor/var/lib/tor

RunAsDaemon 1
DisableDebuggerAttachment 0

CellStatistics 1
DirReqStatistics 1
EntryStatistics 1
ExitPortStatistics 1
ExtraInfoStatistics 1
-----

/etc/sysctl.d/tor.conf:
-----
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.core.rmem_max=33554432
net.core.wmem_max=33554432
net.ipv4.tcp_rmem=4096 87380 33554432
net.ipv4.tcp_wmem=4096 65536 33554432
net.core.netdev_max_backlog=262144
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_moderate_rcvbuf=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_orphans=262144
net.ipv4.tcp_max_syn_backlog=262144
net.ipv4.tcp_fin_timeout=4
vm.min_free_kbytes=65536
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=3
net.ipv4.ip_local_port_range=1025 65530
net.core.somaxconn=20480
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_timestamps=0
-----
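
Since kernel tweaks like these only matter if they are actually in effect, here is a minimal Python sketch for comparing such a file against the running kernel. It assumes a Linux box where sysctl values are exposed under /proc/sys, and that no key contains a dot inside a single path component (interface names can break that assumption). The function names are hypothetical helpers, not part of any existing tool.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key=value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # Collapse whitespace so multi-value entries compare cleanly.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key to its file (assumes dots only separate components)."""
    return Path(root, *key.split("."))


def find_mismatches(wanted, root="/proc/sys"):
    """Return {key: (wanted, actual)} for keys whose live value differs."""
    mismatches = {}
    for key, want in wanted.items():
        try:
            # /proc separates multiple values with tabs; normalise to spaces.
            actual = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            actual = None  # key missing on this kernel, or not readable
        if actual != want:
            mismatches[key] = (want, actual)
    return mismatches


if __name__ == "__main__":
    conf = Path("/etc/sysctl.d/tor.conf")  # the path used in this mail
    wanted = parse_sysctl_conf(conf.read_text()) if conf.is_file() else {}
    for key, (want, actual) in sorted(find_mismatches(wanted).items()):
        print(f"{key}: wanted {want!r}, kernel has {actual!r}")
```

Run on the relay itself, it prints one line per setting that did not take effect and nothing when everything matches.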

Jeroen Massar
