TLDR: what do people do to get the max throughput through their boxes?
Hi,
This might be more tor-dev related (due to the Tor internals, e.g. why it does not use multiple CPU cores effectively), but it is likely a bit more appropriate here, as there are people on this list who are able to get a lot of performance out of their boxes.
I've been playing a bit with setting up a few relays and letting them push as much traffic as possible, following, amongst others, the items at: https://www.torservers.net/wiki/setup/server
That means making sure Tor is using AES-NI (turning it off and then on again made a bit of a difference, but primarily in CPU load), and doing some TCP stack and other kernel tweaks.
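For reference, a minimal sketch of how one could check whether the CPU advertises AES-NI at all, assuming Linux on x86 where /proc/cpuinfo lists an "aes" flag. The function name has_aesni is just illustrative, and this only shows that the CPU offers the instructions, not that Tor or OpenSSL actually use them:

```python
def has_aesni(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only look at the 'flags' lines; match 'aes' as a whole word.
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AES-NI advertised by CPU:", has_aesni(f.read()))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```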
I am running current-git Tor on them, self-compiled with no special configure options apart from the install path (any tips there?).
The boxes are dual-CPU, 6-core E5645 @ 2.40GHz with HT, thus 24 logical cores visible. Tor is using about 170% CPU (thus effectively 2 cores) on average along with 3G of memory; the box has 70G of memory, so that is not a problem.
A little snapshot from 'arm' from one of the boxes:
Bandwidth (limit: 3.9 Gb/s, burst: 3.9 Gb/s, measured: 353.9 Kb/s)
  Download (45.7 Mb/sec - avg: 27.2 Mb/sec, total: 302.2 GB)
  Upload (52.5 Mb/sec - avg: 27.9 Mb/sec, total: 308.4 GB):
Down/Up varies up to 70 Mbit/s. The box has full GE, and between the boxes I can easily push a single-stream 900 Mbit/s flow (tested with iperf/wget/scp) next to the running Tor process. Thus there seems to be a significant bottleneck in the Tor portion of things (though tuning might affect it, as Tor handles many more flows, etc.). There is no connection tracking on the box, as that would just slow things down.
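To put those numbers side by side, here is a rough back-of-the-envelope calculation using only the figures quoted in this mail (70 Mbit/s peak, 900 Mbit/s iperf, 170% CPU, 24 logical cores). It is a sketch, not a measurement, and the variable names are only illustrative:

```python
link_mbit = 900.0        # single-stream iperf result between the boxes
tor_peak_mbit = 70.0     # peak per-direction rate seen for Tor
tor_cpu_percent = 170.0  # Tor CPU usage, where 100 = one logical core
logical_cores = 24       # 2 CPUs x 6 cores x HT

link_share = tor_peak_mbit / link_mbit
cpu_share = tor_cpu_percent / (100.0 * logical_cores)

print(f"share of link used by Tor: {link_share:.1%}")  # 7.8%
print(f"share of CPU used by Tor:  {cpu_share:.1%}")   # 7.1%
```

Both come out well under 10%, which is why neither the link nor the total CPU capacity looks like the limit here.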
See also the munged torrc below, in case there are options to be set there.
What else is there to tune, except for maybe running multiple Tor nodes on the same box? That would require multiple IPs, right? As I understand it, one can only run 2 nodes on the same IP.
Would there maybe be a way to run multiple Tor processes with the same key/identity, with a TCP load-balancer in front of them that distributes the incoming connections across the processes? The only catch is that only one of them should report its details to the authorities and the others should not publish; would that be possible, or would it mess up, for instance, performance stats?
Greets, Jeroen
--
torrc used:
-----
NickName <nick>
ContactInfo <contact>
MyFamily $<othernode>

ControlPort 9051
HashedControlPassword 16:<pass>
CookieAuthentication 1

DirPort <ip>:<port>
DirPortFrontPage /usr/local/tor/etc/tor/tordirport.html

ORPort <ip>:<port>

RelayBandwidthRate 600 MB
RelayBandwidthBurst 606 MB

SocksListenAddress 127.0.0.1
SocksPort 1080

ExitPolicy reject *:*

#Log debug file /usr/local/tor/var/log/tor/debug.log
Log notice file /usr/local/tor/var/log/tor/notices.log
DataDirectory /usr/local/tor/var/lib/tor

RunAsDaemon 1
DisableDebuggerAttachment 0

CellStatistics 1
DirReqStatistics 1
EntryStatistics 1
ExitPortStatistics 1
ExtraInfoStatistics 1
-----
/etc/sysctl.d/tor.conf:
-----
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.core.rmem_max=33554432
net.core.wmem_max=33554432
net.ipv4.tcp_rmem=4096 87380 33554432
net.ipv4.tcp_wmem=4096 65536 33554432
net.core.netdev_max_backlog=262144
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_moderate_rcvbuf=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_orphans=262144
net.ipv4.tcp_max_syn_backlog=262144
net.ipv4.tcp_fin_timeout=4
vm.min_free_kbytes=65536
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=3
net.ipv4.ip_local_port_range=1025 65530
net.core.somaxconn=20480
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_timestamps=0
-----
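
To double-check that the sysctl settings above are actually in effect on the running kernel, something along these lines could be used. This is a minimal sketch, assuming Linux with /proc/sys mounted and a file with one key=value per line; the helper names (parse_sysctl, live_value, mismatches) are made up for illustration, and keys containing dots inside an interface name are not handled:

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_sysctl(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Normalise whitespace so "4096  87380" and "4096\t87380" compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key: str, root: str = "/proc/sys") -> Optional[str]:
    """Read the running kernel's value, or None if the key cannot be read."""
    path = Path(root, *key.split("."))
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def mismatches(text: str, root: str = "/proc/sys") -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each key whose live value differs to (wanted, live)."""
    result = {}
    for key, wanted in parse_sysctl(text).items():
        live = live_value(key, root)
        if live != wanted:
            result[key] = (wanted, live)
    return result


if __name__ == "__main__":
    try:
        conf = Path("/etc/sysctl.d/tor.conf").read_text()
    except OSError as exc:
        print("could not read config:", exc)
    else:
        for key, (wanted, live) in mismatches(conf).items():
            print(f"{key}: wanted {wanted!r}, kernel has {live!r}")
```

A key that the running kernel does not provide shows up with a live value of None, which makes silently ignored settings easy to spot.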