Tips for running a Tor relay

Tor operator tor at fejk.se
Wed May 5 04:42:02 UTC 2010


Hi list!

I am writing to share my experience running a Tor relay. The idea came
up while my VPS was suspended for the first time due to abuse
reports.

I run an exit node where the saying "pick two out of three" holds true
<http://en.wikipedia.org/wiki/Project_triangle>. On this particular
relay I opted for fast unmetered bandwidth and a cheap monthly cost.
This came at a price: the relay is seldom stable. The VPS needs to be
reinstalled often due to provider errors, hardware and network problems,
and the support answers are clueless. But cheap and fast make up for
some of that, and it became a challenge to run the relay in the best
way possible.

The VPS uses Xen and runs Debian stable with 256MB RAM and 512MB swap.
The RAM is used to the limit and the Linux OOM killer kills processes
every now and then. On an unmetered network Tor competes with whatever
else is running on the same connection, and my thinking is that the more
simultaneous connections, the more bandwidth can be used. Pushing
4 MB/s is possible with this amount of RAM.


Some tips on running a fast relay with a small amount of RAM.

Set MaxAdvertisedBandwidth lower than your RelayBandwidthRate in
torrc. This keeps the machine from being overloaded with more
connections than it can handle; it is still easy to reach the
RelayBandwidthRate anyway. (This might not be as effective as it used
to be due to recent changes in Tor.)
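
A rough torrc sketch of the idea; the numbers are only illustrative
and have to be tuned to your own connection:

    RelayBandwidthRate 4 MB
    RelayBandwidthBurst 5 MB
    # advertise less than the link can actually push so fewer clients
    # pick the relay than the RAM can handle
    MaxAdvertisedBandwidth 2 MB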

Make sure processes are restarted automatically if they die for some
reason; the OOM killer is the usual culprit. I use Monit to make sure
that tor, syslog, the local mailserver and sshd are running. Sometimes
the OOM killer kills the wrong process, and I need those processes to
get status mails from the node.
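
A minimal Monit sketch for the tor process; the pidfile path and init
script are the Debian defaults and may differ on your system:

    check process tor with pidfile /var/run/tor/tor.pid
        start program = "/etc/init.d/tor start"
        stop program  = "/etc/init.d/tor stop"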

Create offsite backups. On a cheap host you will soon need them. 
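
For a relay the important thing to save is the identity key, so the
node keeps its fingerprint after a reinstall. A rough sketch, assuming
a backup host of your own (host and paths are placeholders):

    rsync -az /var/lib/tor/keys /etc/tor/torrc backuphost:/backup/relay/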

Sign up for Tor Weather at https://weather.torproject.org/ to
get a notification if your node goes down. Since Tor is restarted by
Monit for me, this usually means a problem at the VPS provider and I
need to file a support ticket for them to fix it.

Monitor the bandwidth usage. I use vnstat for this, a simple tool that
gets the job done. With more RAM I would also run munin-node. vnstat
is enough to give realtime, hourly, daily and monthly figures for
the traffic in and out of the system.
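
Roughly how I use it; the interface name is an assumption:

    vnstat -u -i eth0   # create the database (first run only)
    vnstat -h           # hourly figures
    vnstat -d           # daily figures
    vnstat -m           # monthly figures
    vnstat -l           # live traffic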

Avoid logging in to the system since it will use more RAM. Set up a few
cron jobs instead that mail away traffic stats from vnstat and whatever
else you might want to monitor, such as the number of connections. A
mail when the relay is unexpectedly rebooted is also nice.
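
A crontab sketch of both ideas; the address is a placeholder and the
mail command assumes a working local MTA with bsd-mailx or similar:

    # daily traffic summary at 06:00
    0 6 * * * vnstat -d | mail -s "traffic report" you@example.org
    # one mail every time the machine comes back up
    @reboot uptime | mail -s "relay rebooted" you@example.org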

Keep up to date with the security patches for your OS. On Debian I like
apticron for this.
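
Getting it going is a one-time job; the address is of course a
placeholder:

    apt-get install apticron
    # then set EMAIL="you@example.org" in /etc/apticron/apticron.conf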

Compile the latest and greatest. It made a huge difference for the
stability when I compiled my own binary with the latest OpenSSL and
--enable-openbsd-malloc (on Linux). Moving to the unstable git
version also improved the uptime, but your results may vary.
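
Roughly what the build looks like on Debian; the package list is an
assumption and you may want to build a newer OpenSSL from source and
point configure at it instead:

    apt-get install build-essential libevent-dev libssl-dev
    ./configure --enable-openbsd-malloc
    make
    make install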

Do not run a local firewall. Tracking connections is only overhead that
will use more CPU and RAM. Have no unnecessary services listening instead.
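
To see what is actually listening, something like this is enough;
disable or remove whatever you do not recognise:

    netstat -tlnp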

Do not be a directory mirror if you can provide exit traffic.
Use more of the bandwidth for exit traffic instead. It might help with
the RAM usage too, but I'm not sure.
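
In torrc this just means not opening a directory port; 0 is the
default, so the line below only makes it explicit:

    # do not mirror the directory, spend the bandwidth on exit traffic
    DirPort 0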

Change kernel settings with sysctl -w and monitor the result, but make
sure to add the working settings to /etc/sysctl.conf; expect reboots on
a cheap VPS.
The number of file descriptors the kernel allows depends on how much
RAM there is. I needed to increase fs.file-max.
I also increased net.ipv4.tcp_max_orphans to avoid the error
"Out of socket memory".
I lowered net.ipv4.tcp_keepalive_time to make sure old unused
connections do not take memory.
I lowered these settings to make sure the VPS can handle more
connections:
net.core.rmem_default
net.core.wmem_default
net.core.rmem_max
net.core.wmem_max
I also set this to make sure more ports can be used for connections:
net.ipv4.ip_local_port_range=2000 65535
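
A /etc/sysctl.conf sketch with the settings above; only the port range
comes from my own config, the other values are illustrative and have
to be tuned while watching memory and the Tor logs:

    # raise the global file descriptor limit
    fs.file-max = 65536
    # avoid "Out of socket memory" when many sockets are orphaned
    net.ipv4.tcp_max_orphans = 65536
    # drop idle connections sooner so they do not hold memory
    net.ipv4.tcp_keepalive_time = 1200
    # smaller per-socket buffers leave room for more simultaneous sockets
    net.core.rmem_default = 32768
    net.core.wmem_default = 32768
    net.core.rmem_max = 65536
    net.core.wmem_max = 65536
    # widen the local port range
    net.ipv4.ip_local_port_range = 2000 65535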

It is hard to know how each setting affects performance since it
depends on a lot of things. Are other users on the Xen server or the
network using more or less of the bandwidth or CPU right now?
Is the Tor network different today compared to yesterday, or is my
relay being abused as a one-hop exit?
Try changing one setting at a time and let the relay run unchanged for 
some days after the change. Version control your configuration to
remember what you changed and when. 
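
A minimal sketch of keeping the config in git; the file list is only
an example of what I touch on this relay:

    cd /etc
    git init
    git add tor/torrc sysctl.conf monit/
    git commit -m "baseline relay configuration"
    # after each experiment:
    git commit -am "lower tcp_keepalive_time, watch RAM for a few days"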

Pay only for a month in advance. Longer contracts look cheaper, but it
is likely that you will need to change provider due to abuse reports or
due to using too much of the "unmetered" bandwidth.

I hope these notes help some current or future operators running
relays. My future plans include configuring a Puppet server to handle
the installation and configuration of Tor relays in an automated way.
If anyone already has Puppet scripts for Tor to share, please do.

-- 
Tor relay operator
http://fejk.se/tor


