Hello Everybody,
my relay is now almost two weeks old and has the following flags: Fast, Guard, Running, Stable, V2Dir, Valid.
I lost the HSDir flag because I had to restart the Tor process. The downtime was only a few seconds, which may be why I kept the Guard flag. I was expecting a drop in traffic when I got the Guard flag (as mentioned in the FAQ), but the opposite happened.
At the moment there are around 15000 active connections, over 11000 inbound and just 4000 outbound. I looked at the connections in Nyx, and it seems that my relay is indeed used as a Guard node (most of the IPs are "scrubbed" and the outgoing connections are to middle nodes).
Before I got the Guard flag, I had around 5000 connections at the same time and was relaying traffic at peaks of 55MB/s. My server is connected to a Gigabit link. It's not a regular VPS, I have a dedicated CPU with two cores and dedicated 8GB RAM. Traffic is unlimited.
The problem is that I'm now relaying traffic at ~25MB/s, and whenever there are spikes of over 30MB/s the CPU load on both cores (!) is very high. I'm still moving ~5TB per day, which is a lot, I know, but the server's internet connection could handle even more.
My server has two dedicated CPU cores of an AMD EPYC 7702, but unfortunately I only get the base frequency of 2GHz inside the VM, not the boost frequency of 3.35GHz (misleading information on the hoster's website).
I could relay far more traffic if it weren't for the CPU load. The CPU is the bottleneck; the 1Gbit link is guaranteed.
I read in the FAQ that a modern CPU with hardware acceleration is able to relay traffic @~500Mbit in both directions. The EPYC 7702 supports AES-NI. I checked this, it is activated in my VM.
I'm running Debian 11 Bullseye and tweaked the networking capabilities with some instructions I found from torservers.net (mostly sysctl.conf tweaks).
There is no additional software installed that uses lots of resources, just a few tools.
Here is a screenshot of Glances during a traffic peak (I set the Tor process to +10 on purpose): https://i.ibb.co/8brmZkf/glances.png
The average CPU load is ~1.50, this is still ok for a dual core, but it should stay below 2.0 (at least it should not go above 2.0 for more than a few minutes).
Does anyone here have an idea what I could do? Since the load on both cores is pretty high, I don't think it makes much sense to set up a second relay on the same server.
Of course I could throttle the traffic, but is there anything else I can do? I rented this rather expensive server to help the Tor network with a really fast Guard node...
Thank you everyone for your time and responses! Have a great weekend!
Best Regards, Elias
On Friday, November 19, 2021 10:41:27 AM CET Elias via tor-relays wrote:
Hello Everybody,
my relay is now almost two weeks old and has the following flags: Fast, Guard, Running, Stable, V2Dir, Valid.
I lost the HSDir flag because I had to restart the Tor process. The downtime was only a few seconds, which may be why I kept the Guard flag.
This is normal: the HSDir flag is always lost after a reboot or restart. The other flags remain.
At the moment there are around 15000 active connections, over 11000 inbound and just 4000 outbound. I looked at the connections in Nyx, and it seems that my relay is indeed used as a Guard node (most of the IPs are "scrubbed" and the outgoing connections are to middle nodes).
Before I got the Guard flag, I had around 5000 connections at the same time and was relaying traffic at peaks of 55MB/s. My server is connected to a Gigabit link. It's not a regular VPS, I have a dedicated CPU with two cores and dedicated 8GB RAM. Traffic is unlimited.
Many VMs with 1G are still throttled. You share the server bandwidth with all other VM customers.
The problem is that I'm now relaying traffic at ~25MB/s, and whenever there are spikes of over 30MB/s the CPU load on both cores (!) is very high. I'm still moving ~5TB per day, which is a lot, I know, but the server's internet connection could handle even more.
~5TB per day ≈ 150 TB/month. You usually don't even get that on a dedicated bare-metal root server that costs $30-100 a month. One of my hosters limited bandwidth to 300Mbit after 10TB of traffic.
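As a rough check of the figures above, this sketch converts a daily transfer volume into a monthly total and an average rate. It assumes decimal units (1 TB = 10^12 bytes) and a 30-day month; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_tb(tb_per_day, days=30):
    """Total volume in TB over a month of the given length."""
    return tb_per_day * days


def avg_mbit_per_s(tb_per_day):
    """Average rate in Mbit/s needed to move tb_per_day in one day."""
    bytes_per_day = tb_per_day * 1e12  # assumption: 1 TB = 10^12 bytes
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


print(monthly_tb(5))             # TB per 30-day month
print(round(avg_mbit_per_s(5)))  # average Mbit/s over the day
```

The thread does not say how the ~5TB splits between inbound and outbound traffic, so the average rate here treats it as one combined figure.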
Uh, welcome to the club. ;-) Because of DDoS, I have had 40 cores at around 90% for weeks. Until 3 weeks ago the ixgbe driver was killed every 2-3 days. I hope I have solved the problem now.
My server has two dedicated CPU cores of an AMD EPYC 7702, but unfortunately I only get the base frequency of 2GHz inside the VM, not the boost frequency of 3.35GHz (misleading information on the hoster's website).
I could relay far more traffic if it weren't for the CPU load. The CPU is the bottleneck; the 1Gbit link is guaranteed.
I read in the FAQ that a modern CPU with hardware acceleration is able to relay traffic @~500Mbit in both directions. The EPYC 7702 supports AES-NI. I checked this, it is activated in my VM.
I'm running Debian 11 Bullseye and tweaked the networking capabilities with some instructions I found from torservers.net (mostly sysctl.conf tweaks).
The old stuff from their GitHub? I would remove those settings again. You are in a VM, and the torservers.net sysctl.conf settings are over 10 years old! (A joke by niftybunny: from times when low traffic was RFC 2549.) A 1G NIC has long been standard. With Debian 9, 10 and 11 I have only used the default sysctl settings, meaning no tweaks at all. tcp_syncookies has also been enabled in Debian for many, many years.
The average CPU load is ~1.50, this is still ok for a dual core, but it should stay below 2.0 (at least it should not go above 2.0 for more than a few minutes).
Does anyone here have an idea what I could do? Since the load on both cores is pretty high, I don't think it makes much sense to set up a second relay on the same server.
Maybe it helps:
I have iptables persistent on my guard servers. Sample rules: https://github.com/boldsuck/tor-relay-bootstrap/tree/master/etc/iptables
or try
MaxAdvertisedBandwidth: If set, we will not advertise more than this amount of bandwidth for our BandwidthRate. Server operators who want to reduce the number of clients who ask to build circuits through them (since this is proportional to advertised bandwidth rate) can thus reduce the CPU demands on their server without impacting network performance.
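A minimal sketch of how such a cap might be written, as a small Python helper that renders the torrc line. The 30 MBytes value is only an assumption, taken from the point where the CPU load reportedly becomes a problem; the helper name is hypothetical.

```python
def max_advertised_line(mbytes_per_s):
    """Render a torrc line capping the advertised bandwidth (in MBytes/s)."""
    if mbytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return f"MaxAdvertisedBandwidth {mbytes_per_s} MBytes"


# Assumed value for illustration only; pick one that matches what the CPU can sustain.
print(max_advertised_line(30))
```

The resulting line would go into the relay's torrc, and Tor would need to reload its configuration before the new value is advertised.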
Of course I could throttle the traffic, but is there anything else I can do? I rented this rather expensive server to help the Tor network with a really fast Guard node...
First of all, thank you very much for your response!
This is normal: the HSDir flag is always lost after a reboot or restart. The other flags remain.
I know, it wouldn't even bother me if I lost the Guard flag. The Tor network can decide whatever it wants to use my relay for.
Many VMs with 1G are still throttled. You share the server bandwidth with all other VM customers.
This one is not. The hoster sells this machine as a "Root Server", and it's actually connected to a 2.5Gbit link. The 1Gbit speed is guaranteed, and before I set up the relay I ran multiple speed tests - I definitely get 1Gbit.
The problem is that I'm now relaying traffic at ~25MB/s, and whenever there are spikes of over 30MB/s the CPU load on both cores (!) is very high. I'm still moving ~5TB per day, which is a lot, I know, but the server's internet connection could handle even more.
~5TB per day ≈ 150 TB/month. You usually don't even get that on a dedicated bare-metal root server that costs $30-100 a month. One of my hosters limited bandwidth to 300Mbit after 10TB of traffic.
I paid close attention to any limit rules, and there is one, but I can't break it: they limit my bandwidth to 200Mbit if I have used more than 120TB of traffic within one month and at the same time (!) used more than 1Gbit of bandwidth on average (!) for more than 60 minutes. I set MaxAdvertisedBandwidth to 1000Mbit, so I will never get throttled by the hoster.
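The rule as described above can be restated as a small predicate. This is only a sketch of the wording in this message; how the hoster actually measures the average is not known, and the function and parameter names are hypothetical.

```python
def hoster_throttles(monthly_tb, avg_gbit, sustained_minutes):
    """True only if BOTH conditions from the hoster's rule hold."""
    over_volume = monthly_tb > 120                          # more than 120TB in the month
    over_rate = avg_gbit > 1.0 and sustained_minutes > 60   # >1Gbit average for >60 minutes
    return over_volume and over_rate


# ~150 TB/month but well below 1Gbit on average: no throttling.
print(hoster_throttles(150, 0.5, 600))
# Both conditions met: throttled to 200Mbit.
print(hoster_throttles(150, 1.2, 90))
```

At ~5TB per day the volume condition is already met, so under this reading only the sustained-rate condition keeps the limit from applying.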
Uh, welcome to the club. ;-) Because of DDoS, I have had 40 cores at around 90% for weeks. Until 3 weeks ago the ixgbe driver was killed every 2-3 days. I hope I have solved the problem now.
Yeah, and this wasn't even a DDoS. If I don't change my config, it's pretty easy to knock my server off the internet with a small-scale DDoS. And we both know they do this, especially with high-capacity Guard nodes... I secured the server as well as I could before it went online, but there is no real DDoS protection in place, and it seems I need it.
The old stuff from their github? I would delete them again. You are in a VM and the torservers.net sysctl.conf settings are over 10 years old!
The old stuff from this mailing list. But you're right, that stuff was from 2010, I will revert back to normal.
I have iptables persistent on my guard servers. Sample rules: https://github.com/boldsuck/tor-relay-bootstrap/tree/master/etc/iptables
Thank you, I'll give that a try!
If set, we will not advertise more than this amount of bandwidth for our BandwidthRate. Server operators who want to reduce the number of clients who ask to build circuits through them (since this is proportional to advertised bandwidth rate) can thus reduce the CPU demands on their server without impacting network performance
This will be my next step if the iptables rules have no effect. At the moment I advertise 125 MiB, which is obviously very optimistic... I have by far the fastest relay at this hoster in terms of bandwidth, but that's nothing to be proud of if the relay crashes or is overloaded all the time.
Thanks again for your suggestions!
All the best! Elias
On Fri, Nov 19, 2021 at 09:41:27AM +0000, Elias via tor-relays wrote:
Does anyone here have an idea what I could do? Since the load on both cores is pretty high, I don't think it makes much sense to set up a second relay on the same server.
It does from my experience. Run two relays on the machine. Set RelayBandwidthRate to the same value for both relays. Care about the total throughput and not the peak for one relay.
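A sketch of the split suggested above: one total budget divided evenly, so both relays get the same RelayBandwidthRate. The 50 MBytes total is an assumed figure for illustration, the helper names are hypothetical, and setting RelayBandwidthBurst equal to the rate is just one possible choice.

```python
def per_relay_rate(total_mbytes, relays=2):
    """Split a total bandwidth budget (MBytes/s) evenly across relays."""
    if relays < 1:
        raise ValueError("need at least one relay")
    return total_mbytes // relays


def bandwidth_lines(total_mbytes, relays=2):
    """torrc bandwidth lines to use in each relay's config."""
    rate = per_relay_rate(total_mbytes, relays)
    return [
        f"RelayBandwidthRate {rate} MBytes",
        f"RelayBandwidthBurst {rate} MBytes",  # assumption: burst equal to rate
    ]


# Assumed total of 50 MBytes/s shared by two relays on the same machine.
for line in bandwidth_lines(50):
    print(line)
```

Only the bandwidth options are shown; each instance would still need its own per-instance settings such as ORPort, DataDirectory and Nickname, and relays run by the same operator should declare each other via MyFamily.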
Regards, Johan
tor-relays@lists.torproject.org