[tor-relays] How to reduce tor CPU load on a single bridge?

Gary C. New garycnew at yahoo.com
Sat Jan 29 02:54:40 UTC 2022


On Thursday, January 27, 2022, 1:03:25 AM MST, David Fifield <david at bamsoftware.com> wrote:

>> It's nice to see that the Snowflake daemon offers a native configuration option for LimitNOFile. I ran into a similar issue with my initial loadbalanced Tor Relay Nodes that was solved at the O/S level using ulimit. It would be nice if torrc had a similar option.

> LimitNOFile is actually not a Snowflake thing, it's a systemd thing. It's the same as `ulimit -n`. See:


Ah... My mistake. In my cursory review of your "Draft installation guide" I only saw snowflake-server. and assumed it was .conf where in actuality it is .service. I should have noticed the /etc/systemd path. Thank you for the correction.

>> From your documentation, it sounds like you're running everything on the same machine? When expanding to additional machines, similar to the file limit issue, you'll have to expand the usable ports as well.

> I don't think I understand your point. At 64K simultaneous connections, you run out of source ports for making connection 4-tuple unique, but I don't see how the same or different hosts makes a difference, in that respect.

On many Linux distros, the default ip_local_port_range is between 32768 - 61000.

# cat /proc/sys/net/ipv4/ip_local_port_range

32768 61000

The Tor Project recommends increasing it.

# echo 15000 64000 > /proc/sys/net/ipv4/ip_local_port_range

>> I found your HAProxy configuration in your “Draft installation guide.” It seems you’re using regular TCP streaming mode with the Snowflake instances vs transparent TCP streaming mode, which is a notable difference with the directly loadbalanced Tor Relay configuration.

> I admit I did not understand your point about transparent proxying. If it's about retaining the client's source IP address for source IP address pinning, I don't think that helps us.

In Transparent TCP Steam mode, the Loadbalancer clones the IP address of the connecting Tor Client/Relay for use on the internal interface with connections to the upstream Tor Relay Nodes, so the Upstream Tor Relay Nodes believe they're talking to the actual connecting Tor Client/Relay.

> This is a bridge, not a relay, and the source IP address that haproxy sees is several steps removed from the client's actual IP address. haproxy receives connections from a localhost web server (the server pluggable transport that receives WebSocket connections); the web server receives connections from Snowflake proxies (which can and do have different IP addresses during the lifetime of a client session); only the Snowflake proxies themselves receive direct traffic from the client's own source IP address.

You are correct. This makes more sense why HAProxy's Regular TCP Streaming Mode works in this paradigm. I believe what was confusing was the naming convention of your Tor instances (i.e., snowflake#), which lead me to believe that your Snowflake proxy instances were upstream and not downstream. However, correlating the IP address assignments between configurations confirms HAProxy is loadbalancing upstream to your Tor Nodes.

> The client's IP address is tunnelled all the way through to tor, for metrics purposes, but that uses the ExtORPort protocol and the load balancer isn't going to understand that.

As long as HAProxy is configured to use TCP Streaming Mode, it doesn't matter what protocol is used as it will be passed through encapsulated in TCP. That's the beauty of TCP Streaming Mode.

> I think that transparent proxying would only transparently proxy the localhost IP addresses from the web server, which doesn't have any benefit, I don't think.


>> You might test using a timeout value of 0s (to disable the timeout at the loadbalancer) and allow the Snowflake instances to preform state checking to ensure HAProxy isn’t throttling your bridge.

> Thanks for that hint. So far, 10-minute timeouts seem not to be causing a problem. I don't know this software too well, but I think it's an idle timeout, not an absolute limit on connection lifetime.

It's HAProxy's Passive Health Check Timeout. The reason why I disabled (0s) this timeout is I felt that the Tor instances know their state threshold better and if they became overloaded would tell the DirectoryAuthorities. One scenario where a lengthy HAProxy timeout might be of value is if a single instance was having issues and causing a reported overloaded state for the rest. However, this would more likely occur in a multi-physical/virtual-node environment. You'll have to continue to update me with your thoughts on this subject as you continue your testing.

>> Any reason why you chose HAProxy over Nginx?

> Shelikhoo drafted a configuration using Nginx, which for the time being you can see here:



> I don't have a strong preference and I don't have a lot of experience with either one. haproxy seemed to offer fewer opportunities for error, because the default Nginx installation expects to run a web server, which I would have to disable and ensure it did not fight with snowflake-server for port 443. It just seemed simpler to have one configuration file to edit and restart the daemon.

My Nginx configuration is actually smaller than my HAProxy configuration. All you really need from either Nginx/HAProxy configurations are the Global Default settings (especially the file/connection limits) and your TCP Streaming settings. As stated previously, I would recommend using Nginx simply for the fact that it forks additional child processes as connections/demand increases, which I could never figured out with HAProxy.

>> I did notice that you’re using the AssumeReachable 1 directive in your torrc files. Are you running into an issue where your Tor instances are failing the reachability test?

> It's because this bridge does not expose its ORPort, which is the recommended configuration for default bridges. The torrc has `ORPort`, so the bridges will never be reachable over their ORPort, which is intentional. Bridges that want to be distributed by BridgeDB need to expose their ORPort, which is an unfortunate technical limitation that makes the bridges more detectable (https://bugs.torproject.org/tpo/core/tor/7349), but for default bridges it's not necessary. To be honest, I'm not sure that `AssumeReachable` is even required anymore for this kind of configuration; it's just something I remember having to do years ago for some reason. It may be superfluous now that we have `BridgeDistribution none`.

Interesting... This shows my lack of knowledge regarding bridges as I have never run a bridge. Additionally, it highlights the major differences in running a Loadbalanced Tor Bridge vs a Loadbalanced Tor Relay and the necessity of using Transparent TCP Streaming Mode when the ORPort is exposed vs using Regular TCP Streaming Mode when the ORPort is not exposed. My Nginx Loadbalancer sits on the border of my network, listens on ORPort 9001, and uses Transparent TCP Streaming to loadbalance connections upstream to my Tor Relay Nodes.

>> Do your Snowflake instances not have issues reporting to different DirectoryAuthorities?

> Other than the possible metrics anomalies, I don't know what kind of issue you mean. It could be that, being a bridge, it has fewer constraints than your relays. A bridge doesn't have to be listed in the consensus, for example.

Yes... It's issues with consensus that I run into, if I don't configure my Tor Relay Nodes to send updates to a single DirectoryAuthority. This appears to be another major difference between running a Loadbalanced Tor Bridge vs a Loadbalanced Tor Relay.

>> With regard to loadbalanced Snowflake sessions, I'm curious to know what connections (i.e., inbound, outbound, directory, control, etc) are being displayed within nyx?

> I'm not using nyx. I'm just looking at the bandwidth on the network


If you have time, would you mind installing nyx to validate observed similarities/differences between our loadbalanced configurations?

>> Your Heartbeat logs continue to appear to be in good health. When keys are rotated,

> We're trying to avoid rotating keys at all. If the read-only files do not work, we'll instead probably periodically rewrite the state file to push the rotation into the future.

I'm especially interested in this topic. Please keep me updated!

>> > I worried a bit about the "0 with IPv6" in a previous comment. Looking at the bridge-stats files, I don't think there's a problem.

>> I'm glad to hear you feel the IPv6 reporting appears to be a false-negative. Does this mean there's something wrong with IPv6 Heartbeat reporting?

> I don't know if it's wrong, exactly. It's reporting something different than what ExtORPort is providing. The proximate connections to tor are indeed all IPv4.

I see. Perhaps IPv6 connections are less prolific and require more time to ramp?

>> Are your existing 8 cpu's only single cores? Is it too difficult to upgrade with your VPS provider?

> Sure, there are plenty of ways to increase resources of the bridge, but I feel that's a different topic.

After expanding my reading of your related "issues," I see that your VPS provider only offers up to 8 cores. Is it possible to spin-up another VPS environment, with the same provider, on a separate VLAN, allowing route/firewall access between the two VPS environments? This way you could test loadbalancing a Tor Bridge over a local network using multiple virtual environments. Perhaps, the Tor Project might even assist you with such a short-term investment (I read the meeting notes). ;-)

> Thanks for your comments.

Thank you for your responses.




This Message Originated by the Sun.

iBigBlue 63W Solar Array (~12 Hour Charge)

+ 2 x Charmast 26800mAh Power Banks

= iPhone XS Max 512GB (~2 Weeks Charged)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-relays/attachments/20220129/d07ddb3c/attachment-0001.htm>

More information about the tor-relays mailing list