On Tue, Jan 25, 2022 at 11:21:10PM +0000, Gary C. New via tor-relays wrote:
It's nice to see that the Snowflake daemon offers a native configuration option for LimitNOFile. I ran into a similar issue with my initial loadbalanced Tor Relay Nodes that was solved at the O/S level using ulimit. It would be nice if torrc had a similar option.
LimitNOFILE is actually not a Snowflake thing; it's a systemd thing. It's the same as `ulimit -n`. See: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Process%2...
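For anyone who wants to raise the limit, here is a minimal sketch of a systemd drop-in (the unit name, path, and value are only examples; adapt them to your installation):

```
# /etc/systemd/system/snowflake-server.service.d/override.conf
# Hypothetical drop-in; the unit name and limit value are examples.
[Service]
# Equivalent to `ulimit -n 65536` for processes started by this unit.
LimitNOFILE=65536
```

After adding the drop-in, `systemctl daemon-reload` followed by a restart of the service applies the new limit.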
From your documentation, it sounds like you're running everything on the same machine? When expanding to additional machines, similar to the file limit issue, you'll have to expand the usable ports as well.
I don't think I understand your point. At 64K simultaneous connections, you run out of source ports for keeping the connection 4-tuple unique, but I don't see how using the same host or different hosts makes a difference in that respect.
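To spell out the arithmetic (a sketch assuming a Linux host; the commands and numbers are illustrative, not from the deployment):

```
# A TCP connection is identified by the 4-tuple
#   (source IP, source port, destination IP, destination port).
# With one source IP talking to one destination IP:port, only the
# 16-bit source port varies, so at most ~64K connections can exist
# at once, whether or not the endpoints share a machine.

# Inspect, and illustratively widen, the local ephemeral port range:
sysctl net.ipv4.ip_local_port_range
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```

Widening the range only gets you closer to the 64K ceiling; going beyond it requires additional source or destination addresses.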
I found your HAProxy configuration in your “Draft installation guide.” It seems you’re using regular TCP streaming mode with the Snowflake instances vs. transparent TCP streaming mode, which is a notable difference from the directly loadbalanced Tor Relay configuration.
I admit I did not understand your point about transparent proxying. If it's about retaining the client's source IP address for source IP address pinning, I don't think that helps us. This is a bridge, not a relay, and the source IP address that haproxy sees is several steps removed from the client's actual IP address. haproxy receives connections from a localhost web server (the server pluggable transport that receives WebSocket connections); the web server receives connections from Snowflake proxies (which can and do have different IP addresses during the lifetime of a client session); only the Snowflake proxies themselves receive direct traffic from the client's own source IP address. The client's IP address is tunnelled all the way through to tor, for metrics purposes, but that uses the ExtORPort protocol and the load balancer isn't going to understand that. I think that transparent proxying would only transparently proxy the localhost IP addresses from the web server, which I don't think has any benefit.
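As a rough sketch of the chain described above (not an exact depiction of the deployment):

```
client <--WebRTC--> Snowflake proxy <--WebSocket--> snowflake-server (on the bridge)
                                                            |
                                                     localhost TCP
                                                            v
                                                         haproxy
                                                            |
                                                     localhost TCP
                                                            v
                                                      tor instances
```

The client's IP address travels inside the ExtORPort protocol rather than in the TCP headers that haproxy sees.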
What's written in the draft installation guide is not the whole file. There's additionally the default settings as follows:
```
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&...
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
```
You might test using a timeout value of 0s (to disable the timeout at the loadbalancer) and allow the Snowflake instances to perform state checking, to ensure HAProxy isn’t throttling your bridge.
Thanks for that hint. So far, 10-minute timeouts seem not to be causing a problem. I don't know this software too well, but I think it's an idle timeout, not an absolute limit on connection lifetime.
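For reference, haproxy's `timeout client` and `timeout server` are maximum inactivity times on each side of a connection, not caps on total connection lifetime. A minimal sketch of how such an idle timeout might be set in a TCP-mode section (the section name, ports, and servers are placeholders, not the actual contents of the installation guide):

```
listen snowflake-server
    bind 127.0.0.1:10000
    mode tcp
    # Idle timeout toward the connecting side (the localhost web server).
    timeout client 10m
    # Idle timeout toward the backend tor instances.
    timeout server 10m
    server tor1 127.0.0.1:10001
    server tor2 127.0.0.1:10002
```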
Currently, as I only use IPv4, I can't offer much insight as to the lack of IPv6 connections being reported (that's what my logs report, too).
On further reflection, I don't think there's a problem here. The instances' bridge-stats and end-stats show a mix of countries and v4/v6. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
Regarding metrics.torproject.org... I expect you'll see that written-bytes and read-bytes only reflect that of a single Snowflake instance. However, your consensus weight will reflect the aggregate of all Snowflake instances.
Indeed, the first few data points after the switchover show an apparent decrease in read/written bytes per second, even though the on-bridge bandwidth monitors show much more bandwidth being used than before. I suppose it could be selecting from any of 5 instances that currently share the same identity fingerprint: the 4 new load-balanced instances on the "staging" bridge, plus the 1 instance which is still running concurrently on the "production" bridge. When we finish the upgrade and get all the instances back on the production bridge, if the metrics are wrong, they will at least be uniformly wrong. https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB69... https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
Any reason why you chose HAProxy over Nginx?
Shelikhoo drafted a configuration using Nginx, which for the time being you can see here: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla... https://pad.riseup.net/p/pvKoxaIcejfiIbvVAV7j#L416
I don't have a strong preference, and I don't have a lot of experience with either one. haproxy seemed to offer fewer opportunities for error, because the default Nginx installation expects to run a web server, which I would have had to disable and keep from fighting with snowflake-server for port 443. It just seemed simpler to edit one configuration file and restart the daemon.
I did notice that you’re using the AssumeReachable 1 directive in your torrc files. Are you running into an issue where your Tor instances are failing the reachability test?
It's because this bridge does not expose its ORPort, which is the recommended configuration for default bridges. The torrc has `ORPort 127.0.0.1:auto`, so the bridges will never be reachable over their ORPort, which is intentional. Bridges that want to be distributed by BridgeDB need to expose their ORPort, which is an unfortunate technical limitation that makes the bridges more detectable (https://bugs.torproject.org/tpo/core/tor/7349), but for default bridges it's not necessary. To be honest, I'm not sure that `AssumeReachable` is even required anymore for this kind of configuration; it's just something I remember having to do years ago for some reason. It may be superfluous now that we have `BridgeDistribution none`.
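Concretely, the directives in question look something like this (a partial sketch, not the bridge's complete torrc; everything else is omitted):

```
# Partial torrc sketch; illustrative only.
BridgeRelay 1
# ORPort bound to localhost, intentionally unreachable from outside.
ORPort 127.0.0.1:auto
# Skip the ORPort reachability self-test.
AssumeReachable 1
# Default bridge, not handed out by BridgeDB.
BridgeDistribution none
```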
Do your Snowflake instances not have issues reporting to different DirectoryAuthorities?
Other than the possible metrics anomalies, I don't know what kind of issue you mean. It could be that, being a bridge, it has fewer constraints than your relays. A bridge doesn't have to be listed in the consensus, for example.