[tor-relays] BadMiddle Nodes Loadbalancing Re: How to reduce tor CPU load on a single bridge?

Jonas Friedli jonasfriedli at danwin1210.de
Sat Mar 26 15:39:52 UTC 2022


Hi Gary,

Wait, your relays are BadMiddle nodes! This breaks at least all
onion-service features.
Thanks for trying to saturate the connection by load-balancing around the
single-core tor bottleneck.

But think of any client trying to establish a RendezvousPoint or an
IntroductionPoint, or to fetch an HSDir, through one of your tor instances:
they all share the same descriptor and look identical from the outside.

The connection will fail!

1.
a_tor_client => HAProxy server tornode41 192.168.0.41:9001 => RendezvousCookie 46B9FA5B
onionservice => HAProxy server tornode51 192.168.0.51:9001 => RendezvousCookie 46B9FA5B

2.
tornode51 does not know tornode41's cookies and will log:
[WARN] Rejecting RENDEZVOUS1 cell with unrecognized rendezvous cookie 46B9FA5B.
Didn't recognize cell, but circ stops here
circuit_receive_relay_cell (forward) failed
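
A quick way to verify this on your own backends is to count the rejected
rendezvous cells per instance (adjust the path to your setup; the torrc
further down logs to /tmp/torlog and syslog):

grep -c "unrecognized rendezvous cookie" /tmp/torlog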


This load-balancing setup is only valid and useful for the Bridge or Exit
relay roles, not for any relay that has a middle probability in the
consensus weights.

just FYI


 >  David, Roger, et al.,
 >
 > I just got back from holidays and really enjoyed this thread!
 >
 > I run my Loadbalanced Tor Relay as a Guard/Middle Relay, very similar 
to David's topology diagram, without the Snowflake-Server proxy. I'm 
using Nginx (which forks a child process per core) instead of HAProxy. 
My Backend Tor Relay Nodes are running on several, different Physical 
Servers; thus, I'm using Private Address Space instead of Loopback 
Address Space.
 >
 > In this configuration, I discovered that I had to configure 
Nginx/HAProxy to use Transparent Streaming Mode, use Source IP Address 
Sticky Sessions (Pinning), configure the Loadbalancer to send the 
Backend Tor Relay Nodes' traffic back to Nginx/HAProxy (Kernel & 
IPTables), configure all Backend Tor Relay Nodes to use a copy of the 
same .tordb (I wasn't able to get the Backend Tor Relay Nodes working 
with the same .tordb (over NFS) without the DirectoryAuthorities 
complaining), and configure the Backend Tor Relay Nodes to use the same 
DirectoryAuthority (to ensure each Backend Tor Relay Node sends 
Meta-Data to the same DirectoryAuthority). Moreover, I've enabled 
logging to a central Syslog Server for each Backend Tor Relay Node and 
created a number of Shell Scripts to help remotely manage each Backend 
Tor Relay Node.
 >
 > Here are some sample configurations for reference.
 >
 > Nginx Config:
 >
 > upstream orport_tornodes {
 > #least_conn;
 > hash $remote_addr consistent;
 > #server 192.168.0.1:9001 weight=1 max_fails=1 fail_timeout=10s;
 > #server 192.168.0.1:9001 down;
 > server 192.168.0.11:9001 weight=4 max_fails=0 fail_timeout=0s;
 > server 192.168.0.21:9001 weight=4 max_fails=0 fail_timeout=0s;
 > #server 192.168.0.31:9001 weight=4 max_fails=3 fail_timeout=300s;
 > server 192.168.0.41:9001 weight=4 max_fails=0 fail_timeout=0s;
 > server 192.168.0.51:9001 weight=4 max_fails=0 fail_timeout=0s;
 > #zone orport_torfarm 64k;
 >
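For completeness: the upstream block above still needs its closing brace and
has to sit inside nginx's stream {} context, together with a server {} block
that listens and proxies. A minimal sketch of that wrapper (the listen
address and the transparent option are illustrative, matching the setup
described above):

stream {
    upstream orport_tornodes {
        hash $remote_addr consistent;
        server 192.168.0.11:9001 weight=4 max_fails=0 fail_timeout=0s;
        # ... remaining backend nodes as listed above ...
    }
    server {
        listen 0.0.0.0:9001;
        # keep the client's source address towards the backends
        # (requires the kernel/iptables setup shown further down)
        proxy_bind $remote_addr transparent;
        proxy_pass orport_tornodes;
    }
}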
 >
 > HAProxy Config (Alternate):
 >
 > frontend tornodes
 > # Log to global config
 > log global
 >
 > # Bind to ORPort 9001 on all interfaces, transparently
 > bind 0.0.0.0:9001 transparent
 >
 > # We're proxying TCP here...
 > mode tcp
 >
 > default_backend orport_tornodes
 >
 > # Simple TCP source consistent over several servers using the specified
 > # source 0.0.0.0 usesrc clientip
 > backend orport_tornodes
 >
 > balance source
 > hash-type consistent
 > #server tornode1 192.168.0.1:9001 check disabled
 > #server tornode11 192.168.0.11:9001 source 192.168.0.1
 > server tornode11 192.168.0.11:9001 source 0.0.0.0 usesrc clientip check disabled
 > server tornode21 192.168.0.21:9001 source 0.0.0.0 usesrc clientip check disabled
 > #server tornode31 192.168.0.31:9001 source 0.0.0.0 usesrc clientip check disabled
 > server tornode41 192.168.0.41:9001 source 0.0.0.0 usesrc clientip check disabled
 > server tornode51 192.168.0.51:9001 source 0.0.0.0 usesrc clientip check disabled
 >
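The frontend/backend above also assumes the usual global/defaults sections; a
minimal sketch (the timeouts are illustrative, but long client/server
timeouts matter for long-lived OR connections):

global
    log /dev/log local0

defaults
    log global
    mode tcp
    timeout connect 10s
    timeout client  1h
    timeout server  1h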
 >
 > Linux Kernel & IPTables Config:
 >
 > modprobe xt_socket
 > modprobe xt_TPROXY
 >
 > echo 1 > /proc/sys/net/ipv4/ip_forward; cat /proc/sys/net/ipv4/ip_forward
 > echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind; cat /proc/sys/net/ipv4/ip_nonlocal_bind
 > echo 15000 64000 > /proc/sys/net/ipv4/ip_local_port_range; cat /proc/sys/net/ipv4/ip_local_port_range
 >
 > ip rule del fwmark 1 lookup 100 2>/dev/null # Ensure Duplicate Rule is not Created
 > ip rule add fwmark 1 lookup 100 # ip rule show
 > ip route add local 0.0.0.0/0 dev lo table 100 # ip route show table wan0; ip route show table 100
 >
 > iptables -I INPUT -p tcp --dport 9001 -j ACCEPT
 > iptables -t mangle -N TOR
 > iptables -t mangle -A PREROUTING -p tcp -m socket -j TOR
 > iptables -t mangle -A TOR -j MARK --set-mark 1
 > iptables -t mangle -A TOR -j ACCEPT
 > #iptables -t mangle -A PREROUTING -p tcp -s 192.168.0.0/24 --sport 9001 -j MARK --set-xmark 0x1/0xffffffff
 > #iptables -t mangle -A PREROUTING -p tcp --dport 9001 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 9001 --on-ip 127.0.0.1
 >
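On systems that read /etc/sysctl.d at boot, the three /proc settings above
can also be persisted in a drop-in (the file name is illustrative):

# cat /etc/sysctl.d/99-tor-loadbalancer.conf
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_local_port_range = 15000 64000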
 >
 > Backend Tor Relay Node Configs:
 >
 > # cat /tmp/torrc
 > Nickname xxxxxxxxxxxxxxxxxx
 > ORPort xxx.xxx.xxx.xxx:9001 NoListen
 > ORPort 192.168.0.11:9001 NoAdvertise
 > SocksPort 9050
 > SocksPort 192.168.0.11:9050
 > ControlPort 9051
 > DirAuthority longclaw orport=443 no-v2 v3ident=23D15D965BC35114467363C165C4F724B64B4F66 199.58.81.140:80 74A910646BCEEFBCD2E874FC1DC997430F968145
 > FallbackDir 193.23.244.244:80 orport=443 id=7BE683E65D48141321C5ED92F075C55364AC7123
 > DirCache 0
 > ExitRelay 0
 > MaxMemInQueues 192 MB
 > GeoIPFile /opt/share/tor/geoip
 > Log notice file /tmp/torlog
 > Log notice syslog
 > VirtualAddrNetwork 10.192.0.0/10
 > AutomapHostsOnResolve 1
 > TransPort 192.168.0.11:9040
 > DNSPort 192.168.0.11:9053
 > RunAsDaemon 1
 > DataDirectory /tmp/tor/torrc.d/.tordb
 > AvoidDiskWrites 1
 > User tor
 > ContactInfo tor-operator at your-emailaddress-domain
 >
 > # cat /tmp/torrc
 > Nickname xxxxxxxxxxxxxxxxxx
 > ORPort xxx.xxx.xxx.xxx:9001 NoListen
 > ORPort 192.168.0.41:9001 NoAdvertise
 > SocksPort 9050
 > SocksPort 192.168.0.41:9050
 > ControlPort 9051
 > DirAuthority longclaw orport=443 no-v2 v3ident=23D15D965BC35114467363C165C4F724B64B4F66 199.58.81.140:80 74A910646BCEEFBCD2E874FC1DC997430F968145
 > FallbackDir 193.23.244.244:80 orport=443 id=7BE683E65D48141321C5ED92F075C55364AC7123
 > DirCache 0
 > ExitRelay 0
 > MaxMemInQueues 192 MB
 > GeoIPFile /opt/share/tor/geoip
 > Log notice file /tmp/torlog
 > Log notice syslog
 > VirtualAddrNetwork 10.192.0.0/10
 > AutomapHostsOnResolve 1
 > TransPort 192.168.0.41:9040
 > DNSPort 192.168.0.41:9053
 > RunAsDaemon 1
 > DataDirectory /tmp/tor/torrc.d/.tordb
 > AvoidDiskWrites 1
 > User tor
 > ContactInfo tor-operator at your-emailaddress-domain
 >
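Since the two torrc files only differ in the node's private address, they
could be rendered from a single template; a minimal sketch (the template
file and the @NODEIP@ placeholder are hypothetical, not part of the setup
above):

# cat /usr/sbin/gen-torrc
#!/bin/sh
# usage: gen-torrc 192.168.0.41 > /tmp/torrc
NODEIP="$1"
sed "s/@NODEIP@/${NODEIP}/g" /etc/tor/torrc.template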
 >
 > Shell Scripts to Remotely Manage Tor Relay Nodes:
 >
 > # cat /usr/sbin/stat-tor-nodes
 > #!/bin/sh
 > uptime-all-nodes; memfree-all-nodes; netstat-tor-nodes
 >
 > # cat /usr/sbin/uptime-all-nodes
 > #!/bin/sh
 > /usr/bin/ssh -t admin@192.168.0.11 'hostname; uptime'
 > /usr/bin/ssh -t admin@192.168.0.21 'hostname; uptime'
 > /usr/bin/ssh -t admin@192.168.0.31 'hostname; uptime'
 > /usr/bin/ssh -t admin@192.168.0.41 'hostname; uptime'
 > /usr/bin/ssh -t admin@192.168.0.51 'hostname; uptime'
 >
 > # cat /usr/sbin/memfree-all-nodes
 > #!/bin/sh
 > /usr/bin/ssh -t admin@192.168.0.11 'hostname; grep MemFree /proc/meminfo'
 > /usr/bin/ssh -t admin@192.168.0.21 'hostname; grep MemFree /proc/meminfo'
 > /usr/bin/ssh -t admin@192.168.0.31 'hostname; grep MemFree /proc/meminfo'
 > /usr/bin/ssh -t admin@192.168.0.41 'hostname; grep MemFree /proc/meminfo'
 > /usr/bin/ssh -t admin@192.168.0.51 'hostname; grep MemFree /proc/meminfo'
 >
 > # cat /usr/sbin/netstat-tor-nodes
 > #!/bin/sh
 > /usr/bin/ssh -t admin@192.168.0.11 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
 > /usr/bin/ssh -t admin@192.168.0.21 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
 > /usr/bin/ssh -t admin@192.168.0.31 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
 > /usr/bin/ssh -t admin@192.168.0.41 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
 > /usr/bin/ssh -t admin@192.168.0.51 'hostname; netstat -anp | grep -i tor | grep -v 192.168.0.1: | wc -l'
 >
 > # cat /jffs/sbin/ps-tor-nodes
 > #!/bin/sh
 > /usr/bin/ssh -t admin@192.168.0.11 'hostname; ps w | grep -i tor'
 > /usr/bin/ssh -t admin@192.168.0.21 'hostname; ps w | grep -i tor'
 > /usr/bin/ssh -t admin@192.168.0.31 'hostname; ps w | grep -i tor'
 > /usr/bin/ssh -t admin@192.168.0.41 'hostname; ps w | grep -i tor'
 > /usr/bin/ssh -t admin@192.168.0.51 'hostname; ps w | grep -i tor'
 >
 > # cat /usr/sbin/killall-tor-nodes
 > #!/bin/sh
 > read -r -p "Are you sure? [y/N] " input
 > case "$input" in
 > [yY])
 > /usr/bin/ssh -t admin@192.168.0.11 'killall tor'
 > /usr/bin/ssh -t admin@192.168.0.21 'killall tor'
 > #/usr/bin/ssh -t admin@192.168.0.31 'killall tor'
 > /usr/bin/ssh -t admin@192.168.0.41 'killall tor'
 > /usr/bin/ssh -t admin@192.168.0.51 'killall tor'
 > exit 0
 > ;;
 > *)
 > exit 1
 > ;;
 > esac
 >
 > # cat /usr/sbin/restart-tor-nodes
 > #!/bin/sh
 > read -r -p "Are you sure? [y/N] " input
 > case "$input" in
 > [yY])
 > /usr/bin/ssh -t admin@192.168.0.11 '/usr/sbin/tor -f /tmp/torrc --quiet'
 > /usr/bin/ssh -t admin@192.168.0.21 '/usr/sbin/tor -f /tmp/torrc --quiet'
 > #/usr/bin/ssh -t admin@192.168.0.31 '/usr/sbin/tor -f /tmp/torrc --quiet'
 > /usr/bin/ssh -t admin@192.168.0.41 '/usr/sbin/tor -f /tmp/torrc --quiet'
 > /usr/bin/ssh -t admin@192.168.0.51 '/usr/sbin/tor -f /tmp/torrc --quiet'
 > exit 0
 > ;;
 > *)
 > exit 1
 > ;;
 > esac
 >
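The per-host repetition in those scripts could also be collapsed into a
single helper that takes the remote command as an argument; a minimal
sketch (node list copied from the scripts above, helper name is
hypothetical):

# cat /usr/sbin/for-all-nodes
#!/bin/sh
# usage: for-all-nodes 'hostname; uptime'
NODES="192.168.0.11 192.168.0.21 192.168.0.31 192.168.0.41 192.168.0.51"
for n in $NODES; do
    /usr/bin/ssh -t "admin@$n" "$1"
done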
 > I've been meaning to put together a tutorial on Loadbalancing Tor 
Relays, but haven't found the time as of yet. Perhaps this will help 
until I am able to find the time.
 >
 > I appreciate your knowledge sharing and furthering of the topic of 
Loadbalancing Tor Relays, especially with regard to Bridging and Exit 
Relays.
 >
 > Keep up the Great Work!
 >
 > Respectfully,
 >
 >
 > Gary
 >      On Tuesday, January 4, 2022, 09:57:52 PM MST, Roger Dingledine 
<arma at torproject.org> wrote:
 >
 >  [I'm about to go off-line for some days, so I am sending my current
 > suboptimally-organized reply, which I hope is better than waiting another
 > week to respond :)]
 >
 > On Thu, Dec 30, 2021 at 10:42:51PM -0700, David Fifield wrote:
 > > Let's make a distinction between the "frontend" snowflake-server
 > > pluggable transport process, and the "backend" tor process. These don't
 > > necessarily have to be 1:1; either one could be run in multiple
 > > instances. Currently, the "backend" tor is the limiting factor, because
 > > it uses only 1 CPU core. The "frontend" snowflake-server can scale to
 > > multiple cores in a single process and is comparatively unrestrained.
 >
 > Excellent point, and yes this simplifies. Great.
 >
 > > I believe that the "pinning" of a client session to particular tor
 > > instance will work automatically by the fact that snowflake-server 
keeps
 > > an outgoing connection alive (i.e., through the load balancer) as long
 > > as a KCP session exists.
 > >[...]
 > > But before starting the second instance the first time, copy keys from
 > > the first instance:
 >
 > Hm. It looks promising! But we might still have a Tor-side problem
 > remaining. I think it boils down to how long the KCP sessions last.
 >
 > The details on how exactly these bridge instances will diverge over time:
 >
 > The keys directory will start out the same, but after four weeks
 > (DEFAULT_ONION_KEY_LIFETIME_DAYS, used to be one week but in Tor
 > 0.3.1.1-alpha, proposal 274, we bumped it up to four weeks) each
 > bridge will rotate its onion key (the one clients use for circuit-level
 > crypto). That is, each instance will generate its own fresh onion key.
 >
 > The two bridge instances actually haven't diverged completely at that
 > point, since Tor remembers the previous onion key (i.e. the onion key
 > from the previous period) and is willing to receive create cells that
 > use it for one further week (DEFAULT_ONION_KEY_GRACE_PERIOD_DAYS). So it
 > is after 5 weeks that the original (shared) onion key will no longer 
work.
 >
 > Where this matters is (after this 5 weeks have passed) if the client
 > connects to the bridge, fetches and caches the bridge descriptor of
 > instance A, and then later it connects to the bridge again and gets
 > passed to instance B. In this case, the create cell that the client
 > generates will use the onion key for instance A, and instance B won't
 > know how to decrypt it so it will send a destroy cell back.
 >
 > If this is an issue, we can definitely work around it, by e.g. disabling
 > the onion key rotation on the bridges, or setting up a periodic rsync+hup
 > between the bridges, or teaching clients to use createfast cells in this
 > situation (this type of circuit crypto doesn't use the onion key at all,
 > and just relies on TLS for security -- which can only be done for the
 > first hop of the circuit but that's the one we're talking about here).
 >
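The rsync+hup workaround could be as small as a cron job on one instance
(the host, user, and key path here are illustrative):

# push the shared key material to the other backend and make tor reload
rsync -a /var/lib/tor/keys/ tor@192.168.0.21:/var/lib/tor/keys/
ssh tor@192.168.0.21 'pkill -HUP -x tor'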
 > But before we think about workarounds, maybe we don't need one: how long
 > does "the KCP session" last?
 >
 > Tor clients try to fetch a fresh bridge descriptor every three-ish
 > hours, and once they fetch a bridge descriptor from their "current"
 > bridge instance, they should know the onion key that it wants to use. So
 > it is that up-to-three-hour window where I think things could go wrong.
 > And that timeframe sounds promising.
 >
 > (I also want to double-check that clients don't try to use the onion
 > key from the current cached descriptor while fetching the updated
 > descriptor. That could become an ugly bug in the wrong circumstances,
 > and would be something we want to fix if it's happening.)
 >
 > Here's how you can simulate a pair of bridge instances that have diverged
 > after five weeks, so you can test how things would work with them:
 >
 > Copy the keys directory as before, but "rm secret_onion_key*" in the
 > keys directory on n-1 of the instances, before starting them.
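In shell terms, that divergence test would look roughly like this (the
instance paths are illustrative):

# start from identical key material, then drop the shared onion key on the
# second instance so it generates a fresh one at startup
rsync -a /var/lib/tor/instance1/keys/ /var/lib/tor/instance2/keys/
rm /var/lib/tor/instance2/keys/secret_onion_key*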
 >
 > Thanks!
 > --Roger
 >