On Sun, Oct 20, 2013 at 09:42:01AM -0700, Gordon Morehouse wrote:
> With the slower computers, sometimes too many attempts to connect to the ORPort (I am almost positive as part of TAP circuit building, but not *really* sure) can eventually cause Tor to consume more physmem than is available, at which point the oom-killer kills Tor. Also, depending on the crappiness of the user's router, it's effectively a SYN flood, and can crash or impair consumer routers.
This doesn't sound like circuit building. It sounds like TLS handshakes.
You see, a new circuit handshake (TAP or NTor) is simply a 512-byte cell sent along an already established TCP connection. So if you're getting flooded by circuit handshakes, you'll see it as traffic (and the CPU load that comes with it), but not as new TCP connections.
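To make the "just a cell" point concrete, here is a minimal sketch that packs a fixed-size cell using the layout from the Tor protocol specification: a 2-byte circuit ID (link protocols before v4; v4 and later use 4-byte IDs and 514-byte cells), a 1-byte command, and a zero-padded 509-byte payload. The circuit ID and the all-zero payload are placeholders, not a real onionskin. The result is only a byte string that would be written onto an already-open TLS connection; nothing here opens a socket.

```python
import struct

# Fixed-size cell layout for link protocols < 4 (per tor-spec).
CELL_LEN = 512
CIRCID_LEN = 2
CMD_CREATE = 1     # carries a TAP onionskin
CMD_CREATE2 = 10   # carries a typed handshake, e.g. ntor


def pack_fixed_cell(circ_id: int, command: int, payload: bytes) -> bytes:
    """Pack CircID | Command | Payload, zero-padding the payload to full size."""
    max_payload = CELL_LEN - CIRCID_LEN - 1  # 509 bytes
    if len(payload) > max_payload:
        raise ValueError("payload too large for a fixed-size cell")
    header = struct.pack(">HB", circ_id, command)  # big-endian u16 + u8
    return header + payload.ljust(max_payload, b"\x00")


# Placeholder handshake bytes; a real client would put its onionskin here.
cell = pack_fixed_cell(circ_id=0x1234, command=CMD_CREATE, payload=b"\x00" * 100)
print(len(cell))  # 512
```

A flood of these shows up as bytes (and CPU work) on existing connections, which is why it does not look like a stream of new TCP connections.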
If you're seeing a bunch of new TCP connections, that sounds like clients trying to establish a new OR connection with you. (And those TLS handshakes are done in the core Tor thread, so having a weak CPU while handling a lot of TLS handshakes will cause your other Tor operations to hiccup.)
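A toy model of why handshakes in the main thread cause hiccups: if work is processed one item at a time, a cheap relay cell queued behind many TLS handshakes has to wait for all of them. The per-handshake cost (10 ms) and queue length (20) below are hypothetical numbers chosen for illustration, not measurements, and real Tor is driven by a libevent loop rather than a simple list.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cost_ms: float  # CPU time the single thread spends on this job (assumed)


def completion_times(jobs):
    """Run jobs back to back on one thread; return each job's finish time in ms."""
    clock = 0.0
    finished = {}
    for job in jobs:
        clock += job.cost_ms
        finished[job.name] = clock
    return finished


# Hypothetical: 20 TLS handshakes at 10 ms each, queued ahead of one
# relay cell that needs only 0.1 ms of work.
queue = [Job(f"tls-{i}", 10.0) for i in range(20)] + [Job("relay-cell", 0.1)]
times = completion_times(queue)
print(times["relay-cell"])
```

The relay cell finishes about 200 ms after it arrives even though its own work is tiny; on a weak CPU the per-handshake cost, and therefore the stall, grows.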
> My solution, so far, is to define limits on how many SYNs may be sent to the ORPort and the DirPort per second, through trial and error on a per-machine basis (since [1] officially supports only 3 SBCs right now). This is done with iptables. I experimented, tuned the parameters, watched traffic for weeks, and came up with a pretty good set of limits for a 950MHz Raspberry Pi: 4 SYNs/sec, burst 10. (For those about to say the Pi is thus too slow to be used as a relay: it's quite capable of relaying *at least* 2.5Mbps, but *not* while it's getting SYN flooded.)
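A "4/sec, burst 10" limit is a token bucket: up to 10 SYNs can pass at once, after which new ones pass only as tokens refill at 4 per second. The iptables `limit` match (`--limit 4/second --limit-burst 10`) behaves this way, but the exact rules used here are not shown, so whether the limit is global or per source address is an assumption left open. This Python sketch models only the bucket itself.

```python
class TokenBucket:
    """Refill `rate` tokens per second, store at most `burst`; each packet costs one."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=4, burst=10)
burst_results = [bucket.allow(0.0) for _ in range(20)]  # 20 SYNs at the same instant
later_results = [bucket.allow(1.0) for _ in range(10)]  # 10 more, one second later
print(burst_results.count(True), later_results.count(True))  # 10 4
```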
My first question is whether this flood of client connections is coming from a few IP addresses or many. And whether it's coming from Tor relays or not.
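One way to answer both questions from a packet log is to count SYNs per source address and split them by whether the source is a known relay. In this sketch the log format (`(timestamp, src_ip)` pairs) and the `relay_ips` set are assumptions; in practice the relay addresses would have to come from a consensus document, which is not shown. The addresses below are documentation-range placeholders.

```python
from collections import Counter


def summarize_syn_sources(syn_log, relay_ips):
    """syn_log: iterable of (timestamp, src_ip); relay_ips: set of known relay addresses."""
    per_ip = Counter(ip for _, ip in syn_log)
    from_relays = sum(n for ip, n in per_ip.items() if ip in relay_ips)
    return {
        "distinct_sources": len(per_ip),
        "top_sources": per_ip.most_common(3),
        "syns_from_relays": from_relays,
        "syns_from_others": sum(per_ip.values()) - from_relays,
    }


log = [
    (0.0, "192.0.2.1"), (0.1, "192.0.2.1"), (0.2, "192.0.2.1"),
    (0.3, "198.51.100.7"), (0.4, "203.0.113.9"),
]
summary = summarize_syn_sources(log, relay_ips={"198.51.100.7"})
print(summary["distinct_sources"], summary["syns_from_relays"])
```

A few addresses dominating `top_sources` would point toward a handful of aggressive peers; a long flat tail would point toward many clients.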
> After watching the data, I noticed that some hosts just try to connect once or twice, or retry (during overload conditions) at reasonable intervals of tens of seconds to a few minutes. Other hosts will quadruple-tap the ORPort with SYNs, four in a row, and are otherwise much more aggressive about sending SYNs.
Sounds like you are seeing variations in TCP implementations.
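Stacks differ in their initial retransmission timeout, their backoff, and how many times they retry a SYN, so different hosts produce visibly different SYN patterns. The sketch below assumes plain exponential backoff with a cap; the 1 s starting value follows RFC 6298 and the 3 s value follows the older RFC 2988, while the retry counts are arbitrary examples rather than any particular OS's defaults.

```python
def syn_send_times(initial_rto: float, retries: int, max_rto: float = 120.0):
    """Times (s) at which a SYN is sent, assuming the timeout doubles after each try."""
    times, t, rto = [0.0], 0.0, initial_rto
    for _ in range(retries):
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto)
    return times


print(syn_send_times(1.0, 5))  # [0.0, 1.0, 3.0, 7.0, 15.0, 31.0]
print(syn_send_times(3.0, 3))  # [0.0, 3.0, 9.0, 21.0]
```

Against a 4/sec limit, a stack following either schedule is gentle; several SYNs arriving back to back do not fit this pattern.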
> Currently, if a peer violates the 4/sec burst 10 SYN limit more than 5 times in 60 seconds, that peer is banned for 90 seconds. I'm trying to trim this down to the minimum that will protect the relay, and 90 seconds is a guess given some of my fears; read on...
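The ban policy described above ("more than 5 violations in 60 seconds earns a 90-second ban") can be modeled as a per-address sliding window. The actual iptables rules are not shown (the `recent` match is one common way to do this), so the class below is a hypothetical Python model of the policy only, not a reproduction of the firewall configuration.

```python
from collections import defaultdict, deque


class ViolationBanner:
    """Ban an address for `ban_s` seconds after more than `max_violations`
    rate-limit violations within a sliding `window_s`-second window."""

    def __init__(self, max_violations=5, window_s=60.0, ban_s=90.0):
        self.max_violations = max_violations
        self.window_s = window_s
        self.ban_s = ban_s
        self.violations = defaultdict(deque)  # ip -> timestamps of recent violations
        self.banned_until = {}                # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, float("-inf")) > now

    def record_violation(self, ip, now):
        q = self.violations[ip]
        q.append(now)
        # Forget violations that have slid out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        if len(q) > self.max_violations:
            self.banned_until[ip] = now + self.ban_s
            q.clear()


banner = ViolationBanner()
for t in range(6):  # six violations within six seconds
    banner.record_violation("192.0.2.1", float(t))
print(banner.is_banned("192.0.2.1", 50.0), banner.is_banned("192.0.2.1", 96.0))  # True False
```

Shortening `ban_s` reduces collateral damage to well-behaved peers caught in the filter, at the cost of letting aggressive ones back sooner.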
That brings up a second question: if you *do* let them establish a TLS connection with you, do they stop hammering you? Or do they always want more? How long until they hang up on a connection that you allow to be established?
> First, during a SYN-flood-type overload, some peers that have *existing* circuits built through the relay, and are sending SYNs as part of normal traffic, will stochastically get "caught" in the filter and banned for a short time.
Wait, what? SYN packets are not part of normal traffic for an established connection.
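The distinction Roger draws is visible in the TCP flags byte. A connection attempt is a SYN with ACK clear; every segment on an established connection carries ACK and no SYN. The sketch mirrors the definition the iptables `--syn` option documents (SYN set; ACK, RST and FIN cleared); the function name is illustrative.

```python
# TCP header flag bits.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def is_connection_attempt(flags: int) -> bool:
    """True for an opening SYN: SYN set and ACK, RST and FIN all cleared."""
    return (flags & (SYN | ACK | RST | FIN)) == SYN


print(is_connection_attempt(SYN))        # True: opens a new connection
print(is_connection_attempt(SYN | ACK))  # False: server's handshake reply
print(is_connection_attempt(PSH | ACK))  # False: ordinary data on a live connection
```

So a SYN-rate rule only ever counts new connection attempts; cells on an existing OR connection never pass through it as SYNs.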
> So here's the $64,000 question:
> If a tor relay has a circuit built through a peer, and the peer starts dropping 100% of packets, how long will it take before the relay with the circuit "gives up" on the circuit and tears it down?
That depends on the TCP implementation on both sides. I imagine the answer varies widely. Which probably isn't what you wanted to hear.
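As one data point for a single implementation: Linux gives up on unacknowledged data after `tcp_retries2` retransmissions (default 15), and its documentation quotes a hypothetical timeout of 924.6 seconds for that default, as a lower bound. The sketch reproduces that figure by assuming the retransmission timeout starts at 200 ms and doubles up to a 120 s cap. It applies only while the relay has data in flight to the silent peer; an idle connection has nothing to retransmit, and other operating systems use different values.

```python
def linux_retransmit_timeout(retries: int, rto_min: float = 0.2, rto_max: float = 120.0) -> float:
    """Seconds before giving up on unacknowledged data, assuming the timeout
    starts at rto_min and doubles (capped at rto_max) on each retransmission."""
    total, rto = 0.0, rto_min
    for _ in range(retries + 1):  # wait after the original send and each retransmission
        total += rto
        rto = min(rto * 2, rto_max)
    return total


print(round(linux_retransmit_timeout(15), 1))  # 924.6
```

The first ten waits sum to 204.6 s, and the remaining six are capped at 120 s each, which is where most of the roughly 15 minutes comes from.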
--Roger