Re: [tor-relays] max TCP interruption before Tor circuit teardown?

27 Oct 2013

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi Roger, I was hoping you'd get to this eventually. :)

Roger Dingledine:
...
> On Sun, Oct 20, 2013 at 09:42:01AM -0700, Gordon Morehouse wrote:
> ...
>> With the slower computers, sometimes too many attempts to connect
>> to the ORPort (I am almost positive as part of TAP circuit
>> building, but not *really* sure) can eventually cause Tor to
>> consume more physmem than is available and cause the oom-killer to
>> kill Tor.  Also, depending on the crappiness of the user's
>> router, it's effectively a SYN flood, and can crash or impair
>> consumer routers.
>
> This doesn't sound like circuit building. It sounds like TLS
> handshakes.

Very good to know.
...
> You see, a new circuit handshake (TAP or ntor) is simply a 512-byte
> cell sent along an already established TCP connection. So if you're
> getting flooded by circuit handshakes, it will be traffic (which
> causes CPU load) but it won't be any new TCP connections.
>
> If you're seeing a bunch of new TCP connections, that sounds like
> clients trying to establish a new OR connection with you. (And
> those TLS handshakes are done in the core Tor thread, so having a
> weak CPU while handling a lot of TLS handshakes will cause your
> other Tor operations to hiccup.)

This is what's going on, and it often starts relatively soon after I
get my Stable flag.
...
...
>> My solution, so far, is to define (through trial and error on a
>> per-machine basis, since [1] only officially supports 3
>> SBCs right now) limits on how many SYNs may be sent to the ORPort
>> and the DirPort per second.  This is done with iptables.  I
>> experimented, tuned the parameters, watched traffic for weeks,
>> and came up with a pretty good set of limits for a 950MHz
>> Raspberry Pi: 4 SYNs/sec, burst 10.  (For those about to say the
>> Pi is thus too slow to be used as a relay: it's quite capable of
>> relaying *at least* 2.5Mbps, but *not* when it's being SYN
>> flooded.)
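
A limit like "4 SYNs/sec, burst 10" is what iptables' limit match expresses with `--limit 4/sec --limit-burst 10`. That match behaves roughly like a token bucket: up to 10 packets can pass at once, and after that credit comes back at 4 per second. The Python sketch below models only that behaviour. It assumes one bucket shared by all sources (a global limit rather than a per-peer one), which is a guess about the actual rules. The class name is illustrative.

```python
class TokenBucket:
    """Approximate model of an iptables-style "rate/sec, burst N" limit.

    Assumption: one bucket shared by every source address (a global limit).
    """

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate             # credits regained per second
        self.burst = burst           # maximum stored credit
        self.tokens = float(burst)   # the bucket starts full
        self.last = 0.0              # time of the previous packet, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a SYN arriving at time `now` is within the limit."""
        # Regain credit for the time since the last packet, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0       # spend one credit on this SYN
            return True
        return False                 # over the limit: the SYN would be dropped
```

Suppose twenty SYNs arrive in the same instant. The first 10 are accepted and the rest refused. A quarter of a second later, enough credit has come back to let one more through.
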

> My first question is whether this flood of client connections
> is coming from a few IP addresses or many IP addresses, and
> whether it's coming from Tor relays or not.

I was lucky enough to catch a "storm" just as it was starting a couple
of mornings ago. I'm going to try to dissect the logs and my real-time
observations and write up a report; I expect it'd be useful to more
than just me and my single-board computer project.
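
A first pass at the "few addresses or many, relays or not" question could simply count SYNs per source address in the firewall log. This sketch assumes log lines in the format the iptables LOG target writes, with space-separated fields such as `SRC=...` and each set TCP flag as its own word, e.g. `SYN`. It also assumes a set of relay addresses you supply yourself, for example taken from a consensus document. The function names and input format are illustrative.

```python
import re
from collections import Counter

# iptables LOG lines contain space-separated fields such as "SRC=10.0.0.1".
SRC_RE = re.compile(r"\bSRC=(\d{1,3}(?:\.\d{1,3}){3})\b")


def syn_sources(log_lines):
    """Count bare SYNs (new connection attempts) per source IPv4 address."""
    counts = Counter()
    for line in log_lines:
        fields = set(line.split())
        # The LOG target prints each set TCP flag as a separate word;
        # skip anything that is not a SYN, and skip SYN-ACKs.
        if "SYN" not in fields or "ACK" in fields:
            continue
        match = SRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(counts, relay_ips, top=5):
    """Answer "few or many sources?" and "relays or not?" for one storm."""
    total = sum(counts.values())
    from_relays = sum(n for ip, n in counts.items() if ip in relay_ips)
    return {
        "total_syns": total,
        "distinct_sources": len(counts),
        "syns_from_relays": from_relays,
        "top_sources": counts.most_common(top),
    }
```
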
...
...
>> After watching the data, I noticed that some hosts just try to
>> connect once or twice, or try to connect (during overload
>> conditions) at reasonable intervals of tens of seconds to a few
>> minutes.  Other hosts will quadruple-tap the ORPort with SYNs,
>> four in a row, and are otherwise much more aggressive about sending
>> SYNs.
>
> Sounds like you are seeing variations in TCP implementations.

Yep, that's what I figured.
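
For comparison, a conventional stack spaces out the retries for a single connection attempt with exponential backoff. This sketch computes that schedule. It assumes Linux-style behaviour: an initial retransmission timeout of 1 s and net.ipv4.tcp_syn_retries = 6, the values documented for recent kernels. Older kernels and other operating systems use different numbers, which is one source of the variation Roger mentions.

```python
def syn_retransmit_schedule(initial_rto: float = 1.0, syn_retries: int = 6):
    """Return (retransmit_times, give_up_time), in seconds after the first SYN,
    for a stack that doubles its retransmission timeout after every retry.

    Defaults assume Linux-style values (initial RTO 1 s, tcp_syn_retries 6).
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(syn_retries):
        elapsed += rto          # wait one RTO, then resend the SYN
        times.append(elapsed)
        rto *= 2                # exponential backoff
    return times, elapsed + rto  # after the last retry, wait one more RTO
```

With these defaults the retries go out 1, 3, 7, 15, 31 and 63 seconds after the first SYN, and the attempt is abandoned at 127 seconds. That matches the figures in the kernel's ip-sysctl documentation.
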
...
...
>> Currently, if a peer violates the 4/sec burst 10 SYN limit more
>> than 5 times in 60 seconds, that peer will be banned for 90
>> seconds.  I'm trying to trim this down to the minimum that will
>> protect the relay, and 90 seconds is a guess given some of my
>> fears; read on...
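
The ban policy above is the kind of rule fail2ban expresses with its findtime, maxretry and bantime settings: more than 5 rate-limit violations within 60 seconds earns a 90-second ban. Below is a minimal sliding-window sketch of the same policy. It counts "more than 5" literally; fail2ban's own maxretry counting may differ by one. The class and method names are hypothetical.

```python
from collections import defaultdict, deque


class ViolationBanner:
    """Sliding-window ban policy (hypothetical helper): more than
    `max_violations` violations within `window` seconds bans the peer
    for `ban_time` seconds."""

    def __init__(self, max_violations=5, window=60.0, ban_time=90.0):
        self.max_violations = max_violations
        self.window = window
        self.ban_time = ban_time
        self.hits = defaultdict(deque)   # ip -> timestamps of recent violations
        self.banned_until = {}           # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, float("-inf")) > now

    def record_violation(self, ip, now):
        """Record one violation at time `now`; return True if `ip` is banned."""
        hits = self.hits[ip]
        hits.append(now)
        # Forget violations that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) > self.max_violations:
            self.banned_until[ip] = now + self.ban_time
            hits.clear()                 # start counting afresh after a ban
        return self.is_banned(ip, now)
```

A peer that trips the limit once a second is banned on its sixth violation. One that trips it every 20 seconds never has more than three violations in the window and is never banned.
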

> That brings up a second question: if you *do* let them establish a
> TLS connection with you, do they stop hammering you? Or do they
> always want more? How long until they hang up on a connection that
> you allow to establish?

I'm not entirely sure yet, and I need to do some log-data crunching.
Do you know offhand how long it will take Tor to give up on connecting
to a peer if it seems down for a while?
...
...
>> First, during a SYN-flood-type overload, some peers which have
>> *existing* circuits built through the relay and are sending SYNs
>> as normal traffic will stochastically get "caught" in the filter
>> and banned for a short time.
>
> Wait, what? SYN packets are not part of normal traffic for an
> established connection.
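
Roger's point can be made concrete with TCP header flags. Only the opening packet of a handshake carries SYN without ACK. Data on an established connection, including Tor cells such as circuit-creation requests, travels in segments that carry ACK and never SYN. The check below is a sketch that mirrors what iptables' `--syn` option tests: SYN set, and ACK, RST and FIN clear. The function name is illustrative.

```python
# TCP header flag bits (RFC 793, plus the ECN bits ECE/CWR from RFC 3168).
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80


def is_new_connection_attempt(flags: int) -> bool:
    """True for the opening SYN of a TCP handshake.

    Same test as iptables' --syn: SYN set, and ACK, RST and FIN all clear.
    Segments of an established connection always carry ACK, so they never
    match, no matter how many Tor cells they contain.
    """
    return (flags & (SYN | ACK | RST | FIN)) == SYN
```
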

I incorrectly assumed that new circuit requests began with a TCP
handshake.  However, *if* the relay were being flooded, and a peer that
was already connected to the relay happened to send 4 SYN packets
which arrived after other hosts had exceeded the limit for that
second, the unlucky peer would still get banned.  David Serrano
suggested an amendment to my iptables rules, which I've implemented,
that *may* exempt ESTABLISHED connections from the fail2ban ban;
he's helping me work out whether that actually works or not.
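
We don't know the exact rules David suggested. One common way to get that kind of exemption is to accept packets belonging to connections conntrack already tracks, for example with `-m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT`. That rule must be evaluated before the ban rule, so that a ban only affects new connection attempts. The rule ordering is sketched below in Python; the enum and function names are illustrative, and the rate limiter itself is not modelled.

```python
from enum import Enum


class CtState(Enum):
    """Connection-tracking states, named after conntrack's --ctstate values."""
    NEW = "NEW"
    ESTABLISHED = "ESTABLISHED"
    RELATED = "RELATED"
    INVALID = "INVALID"


def verdict(state: CtState, src_ip: str, banned: set) -> str:
    """Hypothetical rule chain, evaluated top to bottom like iptables rules."""
    # Rule 1: packets of connections conntrack already knows about pass first,
    # so a ban added later cannot cut an existing circuit-carrying connection.
    if state in (CtState.ESTABLISHED, CtState.RELATED):
        return "ACCEPT"
    # Rule 2: only new (or invalid) packets from banned sources are dropped.
    if src_ip in banned:
        return "DROP"
    # Rule 3: everything else goes on to the SYN rate limiter (not modelled).
    return "RATE_LIMIT"
```
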

What would be good to know from you is how often already-connected
peers open new TCP connections to a relay's ORPort or DirPort.
...
...
>> So here's the $64,000 question:
>> If a Tor relay has a circuit built through a peer, and the peer
>> starts dropping 100% of packets, how long will it take before the
>> relay with the circuit "gives up" on the circuit and tears it
>> down?
>
> That depends on the TCP implementation on both sides. I imagine
> the answer varies widely. Which probably isn't what you wanted to
> hear.

Is there not a piece in Tor's connect-to-peer code which says "try for
N seconds, or P retries, then give up"?
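
Roger's answer can be made a little more concrete for one case: a relay that has unacknowledged data queued for a peer that has gone silent. A Linux sender keeps retransmitting with exponential backoff until net.ipv4.tcp_retries2 retries are used up. The kernel documentation puts the default of 15 at a hypothetical 924.6 seconds, assuming the minimum RTO of 200 ms doubling up to a 120 s cap. The sketch below reproduces that figure. It says nothing about other operating systems or about any timeout Tor itself may apply. It also does not cover idle connections with no data outstanding, which retransmission never detects at all.

```python
def linux_retries2_timeout(retries2: int = 15,
                           rto_min: float = 0.2,
                           rto_max: float = 120.0) -> float:
    """Seconds a Linux sender keeps retransmitting to a silent peer before
    giving up, assuming the RTO starts at rto_min and doubles up to rto_max.

    This follows the calculation behind the 924.6-second figure quoted for
    the default tcp_retries2 = 15; a longer measured round-trip time would
    stretch it further.
    """
    total = 0.0
    rto = rto_min
    # retries2 retransmissions, plus the final wait after the last one.
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total
```

With the defaults this returns about 924.6 seconds. A setting of 8, the minimum the kernel documentation associates with RFC 1122's 100-second recommendation, gives about 102.2 seconds.
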

Thanks much for your input.

- -Gordon M.

-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJSbaU5AAoJED/jpRoe7/ujdggH/0FE8VrYJI2WC8e1K7wgoYh9
sa6z34P1YF0lqsCuPbpt2cyeHFQbXy+0v/bEtBz6SMgSHGIJqVNKjPx1jlx9Ei8/
gpenIVLBs1urD40SqgXpm25hjlRVu6qztAON/LuwKh4jSHr+MYcAJGeKM8UX1pZE
cabnydUs/zmr9XoCIOQfVV5d4Sp3ofI2JzytvSYGjZYoNKWS6S7u3YRBu8Ab7Upo
+qWIS2TaI+2oRTp0EUV2ray1UsU/iFb8u98yn1k9P8XBzhiXy5uicOTYKSm1Lu1N
/0lvsnYFhJs0W5kJaY97QFju0sM642MqyLqQajtRL0aqS6jfJCSGsBZkzMhpjlI=
=1m37
-----END PGP SIGNATURE-----