[tor-relays] max TCP interruption before Tor circuit teardown?

Gordon Morehouse gordon at morehouse.me
Sun Oct 27 23:43:56 UTC 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi Roger, I was hoping you'd get to this eventually. :)

Roger Dingledine:
> On Sun, Oct 20, 2013 at 09:42:01AM -0700, Gordon Morehouse wrote:
>> With the slower computers, sometimes too many attempts to connect
>> to the ORPort (I am almost positive as part of TAP circuit
>> building, but not *really* sure) can eventually cause Tor to
>> consume more physmem than available and cause the oom-killer to
>> kill Tor.  Also, depending on the crappiness of the user's
>> router, it's effectively a SYN flood, and can crash or impair
>> consumer routers.
> 
> This doesn't sound like circuit building. It sounds like TLS
> handshakes.

Very good to know.

> You see, a new circuit handshake (TAP or NTor) is simply a 512-byte
> cell sent along an already established TCP connection. So if you're
> getting flooded by circuit handshakes, it will be traffic (which
> causes cpu load) but it won't be any new TCP connections.
> 
> If you're seeing a bunch of new TCP connections, that sounds like 
> clients trying to establish a new OR connection with you. (And
> those TLS handshakes are done in the core Tor thread, so having a
> weak CPU while handling a lot of TLS handshakes will cause your
> other Tor operations to hiccup.)

This is what's going on, and it's often relatively soon after I get my
Stable flag.

>> My solution, so far, is to define (through trial and error on a 
>> per-machine basis, since [1] is only officially supporting 3
>> SBCs right now) limits on how many SYNs may be sent to the ORPort
>> and the DirPort per second.  This is done with iptables.  I
>> experimented, tuned the parameters and watched traffic for weeks
>> and came up with a pretty good set of limits for a 950MHz
>> Raspberry Pi:  4 SYNs/sec burst 10.  (For those about to say the
>> Pi is thus too slow to be used as a relay, it's quite capable of
>> relaying *at least* 2.5Mbps, but *not* when it's getting SYN
>> flooded.)
> 
> My first question is to wonder if this flood of client connections
> is coming from a few IP addresses or many IP addresses. And to
> wonder if it's coming from Tor relays or not.

I was lucky enough to catch a "storm" just starting a couple mornings
ago, and am going to try to dissect the logs and my realtime
observations and provide a report - I expect it'd be useful to more
than just me and my single-board computer project.

>> After watching the data, I noticed that some hosts just try to
>> connect once or twice, or try to connect (during overload
>> conditions) at reasonable intervals of tens of seconds to a few
>> minutes.  Other hosts will quadruple-tap the ORPort with SYNs,
>> four in a row, and otherwise be much more aggressive with sending
>> SYNs.
> 
> Sounds like you are seeing variations in TCP implementations.

Yep, that's what I figured.

>> Currently, if a peer violates the 4/sec burst 10 SYN limit more
>> than 5 times in 60 seconds, that peer will be banned for 90
>> seconds.  I'm trying to trim this down to the minimum that will
>> protect the relay, and 90 seconds is a guess given some of my
>> fears, read on...
> 
> That brings up a second question: if you *do* let them establish a 
> TLS connection with you, do they stop hammering you? Or do they
> always want more? How long until they hang up on a connection that
> you allow to establish?

I'm not entirely sure yet, and I need to do some log-data crunching.
Do you know offhand how long it will take Tor to give up on connecting
to a peer if it seems down for a while?

>> First, during a SYN flood type overload, some peers which have 
>> *existing* circuits built through the relay and are sending SYNs
>> as normal traffic, will stochastically get "caught" in the filter
>> and banned for a short time.
> 
> Wait, what? SYN packets are not part of normal traffic for an
> established connection.

I incorrectly assumed that new circuit requests began with a TCP
handshake.  However, *if* the relay were being flooded, and a peer that
was already connected to the relay happened to send 4 SYN packets
which arrived after other hosts had already used up the limit for that
second, the unlucky peer would still get banned.  David Serrano
suggested an amendment to my iptables rules, which I've implemented,
which *may* immunize ESTABLISHED connections from the fail2ban ban;
he's helping me piece out whether that actually works or not.
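
To make that failure mode concrete, here is a minimal Python sketch of
the limits described above.  It assumes the 4/sec burst 10 limit acts
as a single token bucket shared by all peers, and that every over-limit
SYN counts as one violation against the peer that sent it.  It is a toy
model, not the actual iptables and fail2ban setup; the class name
SynLimiter and its parameters are made up for illustration.

```python
from collections import defaultdict, deque


class SynLimiter:
    """Toy model: one shared SYN token bucket plus a per-peer ban list.

    All names and defaults here are illustrative assumptions based on the
    numbers in this thread (4/sec, burst 10, >5 violations in 60 s, 90 s ban).
    """

    def __init__(self, rate=4.0, burst=10, max_violations=5,
                 window=60.0, ban_time=90.0):
        self.rate = rate                  # tokens added per second
        self.burst = burst                # bucket capacity
        self.max_violations = max_violations
        self.window = window              # seconds violations are remembered
        self.ban_time = ban_time          # seconds a ban lasts
        self.tokens = float(burst)
        self.last = 0.0                   # time of the last refill
        self.violations = defaultdict(deque)
        self.banned_until = {}

    def on_syn(self, peer, now):
        """Return 'accept', 'drop' or 'banned' for a SYN from peer at time now."""
        if self.banned_until.get(peer, 0.0) > now:
            return "banned"

        # Refill the shared bucket for the time elapsed since the last refill.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return "accept"

        # Over the limit: record a violation and forget ones outside the window.
        hits = self.violations[peer]
        hits.append(now)
        while hits and now - hits[0] > self.window:
            hits.popleft()

        if len(hits) > self.max_violations:
            self.banned_until[peer] = now + self.ban_time
            hits.clear()
            return "banned"
        return "drop"


if __name__ == "__main__":
    limiter = SynLimiter()
    # Another host drains the shared bucket first...
    for _ in range(10):
        limiter.on_syn("flooder", 0.0)
    # ...then an otherwise well-behaved peer retries a few times.
    print([limiter.on_syn("unlucky", 0.0) for _ in range(6)])
```

In this model a peer that sends six SYNs right after another host has
drained the bucket is dropped five times and banned on the sixth, even
though it never flooded anything itself.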

What would be good to know from you is how often already-connected
peers would be TCP handshaking to a relay's ORPort or DirPort.

>> So here's the $64,000 question:
>> 
>> If a tor relay has a circuit built through a peer, and the peer
>> starts dropping 100% of packets, how long will it take before the
>> relay with the circuit "gives up" on the circuit and tears it
>> down?
> 
> That depends on the TCP implementation on both sides. I imagine
> the answer varies widely. Which probably isn't what you wanted to
> hear.

Is there not a piece in Tor's connect-to-peer code which says "try for
N seconds, or P retries, then give up?"
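
For a rough sense of the TCP side of that answer, here is a small
Python sketch that adds up retransmission timeouts with exponential
backoff.  The defaults are assumptions taken from Linux
(net.ipv4.tcp_retries2 = 15, minimum RTO 200 ms, maximum RTO 120 s);
other TCP stacks use different values, and a real RTO also depends on
the measured round-trip time, so treat the result as a lower bound for
one implementation rather than a figure for Tor itself.  The function
name is made up for illustration.

```python
def retransmit_give_up_time(retries=15, rto_min=0.2, rto_max=120.0):
    """Seconds until an established connection is abandoned when every
    packet is dropped, assuming the RTO starts at rto_min and doubles
    after each timeout up to rto_max (assumed Linux-style defaults)."""
    total = 0.0
    rto = rto_min
    # The initial timeout plus one more timeout per retransmission.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    print(round(retransmit_give_up_time(), 1))
```

With those assumed defaults the total comes to roughly 924.6 seconds,
a bit over 15 minutes, which shows how far the answer can stretch
before the TCP layer alone reports the connection as dead.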

Thanks much for your input.

- -Gordon M.


-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJSbaU5AAoJED/jpRoe7/ujdggH/0FE8VrYJI2WC8e1K7wgoYh9
sa6z34P1YF0lqsCuPbpt2cyeHFQbXy+0v/bEtBz6SMgSHGIJqVNKjPx1jlx9Ei8/
gpenIVLBs1urD40SqgXpm25hjlRVu6qztAON/LuwKh4jSHr+MYcAJGeKM8UX1pZE
cabnydUs/zmr9XoCIOQfVV5d4Sp3ofI2JzytvSYGjZYoNKWS6S7u3YRBu8Ab7Upo
+qWIS2TaI+2oRTp0EUV2ray1UsU/iFb8u98yn1k9P8XBzhiXy5uicOTYKSm1Lu1N
/0lvsnYFhJs0W5kJaY97QFju0sM642MqyLqQajtRL0aqS6jfJCSGsBZkzMhpjlI=
=1m37
-----END PGP SIGNATURE-----

