Tor crashes frequently on fast relay

Hello everybody,

since this botnet started flooding Tor, my Tor relay Bazinga ($196832C61F30E9D6D179393C9AED4E47FD29796B) has been experiencing some issues.

Previously, it was relaying 100 Mbit/s for a few months without problem. When the botnet came along, at first the throughput dropped to about 30 Mbit/s and Tor kept spamming "Your CPU is too slow" and crashed every hour or so. I set RelayBandwidthRate to 20 Mbit/s which reduced the crash frequency.

Then, 0.2.4.17-rc was released and I lifted RelayBandwidthRate back to 100 Mbit/s. At first it was running fine, but then the crashes came back. This is from yesterday and the day before:

$ sudo grep -i interrupt /var/log/tor/log.1
Sep 10 08:59:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 10:01:45.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 11:04:50.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 12:49:56.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 13:51:01.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 15:22:18.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 16:07:27.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 18:56:03.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.

$ sudo zcat /var/log/tor/log.2.gz|grep -i interrupt
Sep 09 06:51:20.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 07:54:30.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 08:48:36.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 09:51:56.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 10:20:01.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 10:49:06.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 13:03:26.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 13:26:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 13:50:47.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:02:54.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:26:06.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:42:11.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:52:17.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:08:31.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:21:37.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:31:27.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:44:32.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:53:38.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:02:45.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:08:50.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:16:56.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:24:04.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:34:10.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:48:20.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:55:25.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 17:02:30.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 17:21:39.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 17:35:45.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 18:51:51.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 19:14:02.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 19:30:10.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 19:50:16.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 20:06:23.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 20:49:52.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 21:13:00.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 21:51:06.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 22:54:22.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.

It is a VPS on a Xen hypervisor with 3x Intel(R) Xeon(R) CPU L5630 @ 2.13GHz and 2 GB RAM.

According to htop, tor currently uses about 105% CPU and 350 MB RAM. It was around the same CPU and RAM consumption while it was happily relaying 100 Mbit/s.

At first I was suspecting the hoster to have reduced CPU resources or so, but they said they didn't change anything, including no virtualization parameters.

Any help would be appreciated.

--RTNO
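
To see how regular the shutdowns in the log above are, the gaps between consecutive "Interrupt" notices can be computed. What follows is a minimal sketch in Python, not something posted in the thread. It assumes the year 2013 (taken from the dates of the messages, since the log lines carry no year) and an English locale for the month abbreviation; the function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: Tor's log lines carry no year; 2013 is taken from the thread dates.
ASSUMED_YEAR = 2013


def shutdown_times(lines):
    """Return the timestamps of all lines containing Tor's 'Interrupt:' notice."""
    times = []
    for line in lines:
        if "Interrupt:" not in line:
            continue
        # A log line starts like: 'Sep 10 08:59:40.000 [notice] ...'
        month, day, clock = line.split()[:3]
        stamp = f"{ASSUMED_YEAR} {month} {day} {clock}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f"))
    return times


def gaps(times):
    """Return the time between each shutdown and the next one."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# The first three entries of the log.1 excerpt quoted above.
NOTICE = ("[notice] Interrupt: we have stopped accepting new connections, "
          "and will shut down in 30 seconds. Interrupt again to exit now.")
sample = [
    f"Sep 10 08:59:40.000 {NOTICE}",
    f"Sep 10 10:01:45.000 {NOTICE}",
    f"Sep 10 11:04:50.000 {NOTICE}",
]

for gap in gaps(shutdown_times(sample)):
    print(gap)  # prints 1:02:05, then 1:03:05
```

Feeding in the full grep output instead of `sample` gives the interval for every restart. Gaps that look scheduled rather than random would point towards something outside Tor sending the signal.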

On 11.09.2013 10:05, Random Tor Node Operator wrote:
> Sep 10 08:59:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.

I'm just taking a wild guess here, because I had similar symptoms: that log message doesn't seem to be a crash but a regular shutdown.

In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.

I reconfigured monit not to restart tor until at least three consecutive TCP checks failed. Now I still get the warnings for failed checks from time to time (maybe under high load?) but tor seems to run stable.

-Stephan
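
The idea of restarting only after several consecutive failed TCP checks can be sketched as follows. This is a hedged illustration in Python, not monit's actual implementation; `port_is_open` and `ConsecutiveFailureGate` are hypothetical names, and the 5-second connect timeout is an assumption.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


class ConsecutiveFailureGate:
    """Signal a restart only after `threshold` failed checks in a row."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, check_ok):
        """Record one check result; return True when a restart is due."""
        if check_ok:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # start counting afresh after the restart
            return True
        return False


# Two failures, one success, then three failures: only the last one triggers.
gate = ConsecutiveFailureGate(threshold=3)
results = [False, False, True, False, False, False]
print([gate.record(ok) for ok in results])
# prints [False, False, False, False, False, True]
```

In a real monitoring loop each `record` call would receive the result of `port_is_open(...)` for the relay's port, once per check interval. A single missed check under load then produces at most a warning instead of a restart.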

On Wed, Sep 11, 2013 at 12:34:12PM +0200, Stephan wrote:
> On 11.09.2013 10:05, Random Tor Node Operator wrote:
>> Sep 10 08:59:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
> I'm just taking a wild guess here, because I had similar symptoms: that log message doesn't seem to be a crash but a regular shutdown.

Correct. Something told your Tor to turn off.

> In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.

Yow. That doesn't sound like what you wanted (nor does it sound good for the Tor network).

Thanks,
--Roger

On 09/11/2013 12:37 PM, Roger Dingledine wrote:
>> In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.
> Yow. That doesn't sound like what you wanted (nor does it sound good for the Tor network).

Do you only mean the unnecessary restarts, or the way of checking for a running tor instance by checking the reachability of the ORPort? If the latter, which (preferably monit-compatible) other method would you prefer?

--RTNO

On 09/11/2013 12:34 PM, Stephan wrote:
> I'm just taking a wild guess here, because I had similar symptoms: that log message doesn't seem to be a crash but a regular shutdown. In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.
> I reconfigured monit not to restart tor until at least three consecutive TCP checks failed. Now I still get the warnings for failed checks from time to time (maybe under high load?) but tor seems to run stable.

Thanks, that is probably the cause of the restarts. Could you post the corresponding line(s) in your monitrc? Mine currently still looks like this:

if failed host 127.0.0.1 port 9001 then restart

--RTNO

On 11.09.2013 13:33, Random Tor Node Operator wrote:
> Could you post the corresponding line(s) in your monitrc?

Of course. I use the default of "set daemon 120", so tor is checked once every 120 seconds ('one cycle'). The tor specific part of the configuration is this:

if failed host 127.0.0.1 port 9030 type tcp for 10 cycles then restart
if failed host 127.0.0.1 port 9001 type tcp for 10 cycles then restart
if 5 restarts within 100 cycles then timeout

I'm still experimenting with the exact numbers. Restarting after only one failed check was obviously too fast. Waiting for 10 consecutive failed checks (i.e. tor is not responding for 20 minutes) may be too long, but I don't want to ruin the 'stable' flag only because a botnet is wreaking havoc for a few minutes.

The last line is kind of a safeguard. I get monit alerts by email. If something is seriously broken with my server (or tor) five mails will suffice. Even if I am on vacation for a few weeks, I don't need a reminder every 20 minutes. ;-)

-Stephan

On 09/11/2013 02:18 PM, Stephan wrote:
> On 11.09.2013 13:33, Random Tor Node Operator wrote:
>> Could you post the corresponding line(s) in your monitrc?
> Of course. I use the default of "set daemon 120", so tor is checked once every 120 seconds ('one cycle'). The tor specific part of the configuration is this:
> if failed host 127.0.0.1 port 9030 type tcp for 10 cycles then restart
> if failed host 127.0.0.1 port 9001 type tcp for 10 cycles then restart
> if 5 restarts within 100 cycles then timeout
> I'm still experimenting with the exact numbers. Restarting after only one failed check was obviously too fast. Waiting for 10 consecutive failed checks (i.e. tor is not responding for 20 minutes) may be too long, but I don't want to ruin the 'stable' flag only because a botnet is wreaking havoc for a few minutes.
> The last line is kind of a safeguard. I get monit alerts by email. If something is seriously broken with my server (or tor) five mails will suffice. Even if I am on vacation for a few weeks, I don't need a reminder every 20 minutes. ;-)

Thanks. My cycle time is 60 seconds and I'm now trying out 5 cycles for restarting. Bazinga's Stable flag is already gone due to those rogue restarts. It'll hopefully recover soon.

--RTNO
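
The two settings mentioned in this exchange (120-second cycles with 10 failed cycles, and 60-second cycles with 5 failed cycles) translate into quite different outage times before a restart. A rough sketch of the arithmetic in Python follows. It assumes exactly one check per cycle and checks that take negligible time, and `restart_delay_bounds` is a hypothetical helper, not part of monit.

```python
def restart_delay_bounds(cycle_seconds, failed_cycles):
    """Return (shortest, longest) outage in seconds before a restart triggers.

    Assumes one check per cycle and negligible check duration. An outage that
    begins just before a check needs (failed_cycles - 1) further cycles; one
    that begins just after a check needs the full failed_cycles cycles.
    """
    shortest = cycle_seconds * (failed_cycles - 1)
    longest = cycle_seconds * failed_cycles
    return shortest, longest


settings = [
    ("120 s cycle, 10 cycles", 120, 10),
    ("60 s cycle, 5 cycles", 60, 5),
]
for label, cycle, cycles in settings:
    shortest, longest = restart_delay_bounds(cycle, cycles)
    print(f"{label}: restart after {shortest / 60:.0f} to {longest / 60:.0f} minutes")
# prints:
# 120 s cycle, 10 cycles: restart after 18 to 20 minutes
# 60 s cycle, 5 cycles: restart after 4 to 5 minutes
```

By the same arithmetic, the "5 restarts within 100 cycles" safeguard with a 120-second cycle covers a window of 100 x 120 s = 200 minutes.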

participants (3): Random Tor Node Operator, Roger Dingledine, Stephan