-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hello everybody,
since this botnet started flooding Tor, my Tor relay Bazinga ($196832C61F30E9D6D179393C9AED4E47FD29796B) has been experiencing some issues.
Previously, it relayed 100 Mbit/s for a few months without problems. When the botnet came along, throughput first dropped to about 30 Mbit/s, Tor kept logging "Your CPU is too slow", and it crashed every hour or so. Setting RelayBandwidthRate to 20 Mbit/s reduced the crash frequency.
Then, 0.2.4.17-rc was released and I lifted RelayBandwidthRate back to 100 Mbit/s. At first it was running fine, but then the crashes came back.
This is from yesterday and the day before:
$ sudo grep -i interrupt /var/log/tor/log.1
Sep 10 08:59:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 10:01:45.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 11:04:50.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 12:49:56.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 13:51:01.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 15:22:18.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 16:07:27.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 10 18:56:03.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
$ sudo zcat /var/log/tor/log.2.gz | grep -i interrupt
Sep 09 06:51:20.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 07:54:30.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 08:48:36.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 09:51:56.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 10:20:01.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 10:49:06.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 13:03:26.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 13:26:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 13:50:47.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:02:54.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:26:06.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:42:11.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 14:52:17.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:08:31.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:21:37.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:31:27.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:44:32.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 15:53:38.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:02:45.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:08:50.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:16:56.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:24:04.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:34:10.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:48:20.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 16:55:25.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 17:02:30.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 17:21:39.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 17:35:45.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 18:51:51.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 19:14:02.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 19:30:10.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 19:50:16.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 20:06:23.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 20:49:52.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 21:13:00.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 21:51:06.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Sep 09 22:54:22.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
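To see how often these shutdowns happen, the gaps between consecutive notices can be computed from the timestamps. The following is only a rough sketch: the function names are made up for illustration, the year is passed in by hand because Tor's log prefix does not include one, and the three sample lines are the first three Sep 10 entries from the excerpt above.

```python
from datetime import datetime

# Marker and notice text copied from the log excerpts above.
SHUTDOWN_MARKER = "[notice] Interrupt:"
NOTICE = ("[notice] Interrupt: we have stopped accepting new connections, "
          "and will shut down in 30 seconds. Interrupt again to exit now.")


def shutdown_times(lines, year=2013):
    """Return the timestamps of all shutdown notices found in `lines`."""
    times = []
    for line in lines:
        if SHUTDOWN_MARKER not in line:
            continue
        # The prefix looks like "Sep 10 08:59:40.000"; the year is not logged,
        # so it is supplied by the caller (an assumption of this sketch).
        stamp = " ".join(line.split()[:3])
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f"))
    return times


def gaps_in_minutes(times):
    """Minutes between each pair of consecutive shutdown notices."""
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]


sample = [f"{stamp} {NOTICE}" for stamp in (
    "Sep 10 08:59:40.000",
    "Sep 10 10:01:45.000",
    "Sep 10 11:04:50.000",
)]
gaps = gaps_in_minutes(shutdown_times(sample))
print([round(g, 1) for g in gaps])  # [62.1, 63.1]
```

Feeding in the full output of the two grep commands instead of `sample` would show the gaps shrinking from about an hour to well under ten minutes on the afternoon of Sep 09.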
It is a VPS on a Xen hypervisor with 3x Intel(R) Xeon(R) CPU L5630 @ 2.13GHz and 2 GB RAM.
According to htop, tor currently uses about 105% CPU and 350 MB RAM. It was around the same CPU and RAM consumption while it was happily relaying 100 Mbit/s.
At first I suspected that the hosting provider had reduced my CPU resources, but they say they have not changed anything, virtualization parameters included.
Any help would be appreciated.
- --RTNO
On 11.09.2013 10:05, Random Tor Node Operator wrote:
> Sep 10 08:59:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
I'm just taking a wild guess here, because I had similar symptoms: that log message doesn't seem to be a crash but a regular shutdown. In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.
I reconfigured monit not to restart tor until at least three consecutive TCP checks failed. Now I still get the warnings for failed checks from time to time (maybe under high load?) but tor seems to run stable.
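The idea of restarting only after several consecutive failed checks can be sketched in a few lines of Python. This is not monit's actual implementation; the function name is hypothetical, and resetting the counter after a restart is an assumption made for the illustration.

```python
def restart_decisions(check_results, required_failures=3):
    """Yield True for each cycle in which a restart would be triggered.

    check_results: iterable of booleans, True meaning the TCP check passed.
    A restart fires only once `required_failures` checks in a row have
    failed; any successful check resets the counter.
    """
    consecutive = 0
    for ok in check_results:
        if ok:
            consecutive = 0
            yield False
            continue
        consecutive += 1
        if consecutive >= required_failures:
            consecutive = 0  # assumption: counting starts over after a restart
            yield True
        else:
            yield False


# Made-up sequence of check outcomes for a relay under intermittent load.
results = [True, False, True, False, False, True, False, False, False, True]
print(sum(restart_decisions(results, required_failures=1)))  # 6
print(sum(restart_decisions(results, required_failures=3)))  # 1
```

With a threshold of one, every isolated failed check restarts tor; with a threshold of three, only the sustained outage near the end of the sequence does.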
-Stephan
On Wed, Sep 11, 2013 at 12:34:12PM +0200, Stephan wrote:
> On 11.09.2013 10:05, Random Tor Node Operator wrote:
>> Sep 10 08:59:40.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
> I'm just taking a wild guess here, because I had similar symptoms: that log message doesn't seem to be a crash but a regular shutdown.
Correct. Something told your Tor to turn off.
> In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.
Yow. That doesn't sound like what you wanted (nor does it sound good for the Tor network).
Thanks, --Roger
On 09/11/2013 12:37 PM, Roger Dingledine wrote:
>> In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.
> Yow. That doesn't sound like what you wanted (nor does it sound good for the Tor network).
Do you only mean the unnecessary restarts, or also the way of checking for a running tor instance by testing the reachability of the ORPort?
If the latter, which (preferably monit-compatible) other method would you prefer?
- --RTNO
On 09/11/2013 12:34 PM, Stephan wrote:
> I'm just taking a wild guess here, because I had similar symptoms: that log message doesn't seem to be a crash but a regular shutdown. In my case I use 'monit' to monitor my server and the running services - including tor. Once in a while the automated TCP check on either the OR-Port or the DIR-Port failed which resulted in monit stopping and starting tor.
> I reconfigured monit not to restart tor until at least three consecutive TCP checks failed. Now I still get the warnings for failed checks from time to time (maybe under high load?) but tor seems to run stable.
Thanks, that is probably the cause of the restarts. Could you post the corresponding line(s) in your monitrc? Mine currently still looks like this:
if failed host 127.0.0.1 port 9001 then restart
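For readers unfamiliar with this kind of rule: a port check like the one above roughly amounts to trying to open a TCP connection and treating a refusal or timeout as a failure. Below is a minimal Python sketch of such a single check, not monit's own code. The function name and the 5-second timeout are assumptions; port 9001 is the one from the rule above.

```python
import socket


def tcp_check(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened.

    A refused connection, an unreachable host, or no answer within
    `timeout` seconds all count as a failed check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and socket timeouts alike.
        return False


if __name__ == "__main__":
    # Port 9001 matches the monitrc line above; adjust to the local ORPort.
    print("ORPort reachable:", tcp_check("127.0.0.1", 9001))
```

A relay that is busy but alive may be slow to accept a connection, so a single failed check of this kind says little about whether tor has actually stopped.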
- --RTNO
On 11.09.2013 13:33, Random Tor Node Operator wrote:
> Could you post the corresponding line(s) in your monitrc?
Of course. I use the default of "set daemon 120", so tor is checked once every 120 seconds ('one cycle'). The tor specific part of the configuration is this:
if failed host 127.0.0.1 port 9030 type tcp for 10 cycles then restart
if failed host 127.0.0.1 port 9001 type tcp for 10 cycles then restart
if 5 restarts within 100 cycles then timeout
I'm still experimenting with the exact numbers. Restarting after only one failed check was obviously too fast. Waiting for 10 consecutive failed checks (i.e. tor is not responding for 20 minutes) may be too long, but I don't want to ruin the 'stable' flag only because a botnet is wreaking havoc for a few minutes.
The last line is kind of a safeguard. I get monit alerts by email. If something is seriously broken with my server (or tor) five mails will suffice. Even if I am on vacation for a few weeks, I don't need a reminder every 20 minutes. ;-)
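The timing behind these numbers is simple arithmetic: cycle length times number of cycles. A small sketch, with a hypothetical helper name, using only the values quoted above ("set daemon 120", "for 10 cycles", "within 100 cycles"):

```python
def cycles_to_minutes(cycle_seconds, cycles):
    """Convert a number of monitoring cycles into minutes."""
    return cycle_seconds * cycles / 60


CYCLE_SECONDS = 120  # "set daemon 120"

# "for 10 cycles": roughly how long checks must keep failing before a restart.
print(cycles_to_minutes(CYCLE_SECONDS, 10))   # 20.0

# "5 restarts within 100 cycles": length of the window for the timeout rule.
print(cycles_to_minutes(CYCLE_SECONDS, 100))  # 200.0
```

So a restart needs about 20 minutes of failed checks, and five restarts inside a window of about 200 minutes trigger the timeout. The figures are approximate, since a failure can begin at any point within a cycle.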
-Stephan
On 09/11/2013 02:18 PM, Stephan wrote:
> On 11.09.2013 13:33, Random Tor Node Operator wrote:
>> Could you post the corresponding line(s) in your monitrc?
> Of course. I use the default of "set daemon 120", so tor is checked once every 120 seconds ('one cycle'). The tor specific part of the configuration is this:
> if failed host 127.0.0.1 port 9030 type tcp for 10 cycles then restart
> if failed host 127.0.0.1 port 9001 type tcp for 10 cycles then restart
> if 5 restarts within 100 cycles then timeout
> I'm still experimenting with the exact numbers. Restarting after only one failed check was obviously too fast. Waiting for 10 consecutive failed checks (i.e. tor is not responding for 20 minutes) may be too long, but I don't want to ruin the 'stable' flag only because a botnet is wreaking havoc for a few minutes.
> The last line is kind of a safeguard. I get monit alerts by email. If something is seriously broken with my server (or tor) five mails will suffice. Even if I am on vacation for a few weeks, I don't need a reminder every 20 minutes. ;-)
Thanks. My cycle time is 60 seconds and I'm now trying out 5 cycles for restarting.
Bazinga's Stable flag is already gone due to those rogue restarts. It'll hopefully recover soon.
- --RTNO
tor-relays@lists.torproject.org