I've been running a relay for about months now. It runs on a 1.6 GHz single-core Atom with hyperthreading and 1 GB of RAM. It's on my home connection; I advertise only 176 KB/sec.
Under normal conditions, it pumps around 100KB/sec, it has around 600 connections, it uses around 20% CPU. Everything looks very healthy.
Lately I have seen a couple of incidents where the number of connections suddenly goes up to over 3000, traffic increases heavily, CPU usage goes well over 150% (out of possible 200). Traffic can go up to between 500 and 1000 KB/sec for long periods of time. Sometimes it seems that my relay just can't take it anymore. In the log, the ratio of TAP handshakes goes wild, and I get clock jump warnings. My clock does not jump. This is Tor hanging while allocating memory.
Today I had a really bad episode, where the box started thrashing. When it became responsive again, Tor was left in a state where it was constantly downloading about 400KB/sec more than it was uploading. Normally I have a little bit more up than down, because I'm a directory server as well. I can not explain having a lot more down than up. I can fantasize that those hangs/clock jumps ("assuming established circuits no longer work") could leave 'half' circuits.
Finally I restarted my relay, (which I really don't like to do) and after a while it stabilized. At this point, my router shows a peak of almost 8000 NAT sessions.
Is this normal behavior of the network (esp. the sudden increase in connections) or is this another kind of attack/probe like what we've seen in early September? Is this because this machine is just too underpowered? Should I collect/provide any diagnostics? Have others seen similar events?
Thanks, -Job
Nov 16 22:39:52.000 [notice] Heartbeat: Tor's uptime is 36 days 6:00 hours, with 615 circuits open. I've sent 192.16 GB and received 171.74 GB.
Nov 16 22:39:52.000 [notice] Average packaged cell fullness: 87.305%
Nov 16 22:39:52.000 [notice] TLS write overhead: 9%
Nov 16 23:39:28.000 [notice] Circuit handshake stats since last time: 6011/6013 TAP, 35/35 NTor.
Nov 17 00:39:28.000 [notice] Circuit handshake stats since last time: 12521/12521 TAP, 47/47 NTor.
Nov 17 01:24:45.000 [warn] Your system clock just jumped 138 seconds forward; assuming established circuits no longer work.
Nov 17 01:39:28.000 [notice] Circuit handshake stats since last time: 49689/1454481 TAP, 51/51 NTor.
Nov 17 02:28:44.000 [warn] Your system clock just jumped 174 seconds forward; assuming established circuits no longer work.
Nov 17 02:33:12.000 [warn] Your system clock just jumped 268 seconds forward; assuming established circuits no longer work.
Nov 17 02:35:17.000 [warn] Your system clock just jumped 125 seconds forward; assuming established circuits no longer work.
Nov 17 02:39:28.000 [notice] Circuit handshake stats since last time: 55822/3477278 TAP, 37/37 NTor.

Nov 25 06:53:31.000 [notice] Circuit handshake stats since last time: 28194/28200 TAP, 32/32 NTor.
Nov 25 07:29:57.000 [warn] Your system clock just jumped 114 seconds forward; assuming established circuits no longer work.
Nov 25 07:31:58.000 [warn] Your system clock just jumped 121 seconds forward; assuming established circuits no longer work.
Nov 25 07:53:31.000 [notice] Circuit handshake stats since last time: 24207/2692265 TAP, 42/42 NTor.
Nov 25 08:53:31.000 [notice] Circuit handshake stats since last time: 76451/4271579 TAP, 49/49 NTor.
Nov 25 09:53:31.000 [notice] Circuit handshake stats since last time: 84956/3666170 TAP, 42/42 NTor.
Nov 25 10:45:15.000 [warn] Your system clock just jumped 302 seconds forward; assuming established circuits no longer work.
Nov 25 11:01:59.000 [warn] Your system clock just jumped 1004 seconds forward; assuming established circuits no longer work.
Nov 25 11:02:04.000 [notice] Circuit handshake stats since last time: 64816/3151447 TAP, 38/38 NTor.
Nov 25 11:02:18.000 [notice] Heartbeat: Tor's uptime is 5 days 11:42 hours, with 2450 circuits open. I've sent 46.92 GB and received 46.11 GB.
Nov 25 11:02:18.000 [notice] Average packaged cell fullness: 82.623%
Nov 25 11:02:18.000 [notice] TLS write overhead: 8%
Nov 25 11:53:31.000 [notice] Circuit handshake stats since last time: 18710/1099205 TAP, 193/202 NTor.
Nov 25 13:22:03.000 [warn] Your system clock just jumped 1923 seconds forward; assuming established circuits no longer work.
Nov 25 13:27:23.000 [notice] Circuit handshake stats since last time: 70795/3452514 TAP, 138/140 NTor.
Nov 25 13:35:51.000 [warn] Your system clock just jumped 828 seconds forward; assuming established circuits no longer work.
Nov 25 13:42:15.000 [warn] Your system clock just jumped 384 seconds forward; assuming established circuits no longer work.
Nov 25 13:43:15.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Nov 25 13:43:16.000 [notice] Read configuration file "/opt/etc/tor/torrc".
Nov 25 13:46:09.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
Nov 25 13:46:39.000 [notice] Clean shutdown finished. Exiting.
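For anyone who wants to pull the same numbers out of their own notices log, here is a minimal Python sketch. It assumes the log lines have exactly the wording shown above, and it assumes that in "X/Y TAP" the first number is handshakes handled and the second is handshakes requested. The function name summarize and the 0.5 cutoff are made up for illustration; they are not Tor settings.

```python
import re

# Matches the hourly stats line, e.g.
# "... Circuit handshake stats since last time: 49689/1454481 TAP, 51/51 NTor."
STATS_RE = re.compile(
    r"Circuit handshake stats since last time: "
    r"(\d+)/(\d+) TAP, (\d+)/(\d+) NTor"
)
# Matches the clock jump warning and captures the number of seconds.
JUMP_RE = re.compile(r"Your system clock just jumped (\d+) seconds forward")


def summarize(log_lines, storm_ratio=0.5):
    """Return (storms, jumped_seconds).

    storms is a list of (timestamp, handled, requested, ratio) for every
    stats line whose TAP handled/requested ratio is below storm_ratio
    (a hypothetical cutoff). jumped_seconds is the sum of all reported
    clock jumps.
    """
    storms = []
    jumped_seconds = 0
    for line in log_lines:
        stats = STATS_RE.search(line)
        if stats:
            handled, requested = int(stats.group(1)), int(stats.group(2))
            ratio = handled / requested if requested else 1.0
            if ratio < storm_ratio:
                # The first 15 characters hold "Mon DD HH:MM:SS".
                storms.append((line[:15], handled, requested, ratio))
            continue
        jump = JUMP_RE.search(line)
        if jump:
            jumped_seconds += int(jump.group(1))
    return storms, jumped_seconds


sample = [
    "Nov 17 00:39:28.000 [notice] Circuit handshake stats since last time: 12521/12521 TAP, 47/47 NTor.",
    "Nov 17 01:24:45.000 [warn] Your system clock just jumped 138 seconds forward; assuming established circuits no longer work.",
    "Nov 17 01:39:28.000 [notice] Circuit handshake stats since last time: 49689/1454481 TAP, 51/51 NTor.",
]
storms, jumped_seconds = summarize(sample)
for stamp, handled, requested, ratio in storms:
    print(f"{stamp}: {handled}/{requested} TAP handshakes handled ({ratio:.1%})")
print(f"total clock-jump seconds: {jumped_seconds}")
```

To run it over a real log, pass an open file object instead of the sample list, since the function only iterates over lines.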
Jobiwan Kenobi:
I've been running a relay for about months now. It runs on a 1.6 GHz single-core Atom with hyperthreading and 1 GB of RAM. It's on my home connection; I advertise only 176 KB/sec.
Under normal conditions, it pumps around 100KB/sec, it has around 600 connections, it uses around 20% CPU. Everything looks very healthy.
Lately I have seen a couple of incidents where the number of connections suddenly goes up to over 3000, traffic increases heavily, CPU usage goes well over 150% (out of possible 200). Traffic can go up to between 500 and 1000 KB/sec for long periods of time. Sometimes it seems that my relay just can't take it anymore. In the log, the ratio of TAP handshakes goes wild, and I get clock jump warnings. My clock does not jump. This is Tor hanging while allocating memory.
Welcome to the world of the Raspberry Pi / BeagleBone / CubieBoard operator, except normally we'd have crashed (without some defensive measures) before the clock jump thing - the Pi in particular has a known dodgy "clock."
Today I had a really bad episode, where the box started thrashing. When it became responsive again, Tor was left in a state where it was constantly downloading about 400KB/sec more than it was uploading. Normally I have a little bit more up than down, because I'm a directory server as well. I can not explain having a lot more down than up. I can fantasize that those hangs/clock jumps ("assuming established circuits no longer work") could leave 'half' circuits.
As best I can tell, probably that's a flood of incoming TAP requests or TLS handshakes.
Finally I restarted my relay, (which I really don't like to do) and after a while it stabilized. At this point, my router shows a peak of almost 8000 NAT sessions.
Is this normal behavior of the network (esp. the sudden increase in connections) or is this another kind of attack/probe like what we've seen in early September? Is this because this machine is just too underpowered? Should I collect/provide any diagnostics? Have others seen similar events?
Have a look at the Raspberry Pi threads and search for "circuit creation storms." I'm slowly developing a set of defensive iptables rules for low-power relays which you might want to have a look at, but as your machine is far more capable than a Pi, you'll need to adjust accordingly (and then, I hope, contribute back!)
https://github.com/gordon-morehouse/cipollini/tree/master/contrib/90_slowboa...
(Ignore the fail2ban stuff for now, I found a more efficient way to handle the problem with the help of a list reader.)
_______________________________________________ tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
-- Sent from my thing that sends email.
Gordon Morehouse wrote:
Jobiwan Kenobi:
I've been running a relay for about months now. It runs on a 1.6
That should have been: about 3 months.
GHz single-core Atom with hyperthreading and 1 GB of RAM. It's on my home connection; I advertise only 176 KB/sec.
Under normal conditions, it pumps around 100KB/sec, it has around 600 connections, it uses around 20% CPU. Everything looks very healthy.
Lately I have seen a couple of incidents where the number of connections suddenly goes up to over 3000, traffic increases heavily, CPU usage goes well over 150% (out of possible 200). Traffic can go up to between 500 and 1000 KB/sec for long periods of time. Sometimes it seems that my relay just can't take it anymore. In the log, the ratio of TAP handshakes goes wild, and I get clock jump warnings. My clock does not jump. This is Tor hanging while allocating memory.
Welcome to the world of the Raspberry Pi / BeagleBone / CubieBoard operator, except normally we'd have crashed (without some defensive measures) before the clock jump thing - the Pi in particular has a known dodgy "clock."
Today I had a really bad episode, where the box started thrashing. When it became responsive again, Tor was left in a state where it was constantly downloading about 400KB/sec more than it was uploading. Normally I have a little bit more up than down, because I'm a directory server as well. I can not explain having a lot more down than up. I can fantasize that those hangs/clock jumps ("assuming established circuits no longer work") could leave 'half' circuits.
As best I can tell, probably that's a flood of incoming TAP requests or TLS handshakes.
Finally I restarted my relay, (which I really don't like to do) and after a while it stabilized. At this point, my router shows a peak of almost 8000 NAT sessions.
Is this normal behavior of the network (esp. the sudden increase in connections) or is this another kind of attack/probe like what we've seen in early September? Is this because this machine is just too underpowered? Should I collect/provide any diagnostics? Have others seen similar events?
Have a look at the Raspberry Pi threads and search for "circuit creation storms." I'm slowly developing a set of defensive iptables rules for low-power relays which you might want to have a look at, but as your machine is far more capable than a Pi, you'll need to adjust accordingly (and then, I hope, contribute back!)
https://github.com/gordon-morehouse/cipollini/tree/master/contrib/90_slowboa...
(Ignore the fail2ban stuff for now, I found a more efficient way to handle the problem with the help of a list reader.)
Thanks Gordon,
I'm not sure I can get iptables up and working on this box. It is more of an appliance. (Though I did get a build environment on it to build Tor.)
Throttling incoming connections is probably not the answer in this case, as the count can still build up to a large number. Throttling handshakes might help, but that can't be done at the network level.
Anyway, this 'attack' (if that's what it is: millions of TAP handshakes per hour) doesn't kill my relay, but after those clock jump messages it is left in a state where it downloads way more data than it uploads, as if the circuits it assumed to be no longer working are still sending data that doesn't get relayed.
If this is the case, would it be possible to detect this and either block those circuits, close those connections, or make no assumptions in the first place?
Another one of these is going right now as I write.
When I set my bandwidth rate to 256 KB, download fills that up while upload is at only 150 KB or so. When I set it to 1 MB, download fills that up while upload stays at roughly 150 KB. CPU is well over 100%.
Normally I have the rate at 10 Mbit, a bit less than my actual bandwidth, but advertise much less. When I don't have 3000+ connections, sometimes I see it do high volume for long times with relatively low CPU load.
This time I'm not going to relaunch it but let it recover on its own, with a lowered bandwidth rate, since most of the traffic is going into a sinkhole anyway.
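To put a rough number on the imbalance described above, here is a small Python sketch. It assumes "1 MB" means 1024 KB/sec, takes the upload figure as the approximate 150 KB/sec quoted, and ignores directory traffic and protocol overhead. The helper name unrelayed_fraction is made up for illustration.

```python
def unrelayed_fraction(down_kb_s, up_kb_s):
    """Fraction of inbound traffic with no matching outbound traffic.

    Returns 0.0 when upload meets or exceeds download.
    """
    if down_kb_s <= 0:
        raise ValueError("download rate must be positive")
    return max(0.0, (down_kb_s - up_kb_s) / down_kb_s)


# Download fills the configured rate while upload stays near 150 KB/sec.
for cap in (256, 1024):  # assumption: "1 MB" is taken as 1024 KB/sec
    frac = unrelayed_fraction(cap, 150)
    print(f"rate {cap} KB/sec: about {frac:.0%} of inbound traffic is not relayed onward")
```

Under those assumptions, raising the rate only raises the share of inbound traffic that goes nowhere, which fits the choice to lower the rate while the episode lasts.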
-Job
Lately I have seen a couple of incidents where the number of connections suddenly goes up to over 3000, traffic increases heavily, CPU usage goes well over 150% (out of possible 200). Traffic can go up to between 500 and 1000 KB/sec for long periods of time. Sometimes it seems that my relay just can't take it anymore. In the log, the ratio of TAP handshakes goes wild, and I get clock jump warnings. My clock does not jump. This is Tor hanging while allocating memory.
My very cheap home router couldn't handle many connections and crashed often. I solved the issue by disabling Hidden Services (that is, the hidden service directory function), and now the number of connections is very low and everything is smooth. (But I have just bought a new home router so I'll allow HS soon.) The magic configuration line is:
HidServDirectoryV2 0
Best regards, Ognyan Kulev