Hi all,
So in general, 0.3.3.1-alpha-dev and 0.3.3.2-alpha running on two nodes without any connection limits on the iptables firewall seem to be a lot more robust against the recent increase in clients (or possible [D]DoS). But tonight, for a short period of time, one of the relays was running a bit "hot", so to speak.
Only to be greeted by this log entry:
Feb 12 18:54:55 tornode2 Tor[6362]: We're low on memory (cell queues total alloc: 1602579792 buffer total alloc: 1388544, tor compress total alloc: 1586784 rendezvous cache total alloc: 489909). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 12 19:04:10 tornode2 Tor[6362]: Your network connection speed appears to have changed. Resetting timeout to 60s after 18 timeouts and 1000 buildtimes.
So one circuit being able to claim 1.5 GB of RAM seems a bit much, even while the DoS protection seems to be doing something (see the DoS log excerpt further below). This could be a new attack or just a bug, etc. But wouldn't some sort of fair memory balance between circuits be another mitigation to consider? I'm not saying it should be as strict as "circuit memory"/"# of circuits", but 99.x% of memory going to a single circuit feels wrong for a relay.
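To make the idea concrete, here is a rough, purely hypothetical sketch (in C, since that's what tor is written in); the function name, the counters, and the 5% figure are all made up for illustration and are not tor's actual code:

    /* Hypothetical sketch only -- NOT tor's actual code. The idea: stop
     * queueing cells for any one circuit once it holds more than a fixed
     * share of the MaxMemInQueues budget, instead of letting it grow
     * until the global out-of-memory handler fires. */
    #include <stdbool.h>
    #include <stdint.h>

    #define PER_CIRCUIT_MAX_FRACTION 0.05 /* no single circuit gets >5% */

    static bool
    circuit_queue_within_budget(uint64_t circuit_queue_bytes,
                                uint64_t max_mem_in_queues)
    {
      const uint64_t cap =
          (uint64_t)((double)max_mem_in_queues * PER_CIRCUIT_MAX_FRACTION);
      return circuit_queue_bytes < cap;
    }

With the 1.5 GB default and a 5% share, no single circuit could hold more than roughly 75 MB before being throttled or killed, instead of the ~1.5 GB seen above.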
Feb 12 13:58:34 tornode2 Tor[6362]: DoS mitigation since startup: 910770 circuits rejected, 10 marked addresses. 25972 connections closed. 324 single hop clients refused.
Feb 12 19:58:34 tornode2 Tor[6362]: DoS mitigation since startup: 1222320 circuits rejected, 12 marked addresses. 33359 connections closed. 402 single hop clients refused.
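For what it's worth, those counters come from the new DoS mitigation subsystem in 0.3.3.x. As far as I understand, the defaults normally come from the consensus, but they can be overridden in torrc with options along these lines (the values here are only illustrative; check the man page before copying anything):

    # torrc -- DoS subsystem knobs (0.3.3.2-alpha and later); example
    # values only, the real defaults normally come from the consensus.
    DoSCircuitCreationEnabled 1
    DoSConnectionEnabled 1
    DoSConnectionMaxConcurrentCount 100
    DoSRefuseSingleHopClientRendezvous 1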
Thx, Stijn
I see this occasionally. It's not specific to 0.3.3.x. I reported it back in October 2017:
https://lists.torproject.org/pipermail/tor-relays/2017-October/013328.html
Roger replied here:
https://lists.torproject.org/pipermail/tor-relays/2017-October/013334.html
MaxMemInQueues is set to 1.5 GB by default, which is why the problematic circuit gets to use that much RAM before it's killed. You can lower MaxMemInQueues in torrc, but that will obviously have other impacts on your relay. If you have plenty of RAM, I'd just leave things alone for now, since Tor is already killing the circuit.
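(As a sanity check: the 1599323088 bytes removed is ~1.49 GiB, i.e. essentially the entire budget.) For reference, lowering it is a one-line torrc change; the 1 GB value below is only an example:

    # torrc -- lower the total queued-memory budget below the 1.5 GB default
    MaxMemInQueues 1 GB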
I agree in theory some mitigation against this would be nice, but I'm not smart enough to offer anything specific. It seems Roger and other devs are already thinking about the issue.
Hi Tor & Others,
On 12 Feb 2018, at 20:29, tor wrote:
I see this occasionally. It's not specific to 0.3.3.x. I reported it back in October 2017:
Thx. I added the version to the subject more or less to make clear it happened on an alpha release.
https://lists.torproject.org/pipermail/tor-relays/2017-October/013328.html
Roger replied here:
https://lists.torproject.org/pipermail/tor-relays/2017-October/013334.html
Ah thanks, not sure why my google kung-fu missed this one.
MaxMemInQueues is set to 1.5 GB by default, which is why the problematic circuit gets to use that much RAM before it's killed. You can lower MaxMemInQueues in torrc, but that will obviously have other impacts on your relay. If you have plenty of RAM, I'd just leave things alone for now, since Tor is already killing the circuit.
My tor nodes have 4 GB of RAM, so I had put MaxMemInQueues at 1.5 GB back when the (D)DoS attacks were more troublesome (I wasn't aware that was the default).
I agree in theory some mitigation against this would be nice, but I'm not smart enough to offer anything specific. It seems Roger and other devs are already thinking about the issue.
Not a coder myself (except some scripting)
For those looking for the paper as well: the original URL gives a 403, but I believe this is a copy (I can't check for alterations or omitted slides, of course): http://www.robgjansen.com/talks/sniper-dcaps-20131011.pdf
Thx, Stijn
On 12 Feb (20:09:35), Stijn Jonker wrote:
Hi all,
So in general, 0.3.3.1-alpha-dev and 0.3.3.2-alpha running on two nodes without any connection limits on the iptables firewall seem to be a lot more robust against the recent increase in clients (or possible [D]DoS). But tonight, for a short period of time, one of the relays was running a bit "hot", so to speak.
Only to be greeted by this log entry:
Feb 12 18:54:55 tornode2 Tor[6362]: We're low on memory (cell queues total alloc: 1602579792 buffer total alloc: 1388544, tor compress total alloc: 1586784 rendezvous cache total alloc: 489909). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.
Wow... 1599323088 bytes is insane. This should _not_ happen for only 1 circuit. We actually have checks in place to avoid this, but it seems they either totally failed or we have an edge case.
Can you tell me what scheduler you were using? (Look for "Scheduler" in the notice log.)
Any warnings in the logs that you could share, or was everything normal?
Finally, can you share the OS you are running this relay on and, if Linux, the kernel version?
Big thanks! David
Hi David,
On 12 Feb 2018, at 20:44, David Goulet wrote:
On 12 Feb (20:09:35), Stijn Jonker wrote:
Hi all,
So in general, 0.3.3.1-alpha-dev and 0.3.3.2-alpha running on two nodes without any connection limits on the iptables firewall seem to be a lot more robust against the recent increase in clients (or possible [D]DoS). But tonight, for a short period of time, one of the relays was running a bit "hot", so to speak.
Only to be greeted by this log entry:
Feb 12 18:54:55 tornode2 Tor[6362]: We're low on memory (cell queues total alloc: 1602579792 buffer total alloc: 1388544, tor compress total alloc: 1586784 rendezvous cache total alloc: 489909). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.
Wow... 1599323088 bytes is insane. This should _not_ happen for only 1 circuit. We actually have checks in place to avoid this, but it seems they either totally failed or we have an edge case.
Yeah, it felt a "bit" much. A couple of megs I wouldn't have shared :-)
Can you tell me what scheduler you were using? (Look for "Scheduler" in the notice log.)
The scheduler always seems to be KIST (I never played with it or tried to change it):
Feb 11 19:58:24 tornode2 Tor[6362]: Scheduler type KIST has been enabled.
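(For anyone who does want to pin it: if I read the man page correctly, tor 0.3.2 and later accept a Schedulers option in torrc that takes a preference-ordered list, something like the lines below. I haven't tried changing it myself.)

    # torrc -- scheduler preference order; tor uses the first type the
    # kernel supports. This is, as far as I know, already the default.
    Schedulers KIST,KISTLite,Vanilla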
Any warnings in the logs that you could share, or was everything normal?
Besides the ESXi host giving an alarm about CPU usage, I couldn't find anything odd in the logs around that time. The general syslog logging worked both locally on the host and remotely, as the hourly cron job entries appear on either side of this one.
Finally, can you share the OS you are running this relay on and, if Linux, the kernel version?
Debian Stretch: Linux tornode2 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux. Not sure it matters, but it's an ESXi-based VM running with 2 vCPUs (on an i5-5300U) and 4 GB of memory.
No problem, happy to squash bugs. I guess that's one of the "musts" when running alpha code, although this might not be alpha-related (I can't judge).
Thx, Stijn
On 12 Feb (21:14:14), Stijn Jonker wrote:
Hi David,
On 12 Feb 2018, at 20:44, David Goulet wrote:
On 12 Feb (20:09:35), Stijn Jonker wrote:
Hi all,
So in general, 0.3.3.1-alpha-dev and 0.3.3.2-alpha running on two nodes without any connection limits on the iptables firewall seem to be a lot more robust against the recent increase in clients (or possible [D]DoS). But tonight, for a short period of time, one of the relays was running a bit "hot", so to speak.
Only to be greeted by this log entry:
Feb 12 18:54:55 tornode2 Tor[6362]: We're low on memory (cell queues total alloc: 1602579792 buffer total alloc: 1388544, tor compress total alloc: 1586784 rendezvous cache total alloc: 489909). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.
Wow... 1599323088 bytes is insane. This should _not_ happen for only 1 circuit. We actually have checks in place to avoid this, but it seems they either totally failed or we have an edge case.
Yeah, it felt a "bit" much. A couple of megs I wouldn't have shared :-)
Can you tell me what scheduler you were using? (Look for "Scheduler" in the notice log.)
The scheduler always seems to be KIST (I never played with it or tried to change it):
Feb 11 19:58:24 tornode2 Tor[6362]: Scheduler type KIST has been enabled.
Any warnings in the logs that you could share, or was everything normal?
Besides the ESXi host giving an alarm about CPU usage, I couldn't find anything odd in the logs around that time. The general syslog logging worked both locally on the host and remotely, as the hourly cron job entries appear on either side of this one.
Finally, can you share the OS you are running this relay on and, if Linux, the kernel version?
Debian Stretch: Linux tornode2 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux. Not sure it matters, but it's an ESXi-based VM running with 2 vCPUs (on an i5-5300U) and 4 GB of memory.
No problem, happy to squash bugs. I guess that's one of the "musts" when running alpha code, although this might not be alpha-related (I can't judge).
Thanks for all the information!
I've opened https://bugs.torproject.org/25226
Cheers! David