Hi,
We're experiencing what looks like a DoS attack on multiple relays in our family:
https://atlas.torproject.org/#search/family:CBEAE10CBBB86C51059246B2EF92EB2C...
The relays are currently running Tor 0.3.1.9 on Linux kernel 4.4.0 (although when the problem started the relays were running Tor 0.3.1.8).
The attack knocked 3 of 6 relays offline overnight. By the time we looked at logs, the Tor service had stopped and this was the last line in the log:
"Tor[xyz]: Failing because we have 16351 connections already. Please read doc/TUNING for guidance."
The attack is still ongoing. When it's happening, the number of connections rises very rapidly, until the attack succeeds in stopping the service.
$ ss -s
Total: 15855 (kernel 0)
TCP:   24520 (estab 23969, closed 305, orphaned 31, synrecv 0, timewait 261/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       0         0         0
UDP       8         4         4
TCP       24215     24213     2
INET      24223     24217     6
FRAG      0         0         0
... and only a few seconds later:
$ ss -s
Total: 12120 (kernel 0)
TCP:   27389 (estab 20026, closed 1906, orphaned 45, synrecv 0, timewait 1587/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       0         0         0
UDP       8         4         4
TCP       25483     25481     2
INET      25491     25485     6
FRAG      0         0         0
That's obviously much larger than the normal number of connections, more than we've ever seen, and seems like more connections than would be needed for a relay.
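For anyone who wants to log these counters over time instead of eyeballing them, here is a minimal sketch that pulls the main numbers out of the `ss -s` summary. It assumes the text layout shown in this message. The function name `parse_ss_summary` and the `WARN_AT` threshold are hypothetical and chosen only for illustration.

```python
import re

def parse_ss_summary(text):
    """Extract a few counters from the text printed by `ss -s`."""
    total = re.search(r"Total:\s+(\d+)", text)
    tcp = re.search(
        r"TCP:\s+(\d+)\s+\(estab (\d+), closed (\d+), orphaned (\d+)", text
    )
    if not (total and tcp):
        raise ValueError("unrecognised ss -s output")
    return {
        "total": int(total.group(1)),
        "tcp": int(tcp.group(1)),
        "estab": int(tcp.group(2)),
        "closed": int(tcp.group(3)),
        "orphaned": int(tcp.group(4)),
    }

# First reading quoted in this message.
SAMPLE = (
    "Total: 15855 (kernel 0) TCP: 24520 (estab 23969, closed 305, "
    "orphaned 31, synrecv 0, timewait 261/0), ports 0"
)

# Hypothetical alert threshold; pick one that suits your own relay.
WARN_AT = 16000

stats = parse_ss_summary(SAMPLE)
status = "above" if stats["estab"] > WARN_AT else "below"
print(stats["estab"], "established connections,", status, "the warning threshold")
```

Note that `ss -s` reports system-wide figures, so on a host running several processes the established count is only a rough indicator of what a single Tor process holds.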
We have file descriptors (/proc/sys/fs/file-max) set to 64000, but it looks like Tor sets MAX_FILEDESCRIPTORS to 16384 per /etc/init.d/tor:
elif [ "$system_max" -gt "40000" ] ; then MAX_FILEDESCRIPTORS=16384
Surely that is high enough for normal service?
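The two numbers above are different kinds of limit: /proc/sys/fs/file-max is system-wide, while the value chosen by the init script applies to the Tor process. A minimal sketch for comparing the two follows. It reports the limits of the Python process that runs it, not of Tor itself; for a running Tor process, /proc/<pid>/limits shows the equivalent values. The helper names are hypothetical.

```python
import resource

def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def read_system_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide limit, or None if the file is unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None

soft, hard = fd_limits()
print("per-process soft/hard limit:", soft, hard)
print("system-wide file-max:", read_system_max())
```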
We haven't started looking into where the traffic is coming from or other characteristics. We are wondering: 1) whether this is a known attack, 2) whether other operators are experiencing it, 3) whether there are any ideas for mitigating it, and 4) whether any additional information would be helpful.
Thanks.
Hi null
On 04-Dec-17 at 20:40, null wrote:
$ ss -s Total: 15855 (kernel 0) TCP: 24520 (estab 23969, closed 305, orphaned 31, synrecv 0, timewait 261/0), ports 0
IMHO these attempts hold established TCP state. I have seen similar behaviour from a small number of non-relay IPs. It looks like you are accumulating too many stateful connections. The IPs are not necessarily malicious. A heavy-handed option is to purge the connections, e.g. with tcpdrop(8), before they hurt. Alternatively, set a per-IP connection limit in your firewall.
null null@omuravpn.com wrote:
That's obviously much larger than the normal number of connections, more than we've ever seen, and seems like more connections than would be needed for a relay.
What you are seeing is most likely the same phenomenon brought up on this list repeatedly over at least the last decade or so. That phenomenon is providing HSDir service, or perhaps a rendez-vous point, for a popular hidden service. As soon as your node is associated with that hidden service and that association begins to be distributed by the HSDir population to clients looking for that hidden service, the number of connections to your node will increase fairly rapidly to a level corresponding to that hidden service's level of popularity. If you don't like it, you can set
HidServDirectoryV2 0
which will stop clients from trying to get hidden service descriptors from your node, which will eliminate most of the bursts of connections you're seeing, but will not prevent your node from being a rendez-vous point because every tor relay is expected to provide that function as part of the relay protocols.
We have file descriptors (/proc/sys/fs/file-max) set to 64000, but it looks like Tor sets MAX_FILEDESCRIPTORS to 16384 per /etc/init.d/tor:
elif [ "$system_max" -gt "40000" ] ; then MAX_FILEDESCRIPTORS=16384
Surely that is high enough for normal service?
If by normal you mean "low traffic", then yes, it's probably enough. However, that's really not very high in a general sense. Consider also that some installed packages place high demands upon the supply of file descriptors. (E.g., I gather you do not have the graphics port/package called piglit installed on your system; it recommends that at least 50000 be available for its runs, so I have
kern.maxfiles="50000"
in /boot/loader.conf on my FreeBSD system. I don't recall tor ever handling many more than 5000 (i.e., 10% of that figure) at one time on my low-traffic node.) The faster, larger-capacity tor nodes often have considerably higher settings to keep tor from exhausting the fd limits on those hosts.
We haven't started looking into where the traffic is coming from or other characteristics. We are wondering if: 1) this is a known attack, 2) if other operators are experiencing it, 3) if there are any ideas for mitigating it, and 4) if any additional information would be helpful.
Other than refusing to be a hidden service directory server, there is probably nothing to be done about it. Adjust your settings accordingly, along with your expectations. :-)
Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************
tl;dr: run this:
conntrack -L -p tcp --dport 9001 | awk '{print $5}' | sort | uniq -c | sort -n
ignore numbers less than 10. the remaining output should consist of the following:
1. your IP
2. LeaseWeb and Online.net IPs (use rDNS and whois)
3. mobile networks
block IPs in set 2 from accessing Tor using your firewall software. don't block 1 or you will have problems. don't block 3 or other people may have problems (hopefully more if the Android project gains momentum). also don't block the /16 of guards mentioned. they are causing no measurable harm. my list of set 2 is available upon request to longstanding members (basically if you've done anything on Trac or any tor mailing lists).
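For readers who prefer to post-process saved conntrack output, here is a minimal Python sketch of the same counting step. It assumes, as the awk command does, that the fifth whitespace-separated field of each line is the `src=` address. The sample lines are made up and use documentation address ranges; `count_sources` and `fake_line` are hypothetical names.

```python
from collections import Counter

def count_sources(lines, min_count=10):
    """Count the 5th whitespace-separated field of each line (awk's $5)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 5:
            counts[fields[4]] += 1
    # Ascending order, like `sort -n`, with counts below min_count dropped.
    return sorted((n, src) for src, n in counts.items() if n >= min_count)

def fake_line(src):
    # Made-up line in conntrack's general layout, for illustration only.
    return f"tcp 6 431999 ESTABLISHED src={src} dst=203.0.113.1 sport=40000 dport=9001"

sample = [fake_line("192.0.2.10")] * 12 + [fake_line("198.51.100.7")] * 3

for n, src in count_sources(sample):
    print(n, src)
```

The `min_count=10` default mirrors the "ignore numbers less than 10" advice above.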
more information:
This attack appears to be malicious to me. It seems to work like this:
1. Open many OR connections (hundreds to thousands)
2. Leave open until tor runs out of sockets
Tor presently waits for the connections to time out, which takes 3-4.5 minutes. It should instead more aggressively prune these garbage connections. https://trac.torproject.org/projects/tor/ticket/19984 tracks this.
In the interim, this attack is causing real problems, so this approach seems reasonable for now. If you want to be slightly more conservative, instead of blocking them outright, simply limit these IP subnets to a small number of connections. Modern Tor only requires one, but you could do two or three per address, or four or five per /28 or so. Since these are not NATed IPs, a high limit is not justified. I recommend against the blanket approach suggested previously of limiting whole sets of /24s, since that may inadvertently block mobile clients and is not effective against the current attack. As mentioned in the previous paragraph, you should not set DisableOOSCheck 0, as that may wind up killing good sockets instead.
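To see which sources would exceed limits like these before enforcing them in a firewall, the grouping can be sketched as follows. This is only an illustration, assuming IPv4 addresses and taking the defaults from the numbers above (three per address, five per /28); `over_limit` is a hypothetical helper and the addresses come from documentation ranges.

```python
import ipaddress
from collections import Counter

def over_limit(addresses, per_addr=3, per_block=5, prefix=28):
    """Return (addresses, networks) holding more connections than allowed."""
    per_ip = Counter(addresses)
    per_net = Counter()
    for addr, n in per_ip.items():
        # strict=False masks the host bits, giving the enclosing /prefix block.
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        per_net[str(net)] += n
    busy_ips = sorted(a for a, n in per_ip.items() if n > per_addr)
    busy_nets = sorted(net for net, n in per_net.items() if n > per_block)
    return busy_ips, busy_nets

# One entry per open connection, keyed by remote address.
conns = ["192.0.2.17"] * 4 + ["192.0.2.18"] * 2 + ["198.51.100.5"]
busy_ips, busy_nets = over_limit(conns)
print("addresses over limit:", busy_ips)
print("/28 blocks over limit:", busy_nets)
```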
again, this is not a good solution, but until that ticket is resolved, it is probably the best that can be done.
Evidence for this attack being malicious and intending to disable Tor is:
1. these connections are coming very fast: as shown by others too, dozens of connections per second per IP, and tens of thousands of connections held open. the standard tor software multiplexes circuits over a single TCP connection, so even very high-volume links should only have a single Tor connection, or possibly two if they are very old and must make a separate connection to the DirPort.
2. the connections do not taper off if they are rejected. I banned these addresses from accessing Tor, and they continue to make dozens of connection attempts every second from each IP address. this means that this is probably not a good faith "test" or a misconfigured set of real Tor clients, but is instead malicious and using a modified or custom client.
3. they are spread out over many IPs, *but* only from *two* *hosting* service providers. this means that:
3a. it is likely a single individual or organization, otherwise it would be multiple providers
3b. they are trying to cause as many problems as possible, otherwise they would use one server
3c. it is almost certainly not real clients using NAT; as far as I know, LeaseWeb does not use NAT, and Online.net only uses one-to-one NAT.
4. rDNS is generic. this means they do not care enough to explain their origin.
5. as pointed out, they have not registered themselves in the relay consensus. this means it is probably not a set of high-volume relays that is somehow malfunctioning, or someone conducting a DoS attack through Tor itself.
6. as far as I can tell, these connections do not do anything. they simply remain open, consuming resources until tor times them out.
7. they keep far more connections open than they make. what I mean by this is that they hold open thousands of connections at once but open fewer than a hundred new connections per second. this supports my theory that it is not a large number of regular tor clients, but is instead some custom client specifically for disruption.
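A rough back-of-the-envelope check of point 7: if connections arrive at a steady rate and each one stays open until tor times it out, the number held open at any moment is roughly rate times lifetime. The sketch below uses the 3 to 4.5 minute timeout mentioned earlier in this message and the 16384 descriptor cap quoted earlier in the thread; the steady-rate assumption is mine, not something measured.

```python
def held_open(rate_per_sec, lifetime_sec):
    """Connections open at once, assuming steady arrivals and fixed lifetime."""
    return rate_per_sec * lifetime_sec

FD_CAP = 16384  # MAX_FILEDESCRIPTORS value quoted earlier in the thread

for lifetime in (180, 270):  # 3 and 4.5 minutes, in seconds
    at_100 = held_open(100, lifetime)
    needed = FD_CAP / lifetime
    print(f"lifetime {lifetime}s: 100 conn/s holds {at_100} open; "
          f"{needed:.1f} conn/s is enough to reach {FD_CAP}")
```

Under these assumptions, a rate well below a hundred new connections per second is already enough to exhaust the cap, which is consistent with a low-bandwidth attacker holding thousands of sockets.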
additionally:
The referenced /16 block of guards is *not* part of this attack, and is simply poorly configured relays. you should not block that set, but instead block the set described above.
On Mon, December 11, 2017 1:40 pm, Alex Xu wrote:
tl;dr: run this:
conntrack -L -p tcp --dport 9001 | awk '{print $5}' | sort | uniq -c | sort -n
Thanks for the detailed analysis.
ignore numbers less than 10. the remaining output should consist of the following:
...
are not NATed IPs, a high limit is not justified. I recommend against the blanket approach suggested previously of limiting whole sets of /24s, since that may inadvertently block mobile clients and is not effective against the current attack. As mentioned in the previous
I agree that a /24 connlimit is not a good approach for exit nodes. But for non-exit relays it has worked fine for me and others.
cheers.
-- x9p | PGP : 0x03B50AF5EA4C8D80 / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Hi Alex
Great points.
conntrack -L -p tcp --dport 9001 | awk '{print $5}' | sort | uniq -c | sort -n
On FreeBSD one can do:
In packetfilter:
# play with the numbers but more than 64k per ip if possible
set limit { frags 70000, src-nodes 70000, states 70000, table-entries 100000 }
table <blockOR> persist
# 2000 is super high. Rate limit 50 new connects per 5 secs
# overload but not flush
pass in on $if_ext inet proto tcp from any to $relay_ip port $or_port flags S/SA modulate state (max-src-conn 2000, max-src-conn-rate 50/5, overload <blockOR>)
As cronjob:
# release block after 10 minutes
pfctl -t blockOR -T expire 600
These measures protect your system. IMO, for other or future cases we should keep the clients' degree of freedom (researchers / fancy doers) as high as possible and not be too restrictive.
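To get a feel for how a 50-per-5-seconds limit with a 10 minute release behaves, here is a small simulation. It is a simplified model, not pf itself: it assumes a separate block rule references the overload table (the excerpt does not show one), it expires entries on every check rather than from a periodic cron job, and `RateLimiter` is a hypothetical class.

```python
from collections import defaultdict, deque

class RateLimiter:
    """Toy model of a per-source new-connection rate limit with timed release."""

    def __init__(self, max_new=50, window=5.0, block_for=600.0):
        self.max_new = max_new
        self.window = window
        self.block_for = block_for
        self.recent = defaultdict(deque)  # src -> times of recent new connections
        self.blocked = {}                 # src -> time it was added to the table

    def expire(self, now):
        """Release sources that have been blocked for at least block_for seconds."""
        for src in [s for s, t in self.blocked.items() if now - t >= self.block_for]:
            del self.blocked[src]

    def allow(self, src, now):
        """Return True if a new connection from src at time now is accepted."""
        self.expire(now)
        if src in self.blocked:
            return False
        times = self.recent[src]
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_new:
            self.blocked[src] = now  # over the rate: move source to the table
            return False
        times.append(now)
        return True

limiter = RateLimiter()
# 60 connection attempts within 0.6 seconds from one source.
results = [limiter.allow("192.0.2.10", t * 0.01) for t in range(60)]
print("accepted:", sum(results), "of", len(results))
print("still blocked after 5 minutes:", not limiter.allow("192.0.2.10", 300.0))
print("accepted again after 10 minutes:", limiter.allow("192.0.2.10", 601.0))
```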
- Open many OR connections (hundreds to thousands)
- Leave open until tor runs out of sockets
If the IP is saturated for something like 2 hours, the relay might lose the HSDir flag. But today there are not enough resources in the game to cause a problem for the network.
I recommend against the blanket approach suggested previously of limiting whole sets of /24s, since that may inadvertently block mobile clients and is not effective against the current attack.
Right. In future, someone could place such noisy clients alongside useful IPs and so get the relays to block the useful ones.
- the connections do not taper off if they are rejected. I banned these addresses from accessing Tor, and they continue to make dozens of connection attempts every second from each IP address. this means that this is probably not a good faith "test" or a misconfigured set of real Tor clients, but is instead malicious and using a modified or custom client.
The above rule limits the useless attempts to a certain limit and recovers after 10 minutes. This protects but gives the 'offender' the chance to tune his client to a better behaviour (in case he wants it).
3c. it is almost certainly not real clients using NAT; as far as I know, LeaseWeb does not use NAT, and Online.net only uses one-to-one NAT.
Good point. Shouldn't a general blocking rule be smart enough to allow NAT clients anyway?
I am getting these warnings, not very often, and the exit (restricted) is working well otherwise:
"Dec 11 18:07:23.000 [warn] Tried to establish rendezvous on non-OR circuit with purpose Acting as rendevous (pending)"
Some posts about this elsewhere hinted this warning could be caused by attacks. I am not seeing attacks otherwise.
Gerry
-----Original Message-----
From: tor-relays [mailto:tor-relays-bounces@lists.torproject.org] On Behalf Of Felix
Sent: 11 December 2017 17:08
To: tor-relays@lists.torproject.org
Subject: Re: [tor-relays] DoS attacks are real (probably)
Quoting Felix (2017-12-11 17:07:30), as excerpted
Hi Alex
Great points.
conntrack -L -p tcp --dport 9001 | awk '{print $5}' | sort | uniq -c | sort -n
On FreeBSD one can do:
yeah, the optimal rule would ban "bad IPs" after some threshold of connections, like "if one IP makes >1 conn/sec for at least 1 minute ban for 1 hour" or something. I'm hoping to fix the underlying issue in Tor so that low-bandwidth attacks like these are less effective.
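One possible reading of that rule, sketched in Python: count new connections per IP in one-second buckets, and ban an IP for an hour once it has exceeded one connection per second in 60 consecutive seconds. The class name `SustainedRateBan` and the exact bucket logic are assumptions for illustration; a real firewall would express this differently.

```python
class SustainedRateBan:
    """Ban an IP that exceeds per_sec connections in `sustain` consecutive seconds."""

    def __init__(self, per_sec=1, sustain=60, ban_for=3600):
        self.per_sec = per_sec
        self.sustain = sustain
        self.ban_for = ban_for
        self.state = {}         # ip -> (current second, count in it, streak)
        self.banned_until = {}  # ip -> time the ban ends

    def observe(self, ip, now):
        """Record a new connection at time now; return True if ip is banned."""
        if self.banned_until.get(ip, 0) > now:
            return True
        sec = int(now)
        cur, count, streak = self.state.get(ip, (sec, 0, 0))
        if sec != cur:
            # Close out the previous second: extend or reset the streak.
            streak = streak + 1 if count > self.per_sec else 0
            if sec - cur > 1:
                streak = 0  # at least one silent second breaks the streak
            cur, count = sec, 0
        if streak >= self.sustain:
            self.banned_until[ip] = now + self.ban_for
            self.state.pop(ip, None)
            return True
        self.state[ip] = (cur, count + 1, streak)
        return False

ban = SustainedRateBan()
first_ban = None
# Two connections per second from one address for just over a minute.
for s in range(61):
    for offset in (0.0, 0.5):
        if ban.observe("192.0.2.10", s + offset) and first_ban is None:
            first_ban = s + offset
print("first banned at t =", first_ban)
```

A client that opens exactly one connection per second never builds a streak under this reading, so it is never banned.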
FWIW, the method that Felix posted should also work in DragonflyBSD and NetBSD. It may also work in OpenBSD, but the caveat is that the OpenBSD project has continued to develop its implementation of pf, so I don't know whether Felix's solution still works in OpenBSD. The other three BSDs' pf support has not been synchronized with that of the originating project (OpenBSD) for many years. Perhaps an OpenBSD tor relay operator can comment here on this matter.
Scott Bennett, Comm. ASMELG, CFIAG
Hi
For those interested in a Linux version, Felix's pf rules translate to:
-A INPUT -p tcp -m multiport --dports $or_port,$dir_port -m connlimit --connlimit-upto 2000 --connlimit-mask 24 -m hashlimit --hashlimit-upto 10/second --hashlimit-mode srcip --hashlimit-srcmask 16 --hashlimit-name mask24 -j ACCEPT
-A INPUT -p tcp -m multiport --dports $or_port,$dir_port -j REJECT
Hi Alex,
This attack appears to be malicious to me. It seems to work like this:
- Open many OR connections (hundreds to thousands)
- Leave open until tor runs out of sockets
Tor presently waits for the connections to time out, which takes 3-4.5 minutes. It should instead more aggressively prune these garbage connections. https://trac.torproject.org/projects/tor/ticket/19984 tracks this.
This is exactly what we saw as well. After implementing connection limits (thanks again x9p) the problem mostly went away and our relays have been stable since.
Thank you for opening the trac ticket. We agree it would be great if this problem could be addressed in the Tor software. In the meantime, we should probably be advocating for all relay operators to implement connection limits. Put simply, without those limits, relays are vulnerable to DoS.
Evidence for this attack being malicious and intending to disable Tor is:
Agree with all 7 points you listed. We'd also add, there is additional evidence that suggests some of the worst offenders (attacking IPs) are actually orchestrated by a single entity (or perhaps multiple entities working together). There are several commonalities across the infrastructure used for these attacks. We identified and blocked (with iptables DENY) the worst. To be clear, these IPs were not in the consensus, and yes, mostly hosted by LeaseWeb.
The referenced /16 block of guards is *not* part of this attack, and is simply poorly configured relays. you should not block that set,
Completely agree. We haven't blocked anything in the consensus.
Thanks.