Hi,

We're experiencing what looks like a DoS attack on multiple relays in our family:

https://atlas.torproject.org/#search/family:CBEAE10CBBB86C51059246B2EF92EB2CB4E111BC

The relays are currently running Tor 0.3.1.9 on Linux kernel 4.4.0 (although when the problem started the relays were running Tor 0.3.1.8).

The attack knocked 3 of 6 relays offline overnight. By the time we looked at logs, the Tor service had stopped and this was the last line in the log:

"Tor[xyz]: Failing because we have 16351 connections already. Please read doc/TUNING for guidance."

The attack is still ongoing. When it's happening, the number of connections rises very rapidly, until the attack succeeds in stopping the service.

$ ss -s
Total: 15855 (kernel 0)
TCP:   24520 (estab 23969, closed 305, orphaned 31, synrecv 0, timewait 261/0), ports 0

Transport Total     IP        IPv6
*   0         -         -        
RAW   0         0         0        
UDP   8         4         4        
TCP   24215     24213     2        
INET   24223     24217     6        
FRAG   0         0         0        

... and only a few seconds later:

$ ss -s
Total: 12120 (kernel 0)
TCP:   27389 (estab 20026, closed 1906, orphaned 45, synrecv 0, timewait 1587/0), ports 0

Transport Total     IP        IPv6
*   0         -         -        
RAW   0         0         0        
UDP   8         4         4        
TCP   25483     25481     2        
INET   25491     25485     6        
FRAG   0         0         0        

That is far above our normal connection count, more than we have ever seen, and seems like more connections than a relay should need.
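
As a first step toward seeing where these connections come from, a rough per-peer tally could look like the sketch below. It assumes the usual `ss -tn` column layout (state in column 1, peer address:port in column 5); the function name count_peers is only illustrative. Many connections from one address are not proof of abuse on their own.

```shell
# Tally established TCP connections per remote address.
# Reads `ss -tn` output on stdin; assumes column 1 is the state and
# column 5 is the peer address:port (the header line is skipped
# because its first field is not "ESTAB").
count_peers() {
    awk '$1 == "ESTAB" {
             peer = $5
             sub(/:[0-9]+$/, "", peer)   # strip the trailing :port
             n[peer]++
         }
         END { for (p in n) print n[p], p }' | sort -rn
}

# Usage on the relay host:
#   ss -tn | count_peers | head -n 20
```

The output is one line per remote address, with the count first and the busiest peers at the top.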

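We have the system-wide file descriptor limit (/proc/sys/fs/file-max) set to 64000, but it looks like the init script caps Tor at MAX_FILEDESCRIPTORS=16384, which would be consistent with the 16351 figure in the log line above. From /etc/init.d/tor: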

  elif [ "$system_max" -gt "40000" ] ; then
    MAX_FILEDESCRIPTORS=16384

Surely that is high enough for normal service?
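
To confirm which limit the running process actually has, as opposed to the system-wide file-max value, something like the sketch below could be used. It assumes the standard /proc/<pid>/limits layout on Linux and a single process named "tor"; the function name soft_nofile_limit is only illustrative.

```shell
# Print the soft "Max open files" limit.
# Reads the content of /proc/<pid>/limits on stdin; in that file the
# soft limit is the 4th whitespace-separated field of the matching line.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Usage (assumes one process named "tor"; run as root or the Tor user):
#   pid=$(pidof tor)
#   soft_nofile_limit < "/proc/$pid/limits"
#   ls "/proc/$pid/fd" | wc -l    # descriptors currently open
```

Comparing the two numbers while the connection count is climbing should show how close the process is to its limit.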

We haven't started looking into where the traffic is coming from or its other characteristics. We are wondering: 1) whether this is a known attack, 2) whether other operators are experiencing it, 3) whether there are any ideas for mitigating it, and 4) whether any additional information would be helpful.

Thanks.