[tor-bugs] #24857 [Core Tor/Tor]: tor 0.3.1.9 100% cpu load

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Jan 24 09:39:26 UTC 2018


#24857: tor 0.3.1.9 100% cpu load
--------------------------+-----------------------------------
 Reporter:  Eugene646     |          Owner:  (none)
     Type:  defect        |         Status:  needs_information
 Priority:  Medium        |      Milestone:
Component:  Core Tor/Tor  |        Version:  Tor: 0.3.1.9
 Severity:  Normal        |     Resolution:
 Keywords:  cpu, windows  |  Actual Points:
Parent ID:                |         Points:
 Reviewer:                |        Sponsor:
--------------------------+-----------------------------------

Comment (by creideiki):

 I think I'm seeing this same bug, or at least something very similar.

 I run Tor as a non-exit relay in an LXC container on a Gentoo Linux box on
 an Intel Core i7-6700K, with Tor constantly shuffling 10Mb/s of traffic.
 Some random time after starting the Tor service, it starts eating one
 entire CPU core.

 I'm not sure if it's related, since I don't have CPU usage history for
 this container and thus don't know when the 100% CPU usage started, but
 the Tor log contains a lot of these messages:

    Jan 24 05:17:24.000 [warn] Failing because we have 4063 connections already. Please read doc/TUNING for guidance. [over 16000001 similar message(s) suppressed in last 21600 seconds]

 If that suppression count is correct, that's about 740 messages per
 second (16000001 messages in 21600 seconds), which seems a bit excessive.
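
 As a rough check of that rate, here is a minimal sketch using only the two
 figures from the log line above. The variable and function names are
 illustrative and do not come from Tor.

```python
# Figures copied from the quoted log line; nothing here is measured independently.
suppressed_messages = 16_000_001  # "over 16000001 similar message(s) suppressed"
window_seconds = 21_600           # "in last 21600 seconds", i.e. 6 hours


def messages_per_second(count: int, seconds: int) -> float:
    """Average message rate over the suppression window."""
    return count / seconds


rate = messages_per_second(suppressed_messages, window_seconds)
print(f"{rate:.1f} messages/second over {window_seconds / 3600:.0f} hours")
# prints "740.7 messages/second over 6 hours"
```

 Since the log says "over 16000001", this is a lower bound on the average
 rate, not an exact figure.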

 I'm currently running this package:

    net-vpn/tor-0.3.2.9::gentoo was built with the following:
    USE="-libressl -lzma -scrypt seccomp (-selinux) -systemd -test tor-hardening -web -zstd" ABI_X86="(64)"

 The logs suggest the following timeline:

 Dec 9: First occurrence of the "Failing because we have X connections
 already" message, X=29967, on a Tor 0.3.2.2-alpha with 55 days of uptime.
 Dec 10: Next message, X=29967.
 Dec 15: Tor upgraded to 0.3.2.6-alpha.
 Dec 22: Next message, X=29967, at 6 days 6 hours uptime.
 Dec 30: Next message, X=29967, at 15 days 6 hours uptime.
 Jan 1: 2 messages, X=29967, 6 hours apart.
 Jan 2: 1 message, X=29967, 21 hours after the last one.
 Jan 7: Machine is rebooted to kernel 4.14.11 to get initial Meltdown
 patches. Failure messages start coming 4 minutes after starting the
 service, with X=4063.
 Jan 7-21: Failure messages come regularly every 6 hours, X=4063.
 Jan 21: Tor is upgraded to 0.3.2.9. Failure messages start coming 7
 minutes after starting the service, with X=4063.
 Jan 21-24 (now): Failure messages come regularly every 6 hours, X=4063.

 The Tor process uses 100% CPU (i.e. one core), but not very much memory -
 currently 900M virtual, 300M resident.

 "perf top" on the Tor process isn't very helpful; at the top is perf's own
 BPF stuff at 5-10% CPU time, followed by pthread functions in glibc at
 ~2-3%. Actual Tor code is way down the list; the only function visible
 without going ridiculously low is "assert_connection_ok" at 1% CPU time.

 "strace" on the Tor process sees mostly calls to epoll_pwait():

    [pid 23279] 1516784382.485502 epoll_pwait(3, [{EPOLLIN, {u32=9, u64=9}}], 512, 9, NULL, 8) = 1

 Actually, a whole lot of them; strace counts around 69000 such calls per
 second, with other syscalls during one random second coming in at much
 lower counts:

       1 accept4
       1 close
       1 setsockopt
     174 getpid
     311 futex
     682 write
     967 getsockopt
     967 ioctl
    1116 epoll_ctl
    1561 read
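
 A per-second tally like the one above can be sketched as follows. This
 assumes strace output in the format of the epoll_pwait() line quoted
 earlier (optional "[pid N]" prefix, then an epoch timestamp with
 microseconds). The function name, the regular expression and the
 `reported` dictionary are illustrative; the counts are the ones listed
 above, with "around 69000" taken as 69000.

```python
import re
from collections import Counter

# Matches lines such as:
#   [pid 23279] 1516784382.485502 epoll_pwait(3, ...) = 1
LINE_RE = re.compile(r"^\s*(?:\[pid\s+\d+\]\s+)?(\d+)\.\d+\s+(\w+)\(")


def count_syscalls(lines, second):
    """Count syscalls by name among lines whose timestamp falls in `second`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and int(match.group(1)) == second:
            counts[match.group(2)] += 1
    return counts


# The only raw strace line available from the report.
sample = [
    "[pid 23279] 1516784382.485502 epoll_pwait(3, [{EPOLLIN, {u32=9, "
    "u64=9}}], 512, 9, NULL, 8) = 1",
]
print(count_syscalls(sample, 1516784382))

# Counts as reported for one random second; epoll_pwait is approximate.
reported = {
    "epoll_pwait": 69000, "accept4": 1, "close": 1, "setsockopt": 1,
    "getpid": 174, "futex": 311, "write": 682, "getsockopt": 967,
    "ioctl": 967, "epoll_ctl": 1116, "read": 1561,
}
others = sum(n for name, n in reported.items() if name != "epoll_pwait")
share = reported["epoll_pwait"] / (reported["epoll_pwait"] + others)
print(f"other syscalls: {others}, epoll_pwait share: {share:.1%}")
```

 With the reported figures, epoll_pwait() accounts for roughly 92% of all
 syscalls in that second.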

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24857#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

