[tor-relays] Tor relay occasionally maxing out CPU usage

William Kane ttallink at googlemail.com
Wed May 20 14:04:55 UTC 2020


It sure is a problem for those on virtualized machines with only a single core.

As far as offloading to a different worker thread goes, it should be
very easy to implement code-wise. Tor already offloads some crypto
work to a different thread when NumCPUs is set / detected
appropriately.
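
To illustrate the idea (not Tor's actual code, which is C and has its
own worker-thread machinery), here is a minimal Python sketch of
handing an LZMA compression job to a worker thread while the main loop
keeps running. The function names, the preset, and the tick counter
standing in for "relay traffic" are all assumptions made up for this
example. On a single-core VM this adds no CPU capacity; it only lets
the scheduler interleave the two tasks.

```python
import lzma
import queue
import threading
import time
from typing import Tuple


def compress_in_worker(payload: bytes, results: queue.Queue) -> threading.Thread:
    """Start LZMA compression of payload on a separate thread.

    The compressed bytes are put on `results` when the job finishes.
    """
    def job() -> None:
        results.put(lzma.compress(payload, preset=6))

    worker = threading.Thread(target=job, daemon=True)
    worker.start()
    return worker


def main_loop_with_offload(payload: bytes) -> Tuple[bytes, int]:
    """Keep the (hypothetical) main loop ticking while compression runs."""
    results: queue.Queue = queue.Queue()
    worker = compress_in_worker(payload, results)
    ticks = 0
    while True:
        try:
            # Non-blocking check: has the worker finished yet?
            compressed = results.get_nowait()
            break
        except queue.Empty:
            ticks += 1          # stand-in for handling traffic on the main loop
            time.sleep(0.001)
    worker.join()
    return compressed, ticks
```

The main loop only polls for the result, so it never blocks on the
compression itself.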

2020-05-20 12:24 GMT, Matt Traudt <pastly at torproject.org>:
> To me it sounds like there isn't actually a problem. This is the way Tor
> works now (now == since consensus diffs were added). It's unfortunate
> that Tor isn't more multithreaded, so much happens in the same main
> loop, and client throughput is momentarily impacted, but that's the way
> it is and there isn't a problem here to be solved. At least not for you
> the relay operator.
>
> Getting more into tor-dev@ territory here, but doesn't compressing
> consensus documents sound like something that could easily be shoved
> over into a worker thread? I'm unfamiliar with the subsystem and I'm
> sure many of my implicit assumptions are wrong.
>
> Matt
>
> On 5/19/20 11:59, William Kane wrote:
>> Okay, so your suspicion was just confirmed:
>>
>> consdiffmgr_rescan_flavor_(): The most recent ns consensus is
>> valid-after 2020-05-19T15:00:00. We have diffs to this consensus for
>> 0/25 older ns consensuses. Generating diffs for the other 25.
>>
>> Right after, diffs were compressed with zstd and lzma, causing the CPU
>> usage to spike.
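
For anyone who wants to line these messages up with their CPU graphs,
a rough Python sketch for pulling the numbers out of that log line
follows. The regular expression is based only on the message quoted
above; the function name and the returned keys are made up for this
example, and other Tor versions may word the message differently.

```python
import re
from typing import Dict, Optional

# Pattern derived from the single log message quoted in this thread.
RESCAN_RE = re.compile(
    r"The most recent (?P<flavor>\S+) consensus is valid-after "
    r"(?P<valid_after>\S+)\. We have diffs to this consensus for "
    r"(?P<have>\d+)/(?P<total>\d+) older \S+ consensuses\. "
    r"Generating diffs for the other (?P<missing>\d+)\."
)


def parse_rescan(message: str) -> Optional[Dict[str, object]]:
    """Return the fields of a consensus diff rescan message, or None."""
    # Log output may be wrapped across lines, so collapse all whitespace.
    flat = " ".join(message.split())
    m = RESCAN_RE.search(flat)
    if m is None:
        return None
    return {
        "flavor": m["flavor"],
        "valid_after": m["valid_after"],
        "have": int(m["have"]),
        "total": int(m["total"]),
        "missing": int(m["missing"]),
    }
```

Comparing the timestamps of matching lines against the CPU spikes
should show whether the two coincide.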
>>
>> Disabling DirCache still gives me the following warning on Tor 0.4.3.5:
>>
>> May 19 17:56:42.909 [warn] DirCache is disabled and we are configured
>> as a relay. We will not become a Guard.
>>
>> So, unless I sacrifice the Guard flag, there doesn't seem to be an
>> easy way to fix this problem.
>>
>> Please correct me if I'm wrong.
>>
>>
>> 2020-05-19 15:07 GMT, William Kane <ttallink at googlemail.com>:
>>> Another thing, from the change-log:
>>>
>>> - Update the message logged on relays when DirCache is disabled.
>>>   Since 0.3.3.5-rc, authorities require DirCache (V2Dir) for the
>>>   Guard flag. Fixes bug 24312; bugfix on 0.3.3.5-rc.
>>>
>>> If I understand this correctly, my relay would no longer be a Guard if
>>> I choose to disable DirCache in order to prevent Tor from hogging my
>>> CPU?
>>>
>>> From the code that I have seen, simply not setting the directory port
>>> does not stop the relay from caching / compressing diffs.
>>>
>>> Or has this been changed more recently?
>>>
>>> Not being a guard would honestly suck, and being a guard but with
>>> limited bandwidth due to Tor hogging the CPU also sucks.
>>>
>>> Any ideas on what to do?
>>>
>>> 2020-05-19 13:43 GMT, William Kane <ttallink at googlemail.com>:
>>>> Dear Alexander,
>>>>
>>>> I have added 'Log [dirserv]info notice stdout' to my configuration and
>>>> will be monitoring the system closely.
>>>>
>>>> Tor was also upgraded to version 0.4.3.5, and the Linux kernel was
>>>> upgraded to version 5.6.13 but I do not think this will change
>>>> anything.
>>>>
>>>> Expect a follow-up within the next 12 hours.
>>>>
>>>> William
>>>>
>>>> 2020-05-18 1:40 GMT, Alexander Færøy <ahf at torproject.org>:
>>>>> Hello,
>>>>>
>>>>> On 2020/05/17 18:20, William Kane wrote:
>>>>>> Occasionally, the CPU usage hits 100%, and the maximum throughput
>>>>>> drops down to around 16 Mbps from its usual 80 Mbps. This happens
>>>>>> randomly and not at fixed intervals, which makes it pretty hard to
>>>>>> profile.
>>>>>
>>>>> One of the subsystems that I can think of that could potentially lead
>>>>> to the problem that you are describing is our "consensus diff"
>>>>> subsystem. The consensus diff subsystem is responsible for turning
>>>>> consensus documents into these patch(1)-like diffs that clients can
>>>>> fetch without the need to transfer the whole consensus for each minor
>>>>> change.
>>>>>
>>>>> The subsystem also takes care of compression, which includes LZMA,
>>>>> which is a beast when it comes to burning CPU cycles.
>>>>>
>>>>>> No abnormal entries in the log files.
>>>>>
>>>>> I suspect you're logging at `notice` log-level, which is the reasonable
>>>>> thing to do. We need to log at slightly higher granularity to discover
>>>>> the problem here.
>>>>>
>>>>> Could I get you to add `Log [dirserv]info notice syslog` to your
>>>>> `torrc`? This line makes Tor log everything at notice log-level (the
>>>>> default), to the system logger, except for the directory server
>>>>> subsystem, which will be logged at `info` log-level instead. The code
>>>>> responsible for generating consensus diffs uses the `dirserv` log
>>>>> domain for logging purposes.
>>>>>
>>>>> If the CPU spike happens right after a log message that says something
>>>>> along the lines of "The most recent XXX consensus is valid-after XXX. We
>>>>> have diffs to this consensus for XXX/XXX older XXX consensuses.
>>>>> Generating diffs for the other XXX." then I think we have our winner.
>>>>>
>>>>> Please remember to remove the `info` log-level when the experiment is
>>>>> over :-)
>>>>>
>>>>> I'm curious what you figure out here. Let me know if you need any help.
>>>>>
>>>>> All the best,
>>>>> Alex.
>>>>>
>>>>> --
>>>>> Alexander Færøy
>>>>> _______________________________________________
>>>>> tor-relays mailing list
>>>>> tor-relays at lists.torproject.org
>>>>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
>>>>>
>>>>
>>>