Re: [tor-relays] Tor relay occasionally maxing out CPU usage

20 May 2020

To me it sounds like there isn't actually a problem. This is the way Tor
works now (now == since consensus diffs were added). It's unfortunate
that Tor isn't more multithreaded: so much happens in the same main
loop that client throughput is momentarily impacted. But that's the way
it is, and there isn't a problem here to be solved, at least not by you,
the relay operator.

Getting more into tor-dev@ territory here, but doesn't compressing
consensus documents sound like something that could easily be shoved
over into a worker thread? I'm unfamiliar with the subsystem and I'm
sure many of my implicit assumptions are wrong.
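
To illustrate the general idea only, here is a minimal Python sketch of
handing a CPU-heavy compression job to a worker thread while the main
loop keeps ticking. Everything in it (`compress_document`,
`main_loop_with_worker`, the fake documents) is made up for the example,
and it says nothing about how Tor's C code is actually structured.

```python
import lzma
import time
from concurrent.futures import ThreadPoolExecutor


def compress_document(data: bytes) -> bytes:
    """CPU-heavy step; a stand-in for compressing one consensus diff."""
    return lzma.compress(data, preset=6)


def main_loop_with_worker(documents):
    """Queue compression on one worker thread and keep looping meanwhile.

    Returns the compressed documents (in input order) and the number of
    main-loop iterations that ran while the worker was busy.
    """
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = [pool.submit(compress_document, d) for d in documents]
        while not all(f.done() for f in futures):
            ticks += 1          # a real main loop would relay traffic here
            time.sleep(0.001)
        results = [f.result() for f in futures]
    return results, ticks


if __name__ == "__main__":
    # Hypothetical input: repetitive text, not real consensus documents.
    docs = [(b"example document %d\n" % i) * 20000 for i in range(4)]
    compressed, ticks = main_loop_with_worker(docs)
    print("compressed", len(compressed), "documents;",
          "main loop ran", ticks, "times in the meantime")
```

Whether the main loop really stays responsive depends on the runtime; in
Python it relies on the compression call not holding the interpreter
lock for the whole job, which I believe is the case for `lzma` but have
not measured here.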

Matt

On 5/19/20 11:59, William Kane wrote:
...
Okay, so your suspicion was just confirmed:
consdiffmgr_rescan_flavor_(): The most recent ns consensus is
valid-after 2020-05-19T15:00:00. We have diffs to this consensus for
0/25 older ns consensuses. Generating diffs for the other 25.
Right after, diffs were compressed with zstd and lzma, causing the CPU
usage to spike.
Disabling DirCache still gives me the following warning on Tor 0.4.3.5:
May 19 17:56:42.909 [warn] DirCache is disabled and we are configured
as a relay. We will not become a Guard.
So, unless I sacrifice the Guard flag, there doesn't seem to be an
easy way to fix this problem.
Please correct me if I'm wrong.
2020-05-19 15:07 GMT, William Kane <ttallink@googlemail.com>:
...
Another thing, from the change-log:
- Update the message logged on relays when DirCache is disabled.
  Since 0.3.3.5-rc, authorities require DirCache (V2Dir) for the
  Guard flag. Fixes bug 24312; bugfix on 0.3.3.5-rc.
If I understand this correctly, my relay would no longer be a Guard if
I choose to disable DirCache in order to prevent Tor from hogging my
CPU?
From the code that I have seen, simply not setting the directory port
does not stop the relay from caching / compressing diffs.
Or has this been changed more recently?
Not being a guard would honestly suck, and being a guard but with
limited bandwidth due to Tor hogging the CPU also sucks.
Any ideas on what to do?
2020-05-19 13:43 GMT, William Kane <ttallink@googlemail.com>:
...
Dear Alexander,
I have added 'Log [dirserv]info notice stdout' to my configuration and
will be monitoring the system closely.
Tor was also upgraded to version 0.4.3.5, and the Linux kernel was
upgraded to version 5.6.13, but I do not think this will change
anything.
Expect a follow-up within the next 12 hours.
William
2020-05-18 1:40 GMT, Alexander Færøy <ahf@torproject.org>:
...
Hello,
On 2020/05/17 18:20, William Kane wrote:
...
Occasionally, the CPU usage hits 100%, and the maximum throughput
drops down to around 16 Mbps from its usual 80 Mbps. This happens
randomly and not at fixed intervals, which makes it pretty hard to
profile.
One of the subsystems that I can think of that could potentially lead
to the problem that you are describing is our "consensus diff"
subsystem. The consensus diff subsystem is responsible for turning
consensus documents into these patch(1)-like diffs that clients can
fetch without the need to transfer the whole consensus for each minor
change.
The subsystem also takes care of compression, which includes LZMA, which
is a beast when it comes to burning CPU cycles.
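To get a rough feel for the relative cost, here is a small Python
sketch that times LZMA against zlib on the same input. It uses only the
standard library; zstd is left out because it needs a third-party
module. The helper names and the synthetic input from `make_sample` are
invented for the example and are not real consensus data, so treat any
numbers as indicative at best.

```python
import lzma
import time
import zlib


def make_sample(n_lines: int) -> bytes:
    """Build synthetic, text-like input (NOT a real consensus document)."""
    return b"".join(
        b"r relay%05d 10.0.%d.%d 9001\n" % (i, i // 256 % 256, i % 256)
        for i in range(n_lines)
    )


def time_compressor(name, compress, data):
    """Run one compressor once and record wall time and output size."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return {"name": name, "seconds": elapsed, "size": len(out)}


def compare(data: bytes):
    """Time zlib and LZMA on the same input, both at level/preset 6."""
    return [
        time_compressor("zlib", lambda d: zlib.compress(d, 6), data),
        time_compressor("lzma", lambda d: lzma.compress(d, preset=6), data),
    ]


if __name__ == "__main__":
    sample = make_sample(100_000)
    for row in compare(sample):
        print(f"{row['name']:5s} {row['seconds']:.3f}s "
              f"{row['size']} bytes (from {len(sample)})")
```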
...
No abnormal entries in the log files.
I suspect you're logging at `notice` log-level, which is the reasonable
thing to do. We need to log at slightly higher granularity to discover
the problem here.
Could I get you to add `Log [dirserv]info notice syslog` to your
`torrc`? This line makes Tor log everything at notice log-level (the
default), to the system logger, except for the directory server
subsystem, which will be logged at `info` log-level instead. The code
responsible for generating consensus diffs uses the `dirserv` logging
domain.
If the CPU spike happens right after a log message that says something
along the lines of "The most recent XXX consensus is valid-after XXX. We
have diffs to this consensus for XXX/XXX older XXX consensuses.
Generating diffs for the other XXX." then I think we have our winner.
Please remember to remove the `info` log-level when the experiment is
over :-)
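If you would rather not watch the log by eye, a short Python sketch
along these lines could pick out that message so its timestamp can be
compared with the CPU spikes. The regular expression is written against
the message wording quoted in this thread and may need adjusting for
other Tor versions; the function names are invented for the example.

```python
import re

# Pattern derived from the message wording quoted in this thread.
DIFF_MESSAGE = re.compile(
    r"The most recent (?P<flavor>\S+) consensus is valid-after "
    r"(?P<valid_after>\S+)\. We have diffs to this consensus for "
    r"(?P<have>\d+)/(?P<total>\d+) older \S+ consensuses\. "
    r"Generating diffs for the other (?P<todo>\d+)\."
)


def parse_diff_message(line: str):
    """Return the fields of a consensus-diff log message, or None."""
    normalized = " ".join(line.split())   # undo any line wrapping
    match = DIFF_MESSAGE.search(normalized)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("have", "total", "todo"):
        fields[key] = int(fields[key])
    return fields


def scan(lines):
    """Yield parsed fields for every matching line in an iterable."""
    for line in lines:
        fields = parse_diff_message(line)
        if fields is not None:
            yield fields


if __name__ == "__main__":
    import sys
    # Usage: pipe the Tor log in on stdin.
    for hit in scan(sys.stdin):
        print(hit)
```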
I'm curious what you figure out here. Let me know if you need any help.
All the best,
Alex.
--
Alexander Færøy
_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

Matt Traudt