Re: [tor-relays] Tor relay occasionally maxing out CPU usage

18 May 2020

Hello,

On 2020/05/17 18:20, William Kane wrote:
...
> Occasionally, the CPU usage hits 100%, and the maximum throughput
> drops down to around 16 Mbps from its usual 80 Mbps. This happens
> randomly and not at fixed intervals, which makes it pretty hard to
> profile.

One of the subsystems that I can think of that could potentially lead
to the problem you are describing is our "consensus diff"
subsystem. The consensus diff subsystem is responsible for turning
consensus documents into these patch(1)-like diffs that clients can
fetch without the need to transfer the whole consensus for each minor
change.
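
To make the idea concrete, here is a small Python sketch of why a diff is
so much cheaper to ship than the full document. It is only an
illustration: the document contents and the `make_diff` helper are made
up, and it uses Python's `difflib` unified diffs, not the diff format Tor
actually uses.

```python
import difflib


def make_diff(old: str, new: str) -> str:
    """Return a unified diff from old to new (illustrative, not Tor's format)."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="consensus.old",
            tofile="consensus.new",
            n=0,  # no context lines, to keep the diff minimal
        )
    )


# Hypothetical stand-in for a consensus: many lines, of which only one changes.
old_doc = "".join(f"r relay{i} bandwidth=100\n" for i in range(1000))
new_doc = old_doc.replace(
    "r relay500 bandwidth=100\n", "r relay500 bandwidth=250\n"
)

diff = make_diff(old_doc, new_doc)
print(f"full document: {len(new_doc)} bytes, diff: {len(diff)} bytes")
```

A client that already holds the old document only needs the few lines in
`diff` to arrive at the new one.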

The subsystem also takes care of compression, including LZMA, which is
a beast when it comes to burning CPU cycles.
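
If you want a rough feel for the cost, the following Python sketch times
zlib and LZMA on the same input using the standard library. The sample
data and the `time_compress` helper are made up for illustration, and the
numbers will depend on your machine and on compression settings that may
well differ from what Tor uses.

```python
import lzma
import time
import zlib


def time_compress(compress, data: bytes):
    """Return (seconds elapsed, compressed size) for one compression call."""
    start = time.perf_counter()
    out = compress(data)
    return time.perf_counter() - start, len(out)


# Hypothetical text input, loosely shaped like a list of relay entries.
sample = "".join(
    f"r relay{i} {i * 7919 % 65536:04x} bandwidth={i % 977}\n"
    for i in range(20_000)
).encode()

for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    elapsed, size = time_compress(compress, sample)
    print(f"{name}: {size} bytes in {elapsed * 1000:.1f} ms")
```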
...
> No abnormal entries in the log files.

I suspect you're logging at `notice` log-level, which is the reasonable
thing to do. We need to log at slightly higher granularity to discover
the problem here.

Could I get you to add `Log [dirserv]info notice syslog` to your
`torrc`? This line makes Tor log everything at notice log-level (the
default), to the system logger, except for the directory server
subsystem, which will be logged at `info` log-level instead. The code
responsible for generating consensus diffs uses the `dirserv` logging
domain.

If the CPU spike happens right after a log message that says something
along the lines of "The most recent XXX consensus is valid-after XXX. We
have diffs to this consensus for XXX/XXX older XXX consensuses.
Generating diffs for the other XXX." then I think we have our winner.
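
To save you from scrolling through the log by hand, a small Python sketch
like the one below pulls out the matching lines so you can compare their
timestamps against your CPU graphs. The `find_diff_generation` helper and
the sample lines are made up (your syslog format will look different);
only the marker text comes from the message quoted above.

```python
MARKER = "Generating diffs for the other"


def find_diff_generation(lines):
    """Return the log lines that announce consensus diff generation."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


# Made-up sample lines; the timestamps and prefixes are placeholders.
sample_log = [
    "TIMESTAMP-1 Tor[PID]: some unrelated message\n",
    "TIMESTAMP-2 Tor[PID]: The most recent XXX consensus is valid-after XXX. "
    "We have diffs to this consensus for XXX/XXX older XXX consensuses. "
    "Generating diffs for the other XXX.\n",
    "TIMESTAMP-3 Tor[PID]: another unrelated message\n",
]

for hit in find_diff_generation(sample_log):
    print(hit)
```

To run it against a real log, pass an open file object instead of
`sample_log`; where that file lives depends on how your system logger is
set up.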

Please remember to remove the `info` log-level when the experiment is
over :-)

I'm curious what you figure out here. Let me know if you need any help.

All the best,
Alex.

-- 
Alexander Færøy