Okay, so your suspicion was just confirmed:
consdiffmgr_rescan_flavor_(): The most recent ns consensus is valid-after 2020-05-19T15:00:00. We have diffs to this consensus for 0/25 older ns consensuses. Generating diffs for the other 25.
Right afterwards, the diffs were compressed with zstd and lzma, causing the CPU usage to spike.
Disabling DirCache still gives me the following warning on Tor 0.4.3.5:
May 19 17:56:42.909 [warn] DirCache is disabled and we are configured as a relay. We will not become a Guard.
So, unless I sacrifice the Guard flag, there doesn't seem to be an easy way to fix this problem.
Please correct me if I'm wrong.
2020-05-19 15:07 GMT, William Kane ttallink@googlemail.com:
Another thing, from the change-log:
- Update the message logged on relays when DirCache is disabled. Since 0.3.3.5-rc, authorities require DirCache (V2Dir) for the Guard flag. Fixes bug 24312; bugfix on 0.3.3.5-rc.
If I understand this correctly, my relay would no longer be a Guard if I choose to disable DirCache in order to prevent Tor from hogging my CPU?
From the code that I have seen, simply not setting the directory port does not stop the relay from caching / compressing diffs.
Or has this been changed more recently?
Not being a guard would honestly suck, and being a guard but with limited bandwidth due to Tor hogging the CPU also sucks.
Any ideas on what to do?
2020-05-19 13:43 GMT, William Kane ttallink@googlemail.com:
Dear Alexander,
I have added 'Log [dirserv]info notice stdout' to my configuration and will be monitoring the system closely.
Tor was also upgraded to version 0.4.3.5, and the Linux kernel was upgraded to version 5.6.13, but I do not think this will change anything.
Expect a follow-up within the next 12 hours.
William
2020-05-18 1:40 GMT, Alexander Færøy ahf@torproject.org:
Hello,
On 2020/05/17 18:20, William Kane wrote:
Occasionally, the CPU usage hits 100%, and the maximum throughput drops down to around 16 Mbps from its usual 80 Mbps. This happens randomly and not at fixed intervals, which makes it pretty hard to profile.
One of the subsystems that I can think of that could potentially lead to the problem that you are describing is our "consensus diff" subsystem. The consensus diff subsystem is responsible for turning consensus documents into these patch(1)-like diffs that clients can fetch without the need to transfer the whole consensus for each minor change.
The subsystem also takes care of compression, which includes LZMA, which is a beast when it comes to burning CPU cycles.
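To get a rough feel for the relative CPU cost of LZMA, here is a minimal Python sketch that times `lzma.compress` against `zlib.compress` on the same input. It is only an illustration: the payload is synthetic text made up for this example (not a real consensus or diff), the helper name `time_compress` is hypothetical, zlib stands in as the cheaper comparison because zstd is not in the Python standard library, and none of this reflects how Tor itself invokes its compression backends.

```python
import lzma
import time
import zlib


def time_compress(name, compress, data):
    """Compress `data` once and return (name, compressed_size, seconds)."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return name, len(out), elapsed


# Synthetic, repetitive text standing in for a directory document.
# This is NOT a real consensus; it only gives the compressors some work.
payload = b"".join(
    b"r relay%05d 2020-05-19 15:00:00 10.0.%d.%d 9001 0\n"
    % (i, (i // 256) % 256, i % 256)
    for i in range(20000)
)

results = [
    time_compress("zlib", zlib.compress, payload),
    time_compress("lzma", lzma.compress, payload),
]

for name, size, seconds in results:
    print(f"{name}: {len(payload)} -> {size} bytes in {seconds:.3f}s")
```

The absolute numbers depend on the machine and the input, so treat the output as a relative comparison only.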
No abnormal entries in the log files.
I suspect you're logging at `notice` log-level, which is the reasonable thing to do. We need to log at slightly higher granularity to discover the problem here.
Could I get you to add `Log [dirserv]info notice syslog` to your `torrc`? This line makes Tor log everything at notice log-level (the default), to the system logger, except for the directory server subsystem, which will be logged at `info` log-level instead. The code responsible for generating consensus diffs uses the `dirserv` log domain for logging purposes.
If the CPU spike happens right after a log message that says something along the lines of "The most recent XXX consensus is valid-after XXX. We have diffs to this consensus for XXX/XXX older XXX consensuses. Generating diffs for the other XXX." then I think we have our winner.
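A minimal Python sketch of how one might spot that message in a log, so its timestamps can be lined up against the CPU spikes. The regular expression is modelled on the wording quoted above and on the example line reported in this thread. The names `RESCAN_RE`, `RescanEvent` and `parse_rescan_line` are made up for this illustration, and the exact wording may differ between Tor versions.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the consdiffmgr message quoted in this thread.
# Assumption: the wording is stable for the Tor version being monitored.
RESCAN_RE = re.compile(
    r"The most recent (?P<flavor>\w+) consensus is valid-after "
    r"(?P<valid_after>[0-9T:\-]+)\. "
    r"We have diffs to this consensus for (?P<have>\d+)/(?P<total>\d+) "
    r"older \w+ consensuses\. "
    r"Generating diffs for the other (?P<missing>\d+)\."
)


class RescanEvent(NamedTuple):
    flavor: str        # consensus flavor, e.g. "ns"
    valid_after: str   # valid-after timestamp as logged
    have: int          # diffs already available
    total: int         # older consensuses considered
    missing: int       # diffs about to be generated


def parse_rescan_line(line: str) -> Optional[RescanEvent]:
    """Return a RescanEvent if `line` contains the message, else None."""
    m = RESCAN_RE.search(line)
    if m is None:
        return None
    return RescanEvent(
        m["flavor"],
        m["valid_after"],
        int(m["have"]),
        int(m["total"]),
        int(m["missing"]),
    )


sample = (
    "consdiffmgr_rescan_flavor_(): The most recent ns consensus is "
    "valid-after 2020-05-19T15:00:00. We have diffs to this consensus for "
    "0/25 older ns consensuses. Generating diffs for the other 25."
)
print(parse_rescan_line(sample))
```

Feeding each log line through `parse_rescan_line` and noting when `missing` is greater than zero gives the moments at which diff generation, and the compression that follows it, is about to start.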
Please remember to remove the `info` log-level when the experiment is over :-)
I'm curious what you figure out here. Let me know if you need any help.
All the best, Alex.
-- 
Alexander Færøy
_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays