Hello,
On 2020/05/17 18:20, William Kane wrote:
> Occasionally, the CPU usage hits 100%, and the maximum throughput drops to around 16 Mbps from its usual 80 Mbps. This happens randomly and not at fixed intervals, which makes it pretty hard to profile.
One of the subsystems I can think of that could potentially lead to the problem you are describing is our "consensus diff" subsystem. It is responsible for turning consensus documents into patch(1)-like diffs that clients can fetch, so they do not need to transfer the whole consensus for each minor change.
The subsystem also takes care of compression, including LZMA, which is a beast when it comes to burning CPU cycles.
> No abnormal entries in the log files.
I suspect you're logging at `notice` log-level, which is the reasonable thing to do. We need to log at slightly higher granularity to discover the problem here.
Could I get you to add `Log [dirserv]info notice syslog` to your `torrc`? This line makes Tor log everything at `notice` log-level (the default) to the system logger, except for the directory server subsystem, which will be logged at `info` log-level instead. The code responsible for generating consensus diffs uses the `dirserv` logging domain.
If the CPU spike happens right after a log message that says something along the lines of "The most recent XXX consensus is valid-after XXX. We have diffs to this consensus for XXX/XXX older XXX consensuses. Generating diffs for the other XXX." then I think we have our winner.
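If it helps, here is a rough Python sketch for pulling the timestamps of those messages out of a saved log, so you can line them up with your CPU graphs. The function name `find_diff_runs` is made up for illustration, and the timestamp pattern assumes the traditional syslog layout ("May 17 18:20:01 ..."), so adjust it to whatever your system logger actually writes.

```python
import re
import sys

# Substring taken from the log message quoted above.
MARKER = "Generating diffs for the other"

# Assumption: traditional syslog timestamps such as "May 17 18:20:01".
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})")


def find_diff_runs(lines):
    """Return the timestamp (or None) of each line announcing diff generation."""
    hits = []
    for line in lines:
        if MARKER in line:
            match = TIMESTAMP.match(line)
            hits.append(match.group(1) if match else None)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_diff_runs.py <path-to-saved-log>
    with open(sys.argv[1], errors="replace") as log:
        for stamp in find_diff_runs(log):
            print(stamp or "(no timestamp found)")
```

Run it against a copy of your syslog output and compare the printed times with the moments the throughput dropped.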
Please remember to remove the `info` log-level when the experiment is over :-)
I'm curious what you figure out here. Let me know if you need any help.
All the best, Alex.