Hi there,
I am the operator of the following relay:
https://metrics.torproject.org/rs.html#details/47E1157F7DA6DF80EC00D745D73AC...
The relay runs on my Arch Linux server with kernel version 5.6.11.
This is my tor configuration file:
ORPort 37.157.195.83:38619
ORPort [2a02:2b88:2:1::3239:0]:38619
DirPort 37.157.195.83:44776
Nickname michaelscott
ContactInfo ttallink@googlemail.com
ControlPort 9051
SocksPort 0
CookieAuthentication 1
ExitPolicy reject *:*
DataDirectory /var/lib/tor
Sandbox 1
Linux kernel boot parameters from grub:
quiet mitigations=off
Kernel parameters from /etc/sysctl.d set on boot through systemd:
kernel.dmesg_restrict = 1
net.ipv6.ip_nonlocal_bind = 1
kernel.yama.ptrace_scope = 3
vm.swappiness = 60
Tor systemd unit (shipped by distribution):
[Unit]
Description=Anonymizing Overlay Network
After=network.target

[Service]
User=tor
Type=simple
ExecStart=/usr/bin/tor -f /etc/tor/torrc
ExecReload=/usr/bin/kill -HUP $MAINPID
KillSignal=SIGINT
LimitNOFILE=8192
PrivateDevices=yes

[Install]
WantedBy=multi-user.target
Tor systemd unit overrides:
[Service]
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectControlGroups=true
NoNewPrivileges=true
RestrictSUIDSGID=true
RestrictAddressFamilies=AF_INET AF_INET6
ReadWritePaths=/var/lib/tor
Occasionally, the CPU usage hits 100% and the maximum throughput drops from its usual 80 Mbps down to around 16 Mbps. This happens randomly, not at fixed intervals, which makes it pretty hard to profile.
No abnormal entries in the log files.
I found ticket #24857 in which someone describes a similar behavior, but on _Windows_.
https://trac.torproject.org/projects/tor/ticket/24857
Is this also an issue on Linux?
If so, setting DirCache to 0 should fix the issue; however, according to the manual, that would mean I could no longer mirror directory information.
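That is, the change would amount to a single line in my torrc:

    DirCache 0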
If anyone else encountered the same problem and found a solution, please let me know.
Best Regards, William
Hello,
On 2020/05/17 18:20, William Kane wrote:
> Occasionally, the CPU usage hits 100% and the maximum throughput drops from its usual 80 Mbps down to around 16 Mbps. This happens randomly, not at fixed intervals, which makes it pretty hard to profile.
One of the subsystems I can think of that could potentially lead to the problem you are describing is our "consensus diff" subsystem. It is responsible for turning consensus documents into patch(1)-like diffs that clients can fetch without having to transfer the whole consensus for each minor change.
The subsystem also takes care of compression, including LZMA, which is a beast when it comes to burning CPU cycles.
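To give a rough idea of why the preset matters, here is a minimal standalone sketch using liblzma's easy encoder (generic liblzma usage, not Tor's actual consdiffmgr code; the 1 MiB dummy buffer stands in for a consensus document). The CPU cost grows steeply as you move from preset 0 towards 9:

    /* Hedged sketch: one-shot LZMA compression at a chosen preset.
     * Generic liblzma usage, not Tor's actual compression code.
     * Build: cc sketch.c -llzma */
    #include <lzma.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        static uint8_t in[1 << 20];   /* 1 MiB dummy "consensus" payload */
        memset(in, 'a', sizeof(in));

        size_t out_cap = lzma_stream_buffer_bound(sizeof(in));
        uint8_t *out = malloc(out_cap);
        if (out == NULL)
            return 1;

        uint32_t preset = 1;          /* 0-1: cheap; 6: default; 9: expensive */
        size_t out_pos = 0;
        lzma_ret rc = lzma_easy_buffer_encode(preset, LZMA_CHECK_CRC64, NULL,
                                              in, sizeof(in),
                                              out, &out_pos, out_cap);
        if (rc != LZMA_OK) {
            free(out);
            return 1;
        }
        printf("preset %u: %zu -> %zu bytes\n", preset, sizeof(in), out_pos);
        free(out);
        return 0;
    }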
> No abnormal entries in the log files.
I suspect you're logging at `notice` log-level, which is the reasonable thing to do. We need to log at slightly higher granularity to discover the problem here.
Could I get you to add `Log [dirserv]info notice syslog` to your `torrc`? This line makes Tor log everything at the `notice` log-level (the default) to the system logger, except for the directory server subsystem, which will be logged at the `info` log-level instead. The code responsible for generating consensus diffs uses the `dirserv` log domain.
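For reference, it is a single line in the torrc; once it's in place, one way to follow the relevant messages (assuming the distribution's unit is named `tor.service` and the logs end up in the journal) would be something like:

    # in /etc/tor/torrc: notice everywhere, info for the dirserv domain
    Log [dirserv]info notice syslog

    # then, from a shell, follow the log and filter for consensus-diff messages
    journalctl -u tor -f | grep -i consdiff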
If the CPU spike happens right after a log message that says something along the lines of "The most recent XXX consensus is valid-after XXX. We have diffs to this consensus for XXX/XXX older XXX consensuses. Generating diffs for the other XXX.", then I think we have our winner.
Please remember to remove the `info` log-level when the experiment is over :-)
I'm curious what you figure out here. Let me know if you need any help.
All the best, Alex.
Dear Alexander,
I have added 'Log [dirserv]info notice stdout' to my configuration and will be monitoring the system closely.
Tor was also upgraded to version 0.4.3.5, and the Linux kernel to version 5.6.13, but I do not think this will change anything.
Expect a follow-up within the next 12 hours.
William
Another thing, from the changelog:
- Update the message logged on relays when DirCache is disabled. Since 0.3.3.5-rc, authorities require DirCache (V2Dir) for the Guard flag. Fixes bug 24312; bugfix on 0.3.3.5-rc.
If I understand this correctly, my relay would no longer be a Guard if I choose to disable DirCache in order to prevent Tor from hogging my CPU?
From the code that I have seen, simply not setting the directory port does not stop the relay from caching / compressing diffs.
Or has this been changed more recently?
Not being a guard would honestly suck, and being a guard but with limited bandwidth due to Tor hogging the CPU also sucks.
Any ideas on what to do?
Okay, so your suspicion was just confirmed:
consdiffmgr_rescan_flavor_(): The most recent ns consensus is valid-after 2020-05-19T15:00:00. We have diffs to this consensus for 0/25 older ns consensuses. Generating diffs for the other 25.
Right after that, the diffs were compressed with zstd and LZMA, causing the CPU usage to spike.
Disabling DirCache still gives me the following warning on Tor 0.4.3.5:
May 19 17:56:42.909 [warn] DirCache is disabled and we are configured as a relay. We will not become a Guard.
So, unless I sacrifice the Guard flag, there doesn't seem to be an easy way to fix this problem.
Please correct me if I'm wrong.
To me it sounds like there isn't actually a problem. This is the way Tor works now (now == since consensus diffs were added). It's unfortunate that Tor isn't more multithreaded: so much happens in the same main loop, and client throughput is momentarily impacted. But that's the way it is, and there isn't a problem here to be solved. At least not for you, the relay operator.
Getting more into tor-dev@ territory here, but doesn't compressing consensus documents sound like something that could easily be shoved over into a worker thread? I'm unfamiliar with the subsystem and I'm sure many of my implicit assumptions are wrong.
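Something like the following, conceptually (a bare pthreads sketch, not Tor's actual cpuworker/threadpool code; `compress_job` is a hypothetical stand-in for the diff-compression work):

    /* Hedged sketch: push an expensive compression job onto a worker
     * thread so the main event loop keeps servicing traffic.
     * Generic pthreads illustration, not Tor's actual threadpool code.
     * Build: cc sketch.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    struct compress_job {
        const char *flavor;   /* hypothetical: which consensus flavor to diff */
    };

    static void *worker(void *arg)
    {
        struct compress_job *job = arg;
        /* ... run the CPU-heavy diff + LZMA/zstd compression here ... */
        printf("compressed diffs for %s off the main loop\n", job->flavor);
        return NULL;
    }

    int main(void)
    {
        struct compress_job job = { "ns" };
        pthread_t tid;

        if (pthread_create(&tid, NULL, worker, &job) != 0)
            return 1;
        /* The main loop would keep running here instead of blocking. */
        pthread_join(tid, NULL);
        return 0;
    }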
Matt
It sure is a problem for those on virtualized machines with only a single core.
As far as offloading to a different worker thread goes, it should be very easy to implement code-wise; Tor already offloads some crypto work to worker threads when `NumCPUs` is set or detected appropriately.
On 2020/05/19 15:59, William Kane wrote:
> Right after that, the diffs were compressed with zstd and LZMA, causing the CPU usage to spike.
Thank you for debugging this William.
Tor behaves the way it is designed to here. Tor uses a number of worker threads to handle compression (and a couple of other tasks), but what worries me is how big an impact it has on the traffic processing of your relay while the relay is also compressing.
I'm a bit curious what the specs of your relay are here -- especially the CPU and memory?
> Disabling DirCache still gives me the following warning on Tor 0.4.3.5:
> May 19 17:56:42.909 [warn] DirCache is disabled and we are configured as a relay. We will not become a Guard.
> So, unless I sacrifice the Guard flag, there doesn't seem to be an easy way to fix this problem.
This is correct for now. Tor has the `NumCPUs` configuration option, which defines how many workers we can spawn, but the default value is sensible for most systems and I doubt it makes sense for you to tune it.
> Please correct me if I'm wrong.
You're right.
All the best, Alex.
Hi Alexander,
I am a customer of Wedos Internet and originally ordered this virtual machine back in 2014. As far as I know, no hardware updates to the hypervisor were ever performed, so it's likely some older Intel Xeon (clocked at 2133.408 MHz); I guess with that information you can find the exact CPU model.
Here's the output from 'lscpu':
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          1
On-line CPU(s) list:             0
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           13
Model name:                      QEMU Virtual CPU version (cpu64-rhel6)
Stepping:                        3
CPU MHz:                         2133.408
BogoMIPS:                        4268.60
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       32 KiB
L1i cache:                       32 KiB
L2 cache:                        4 MiB
NUMA node0 CPU(s):               0
Vulnerability Itlb multihit:     KVM: Vulnerable
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Vulnerable; SMT Host state unknown
Vulnerability Meltdown:          Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, STIBP: disabled
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm nopl cpuid tsc_known_freq pni cx16 hypervisor lahf_lm
Notice the lack of AES instructions, despite their being supported by the host CPU. I previously asked them to reconfigure their KVM setup to set the emulated CPU model to 'host' so I could benefit from hardware AES-NI acceleration, but they refused, even though this would reduce CPU load, improve throughput, and leave headroom for non-crypto operations (such as the diffing / compression of consensus documents, which hogs my CPU for several minutes).
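For anyone who wants to check their own guest quickly, here is a tiny sketch using GCC's x86 CPU feature builtins (equivalent to looking for `aes` in /proc/cpuinfo):

    /* Hedged sketch: detect AES-NI in the (possibly emulated) CPU.
     * GCC/x86-specific builtins; equivalent to: grep -o aes /proc/cpuinfo */
    #include <stdio.h>

    int main(void)
    {
        __builtin_cpu_init();
        if (__builtin_cpu_supports("aes"))
            printf("AES-NI available: AES can run in hardware\n");
        else
            printf("No AES-NI: AES runs in software, costing extra CPU\n");
        return 0;
    }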
Further specs of the VM:
1024 MB RAM
256 MB swap
Memory-wise, no problems at all: the tor process doesn't use more than 600 MB even under maximum load, and the base system only uses ~60 MB, so it's not a RAM bottleneck.
I thought about rewriting the code responsible for compression to make it use the least CPU-intensive compression level.
If anyone is familiar with that code, let me know whether my attempt would be futile (I have 10+ years of experience with C/C++, just not with the Tor code base, except for very small parts of it).
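For what it's worth, the change I have in mind boils down to passing the cheapest level to the compressor; a minimal sketch with libzstd's one-shot API (generic libzstd usage, not Tor's actual compression wrappers):

    /* Hedged sketch: one-shot zstd compression at the fastest level.
     * Generic libzstd usage, not Tor's actual wrapper code.
     * Build: cc sketch.c -lzstd */
    #include <zstd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        static char in[65536];
        memset(in, 'a', sizeof(in));  /* dummy "consensus diff" payload */

        size_t cap = ZSTD_compressBound(sizeof(in));
        char *out = malloc(cap);
        if (out == NULL)
            return 1;

        /* Level 1 is the fastest standard level; higher levels trade
         * CPU time for compression ratio. */
        size_t n = ZSTD_compress(out, cap, in, sizeof(in), 1);
        if (ZSTD_isError(n)) {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(n));
            free(out);
            return 1;
        }
        printf("level 1: %zu -> %zu bytes\n", sizeof(in), n);
        free(out);
        return 0;
    }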
William