Hi,
I'm troubleshooting a Linux relay where the Tor service is having problems. External monitoring alerts indicate both the ORPort and DirPort are unreachable (TCP connection timeout). I can ssh in and the Tor service is still running. The node seems to have increased memory usage at this point but there's no evidence of OOM. I restart the Tor service, monitoring says all is good again, and things seem fine for a bit, until the cycle repeats hours later.
I'm still investigating, but one thing I immediately noticed was hundreds of these lines in the logs:
[warn] circuit_mark_for_close_(): Bug: Duplicate call to circuit_mark_for_close at ../src/or/onion.c:238 (first at ../src/or/command.c:579) (on Tor 0.2.9.11 )
I found https://trac.torproject.org/projects/tor/ticket/20059 but it's marked as fixed with a backport to 0.2.9.
Any thoughts?
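For anyone who wants to reproduce the external check described above (a TCP connection to the ORPort or DirPort that times out), a minimal Python sketch follows. It is not the monitoring system used in this thread; the address 192.0.2.10 and the ports 9001 and 9030 in the usage comment are placeholder assumptions, not values taken from the relay in question.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake; the socket is
        # closed again as soon as the "with" block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and unreachable networks all land here.
        return False


# Example usage (hypothetical address and ports, substitute your own):
#   for name, port in (("ORPort", 9001), ("DirPort", 9030)):
#       print(name, port, port_reachable("192.0.2.10", port))
```

Run from a machine outside the relay's network, a check like this only tells you whether the handshake completes. It cannot distinguish a saturated Tor process from a firewall or provider-side limit.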
Hello,
Thanks for running a relay.
You are on 0.2.9.11 and #20059 was merged in 0.2.9.12 https://gitweb.torproject.org/tor.git/tree/ReleaseNotes?h=release-0.2.9
There is no point in reporting this further: the issue is fixed, and you are just one release behind.
As for the relay, I am pretty sure there is a firewall or something similar that throttles the incoming/outgoing TCP connections a process, user, or PID can initiate. The problem is either in the operating system itself or in a network-level firewall or built-in router firewall.
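The version point above can be checked mechanically. The following is a minimal sketch, assuming the log suffix always has the form "(on Tor X.Y.Z.W )" as in the warning quoted in this thread. The function names are made up for illustration. It also assumes that any numerically later version carries the #20059 fix, which the thread only confirms for 0.2.9.12.

```python
import re

# First 0.2.9.x release carrying the #20059 fix, according to this thread.
FIXED_IN = (0, 2, 9, 12)


def tor_version_from_log(line: str):
    """Extract the numeric version from an '(on Tor X.Y.Z.W ...)' log suffix.

    Returns a tuple of ints, or None if the line carries no version.
    Any non-numeric suffix (for example '-alpha') is ignored.
    """
    match = re.search(r"\(on Tor (\d+(?:\.\d+)+)", line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(version, fixed_in=FIXED_IN) -> bool:
    """True if version is numerically at or past fixed_in (tuple comparison)."""
    return version >= fixed_in


line = ("[warn] circuit_mark_for_close_(): Bug: Duplicate call to "
        "circuit_mark_for_close at ../src/or/onion.c:238 "
        "(first at ../src/or/command.c:579) (on Tor 0.2.9.11 )")
version = tor_version_from_log(line)
print(version, has_fix(version))
```

Comparing tuples of integers avoids the usual string-comparison trap, where "0.2.9.9" would sort after "0.2.9.12".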
You are on 0.2.9.11 and #20059 was merged in 0.2.9.12 https://gitweb.torproject.org/tor.git/tree/ReleaseNotes?h=release-0.2.9
I see. I'm trying to stay on 0.2.9.x since that is considered the "long-term support" release. This is a fallback directory mirror which I'd like to keep as stable as possible. apt wants to upgrade straight to 0.3.1.7 (from the repo at http://deb.torproject.org/torproject.org). I will see if I can install 0.2.9.12 from the repo instead, or perhaps install the package manually (or perhaps give up and switch to 0.3.1.7).
As for the relay, I am pretty sure there is a firewall or something similar that throttles the incoming/outgoing TCP connections a process, user, or PID can initiate. The problem is either in the operating system itself or in a network-level firewall or built-in router firewall.
Could be. It's just simple iptables on the node, and I've tried to follow best practices for the sysctl and ulimit tweaks, but I don't really know what's going on upstream with the provider. It's a little odd that this is only a recent problem, as the node has been up for 700+ days and aside from kernel upgrades, there's no recent changes. Maybe it's just busier than usual now. I'll keep digging. Thanks for the feedback!
On 17 Oct 2017, at 17:02, tor tor@anondroid.com wrote:
You are on 0.2.9.11 and #20059 was merged in 0.2.9.12 https://gitweb.torproject.org/tor.git/tree/ReleaseNotes?h=release-0.2.9
I see. I'm trying to stay on 0.2.9.x since that is considered the "long-term support" release. This is a fallback directory mirror which I'd like to keep as stable as possible. apt wants to upgrade straight to 0.3.1.7 (from the repo at http://deb.torproject.org/torproject.org). I will see if I can install 0.2.9.12 from the repo instead, or perhaps install the package manually (or perhaps give up and switch to 0.3.1.7).
I think Tor LTS / 0.2.9 is in Debian stable:
http://deb.torproject.org/torproject.org/dists/stable/
I've opened a ticket to add LTS to the Debian repository instructions:
https://trac.torproject.org/projects/tor/ticket/23897
I wouldn't recommend upgrading to 0.3.0 or later, there are stability issues on some clients, and maybe relays.
https://trac.torproject.org/projects/tor/ticket/21969
As for the relay, I am pretty sure there is a firewall or something similar that throttles the incoming/outgoing TCP connections a process, user, or PID can initiate. The problem is either in the operating system itself or in a network-level firewall or built-in router firewall.
Could be. It's just simple iptables on the node, and I've tried to follow best practices for the sysctl and ulimit tweaks, but I don't really know what's going on upstream with the provider. It's a little odd that this is only a recent problem, as the node has been up for 700+ days and aside from kernel upgrades, there's no recent changes. Maybe it's just busier than usual now. I'll keep digging. Thanks for the feedback!
There's a bug in 0.3.0 and later that causes clients to fetch microdescriptors from fallbacks. So fallbacks (and authorities) will have extra load until that's fixed.
https://trac.torproject.org/projects/tor/ticket/23862
T
There's a bug in 0.3.0 and later that causes clients to fetch microdescriptors from fallbacks. So fallbacks (and authorities) will have extra load until that's fixed.
Makes sense. The relay can't keep up with the extra load. It's basically a DDoS. It's gone into this state 4 times over the past ~ 48 hours.
I think Tor LTS / 0.2.9 is in Debian stable: http://deb.torproject.org/torproject.org/dists/stable/
I've opened a ticket to add LTS to the Debian repository instructions: https://trac.torproject.org/projects/tor/ticket/23897
I wouldn't recommend upgrading to 0.3.0 or later, there are stability issues on some clients, and maybe relays. https://trac.torproject.org/projects/tor/ticket/21969
Thanks for the info. Unfortunately I upgraded to 0.3.1.7 before reading this (it didn't help), and can't figure out how to obtain 0.2.9.12 from the repos. I've tried these repos:
deb http://deb.torproject.org/torproject.org trusty main
deb http://deb.torproject.org/torproject.org jessie main
deb http://deb.torproject.org/torproject.org stretch main
All of them seem to only offer 0.3.1.7, but I'm not sure I'm looking in the right places or querying apt in the right way.
A static link to a signed dpkg (for 0.2.9.12) would be fine for the moment, if anyone knows of one.
Thanks.
On 17 Oct 2017, at 21:43, tor tor@anondroid.com wrote:
There's a bug in 0.3.0 and later that causes clients to fetch microdescriptors from fallbacks. So fallbacks (and authorities) will have extra load until that's fixed.
Makes sense. The relay can't keep up with the extra load. It's basically a DDoS. It's gone into this state 4 times over the past ~ 48 hours.
I doubt this bug is the cause if it's just happened recently. It's more likely that your relay is the HSDir for some popular onion service. Or a genuine DDoS.
Can't your provider support that many connections?
I think Tor LTS / 0.2.9 is in Debian stable: http://deb.torproject.org/torproject.org/dists/stable/
I've opened a ticket to add LTS to the Debian repository instructions: https://trac.torproject.org/projects/tor/ticket/23897
I wouldn't recommend upgrading to 0.3.0 or later, there are stability issues on some clients, and maybe relays. https://trac.torproject.org/projects/tor/ticket/21969
Thanks for the info. Unfortunately I upgraded to 0.3.1.7 before reading this (it didn't help), and can't figure out how to obtain 0.2.9.12 from the repos. I've tried these repos:
deb http://deb.torproject.org/torproject.org trusty main
deb http://deb.torproject.org/torproject.org jessie main
deb http://deb.torproject.org/torproject.org stretch main
All of them seem to only offer 0.3.1.7, but I'm not sure I'm looking in the right places or querying apt in the right way.
A static link to a signed dpkg (for 0.2.9.12) would be fine for the moment, if anyone knows of one.
There's 0.2.9 nightly, but I don't know if we have an 0.2.9-release build.
T
I doubt this bug is the cause if it's just happened recently. It's more likely that your relay is the HSDir for some popular onion service. Or a genuine DDoS.
I still don't know what's going on. I'd say any of these are possibilities:
- The bug that results in fallback directories getting extra DirPort traffic (23862)
- DDoS
- HSDir activity (the node in question did have the HSDir flag before being knocked offline)
- A connection limit upstream
Can't your provider support that many connections?
I'm not sure. I know I haven't had this problem until recently. This node has been around on the same provider at the same IP for 700+ days. So I'm leaning away from this being a provider-level issue, but I don't really have the data to back that up. Any suggestions as to how I could make this determination?
More broadly, any tips for troubleshooting this beyond looking in the Tor logs and syslog would be appreciated.
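One low-effort data point beyond the logs is the distribution of TCP connection states on the relay itself at the moment it becomes unreachable. The sketch below is only an illustration, not something suggested by anyone in this thread. It parses text in the format of Linux's /proc/net/tcp and counts connections per state, optionally for a single local port. The function name and the port 9001 in the usage comment are assumptions. IPv6 sockets are listed separately in /proc/net/tcp6, which uses the same column layout.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_text: str, local_port=None) -> Counter:
    """Count sockets per TCP state from /proc/net/tcp-formatted text.

    If local_port is given, only sockets bound to that local port count.
    """
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        # The local address is "HEXADDR:HEXPORT"; the port is hexadecimal.
        port = int(local.rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts


# Example usage on the relay (9001 is a placeholder for your ORPort):
#   with open("/proc/net/tcp") as f:
#       print(count_tcp_states(f.read(), local_port=9001))
```

Comparing a snapshot taken while the relay is healthy with one taken while it is unreachable may help narrow things down. For instance, a large pile of SYN_RECV entries would point in a different direction than thousands of ESTABLISHED connections close to the file descriptor limit. Neither pattern is conclusive on its own.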
There's 0.2.9 nightly, but I don't know if we have an 0.2.9-release build.
Yeah, unfortunately I could not find a 0.2.9.12 dpkg. I did find a deb, but it wouldn't install due to mismatched dependencies. I was able to get 0.2.9.12 installed from source, and briefly had it running, but it didn't have a service wrapper, and generally doesn't jibe with my normal practices and Ansible scripts and such. So I bailed on that and am running the latest 0.3.1.7 package from the torproject repo again.
It's a shame there's no Ubuntu package for the long-term support Tor release. Thanks Teor for filing a ticket, but it doesn't look like it will be acted upon. It seems like a problem to me. I guess it's a matter of what is considered "stable"? Seems like the package maintainer thinks 0.3.1.7 is stable, while you (Teor) think it's not? I'm stuck with it for now.
I've scaled both of my fallback nodes up a bit (more CPU and RAM), and things seem stable at the moment. We'll see what happens. I'll take note of when the HSDir flag comes back.
Thanks for everyone's help.
On 19 Oct 2017, at 08:17, tor tor@anondroid.com wrote:
It's a shame there's no Ubuntu package for the long-term support Tor release. Thanks Teor for filing a ticket, but it doesn't look like it will be acted upon. It seems like a problem to me. I guess it's a matter of what is considered "stable"? Seems like the package maintainer thinks 0.3.1.7 is stable, while you (Teor) think it's not? I'm stuck with it for now.
0.3.1.7 is stable. But it's not Long Term Support. And it's understandable that they want to reduce their effort.
I'm not sure that using 0.2.9 will help you, anyway.
I've scaled both of my fallback nodes up a bit (more CPU and RAM), and things seem stable at the moment. We'll see what happens. I'll take note of when the HSDir flag comes back.
HSDir load spikes typically only last a day.
T
0.3.1.7 is stable
Earlier in the thread you said "I wouldn't recommend upgrading to 0.3.0 or later, there are stability issues on some clients, and maybe relays." That's what I was referring to. I think we're having a semantic argument; "stable" release doesn't always mean the software is actually stable (unfortunately).
But it's not Long Term Support.
Yeah, that's really the issue here. There's no way to install the LTS package on Ubuntu.
And it's understandable that they want to reduce their effort.
Agreed.
I'm not sure that using 0.2.9 will help you, anyway.
I'm not sure either. :) I'm getting some mixed messages. 0.2.9.11 has the bug https://trac.torproject.org/projects/tor/ticket/20059 from my top post, which was fixed in 0.2.9.12. That package would at least remove the bug from the equation. You had also mentioned the 0.3.x stability issues. Given this info, 0.2.9.12 seemed like the ideal version. However, if the underlying cause is something else (like an HSDir spike, DDoS, or connection limit), you're right, the version probably doesn't matter much.
Thanks for the info. Unfortunately I upgraded to 0.3.1.7 before reading this (it didn't help), and can't figure out how to obtain 0.2.9.12 from the repos. I've tried these repos:
deb http://deb.torproject.org/torproject.org trusty main
deb http://deb.torproject.org/torproject.org jessie main
deb http://deb.torproject.org/torproject.org stretch main
You get tor LTS (0.2.9.x) from the official Debian (non torproject.org) repos:
https://packages.debian.org/stretch/tor
tor-relays@lists.torproject.org