Hi René,
Sorry that the upgrade to 0.2.8 has caused problems for you.
Thanks for analysing the issue, and for a very detailed bug report.
I have tried to explain why this happened below - there have been a lot of changes since 0.2.5, and what you're seeing is due to at least two of those changes, and at least one bug.
On 20 Sep 2016, at 00:09, René Mayrhofer rm@ins.jku.at wrote:
Update: After a hint by Peter Palfrader, I now set the Address option as well:
root@tor2 ~ # grep Address /etc/tor/torrc
Address 193.171.202.146
OutboundBindAddress 193.171.202.150
This seems to work with 0.2.8.7-1, so we should be up and running with a recent version now. However, we did not set Address before, exactly because we have two addresses assigned for the Tor exit node on this host (as opposed to being behind a NAT gateway with port forwarding with different internal and externally visible addresses, which might be the more common case). Is a setup like ours supported with explicitly setting Address?
With setting Address, yes. Without setting Address, not yet, at least not reliably.
It is always hard for Tor to guess addresses. We've been trying to work out how to make it easier and more reliable. I think you've also identified a new bug where Tor is unintentionally re-ordering addresses.
On 19 Sep 2016, at 23:14, René Mayrhofer rm@ins.jku.at wrote:
Dear Tor developers,
[Please CC me in replies, I am not currently subscribed to tor-dev.]
Context: At the Institute of Networks and Security at Johannes Kepler University Linz, we have been hosting Austria's fastest exit node for the last ca. 9 months. It used to be listed as https://atlas.torproject.org/#details/01A9258A46E97FF8B2CAC7910577862C14F2C5... until very recently, and we tried to find out what went wrong when we saw traffic drop sharply a bit over a week ago. Unfortunately, two out of three people responsible for running this node were on holidays, so we could only start investigating today.
Setup: Please note that our setup is a bit particular for reasons that we will explain in more detail in a later message (including a proposed patch to the current source which has been pending also because of the holiday situation...). Briefly summarizing, we use a different network interface for "incoming" (Tor encrypted traffic) than for "outgoing" (mostly clearnet traffic from the exit node, but currently still includes outgoing Tor relay traffic to other nodes). The outgoing interface has the default route associated, while the incoming interface will only originate traffic in response to those incoming connections. Consequently, we let our Tor node only bind to the IP address assigned to the incoming interface 193.171.202.146, while it will initiate new outgoing connections with IP 193.171.202.150.
This isn't the default setup, but it's actually quite common, particularly for Exit relays that want to segregate their outbound traffic from their public relay address.
Problem: This worked nicely with Tor 0.2.5.12-1 on Debian Jessie. We upgraded about two weeks ago to 0.2.8.7-1 from the Tor apt repositories (mostly in response to https://blog.torproject.org/blog/tor-0287-released-important-fixes as a wakeup call that we were using old versions from Debian main).
Thanks for upgrading! We know that it takes effort, and time to re-establish relay flags.
At first, it seemed to work well enough, but then the holidays came and we didn't actively watch it for the next week.... Now with 0.2.8.7-1, the traffic sent to our node started declining until it vanished completely. After a bit of debugging and rolling back to 0.2.5.12-1 (which is now active on our node as of a few hours ago, slowly approaching the 200MBit/s again), it seems that we discovered a regression concerning the handling of sockets. I can best summarize it with the relevant torrc config options and startup log lines from both versions:
root@tor2 ~ # grep 193.171.202 /etc/tor/torrc
ORPort 193.171.202.146:9001
ORPort 193.171.202.146:443
OutboundBindAddress 193.171.202.150
DirPort 193.171.202.146:9030
Since you don't set your IPv4 address using Address, this means that Tor tries to guess your address. On a machine with multiple IPv4 addresses, this means it might not guess the address you expect.
I think that 0.2.5 only looked at the first interface the OS returned, and that happened to be the one you wanted. But guessing using interface addresses is never going to be reliable on multi-IPv4 machines.
Between 0.2.5 and 0.2.8, the address guessing code was modified several times. It now looks at all your local network interfaces to guess the address. I think there was an unintentional ordering change. We can fix that (see below).
In 0.2.9, you will also get a warning when your ORPort bind address and guessed Address don't match: https://trac.torproject.org/projects/tor/ticket/13953
But I also think we should warn when Tor guesses between multiple addresses, because some operators are going to find that Tor guesses one they don't want: https://trac.torproject.org/projects/tor/ticket/20164
We also have a ticket open to change Tor to do exactly what most relay operators expect, which is to use the ORPort IPv4 address to guess the address, before falling back to less reliable methods like interfaces. (This is also what we do to find the IPv6 address in the descriptor - use the first IPv6 ORPort address.) But no-one has written the code for it yet: https://trac.torproject.org/projects/tor/ticket/19919
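To make the proposed order concrete, here is a rough sketch in Python. It is not Tor's code (Tor is written in C, and this change has not been written yet); the function name, its arguments, and the exact filtering rules are all hypothetical. It only illustrates the idea of preferring an explicit Address, then an IPv4 ORPort bind address, and only then interface addresses.

```python
import ipaddress

def guess_ipv4_address(configured_address, orport_binds, interface_addrs):
    """Hypothetical priority order; returns (address, method).

    configured_address: value of the Address option, or None if unset
    orport_binds: host parts of the ORPort lines, in torrc order
    interface_addrs: addresses reported by the OS, in OS order
    """
    # 1. An explicit Address always wins.
    if configured_address:
        return configured_address, "Address option"

    # 2. Otherwise take the first usable IPv4 ORPort bind address.
    for host in orport_binds:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, skip it in this sketch
        if ip.version == 4 and not ip.is_unspecified:
            return str(ip), "ORPort"

    # 3. Last resort: the first non-loopback IPv4 interface address.
    #    This is the unreliable step on multi-IPv4 machines.
    for host in interface_addrs:
        ip = ipaddress.ip_address(host)
        if ip.version == 4 and not ip.is_loopback:
            return str(ip), "interface"

    return None, "none"
```

With your torrc (no Address, ORPort bound to 193.171.202.146), this order would pick 193.171.202.146 regardless of the order in which the OS lists the interfaces.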
Sep 19 11:37:41.000 [notice] Tor 0.2.8.7 (git-cc2f02ef17899f86) opening log file.
Sep 19 11:37:41.194 [notice] Tor v0.2.8.7 (git-cc2f02ef17899f86) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1t and Zlib 1.2.8.
...
Sep 19 11:37:41.198 [warn] Tor is running as an exit relay. If you did not want this behavior, please set the ExitRelay option to 0. If you do want to run an exit Relay, please set the ExitRelay option to 1 to disable this warning, and for forward compatibility.
Sep 19 11:37:41.198 [warn] You specified a public address '0.0.0.0:9050' for SocksPort. Other people on the Internet might find your computer and use it as an open proxy. Please don't allow this unless you have a good reason.
Sep 19 11:37:41.199 [notice] Opening Socks listener on 0.0.0.0:9050
I hope you have a SOCKSPolicy in place (or the equivalent firewall rules), otherwise anyone can use your relay as an unencrypted, open proxy.
Sep 19 11:37:41.199 [notice] Opening Control listener on 127.0.0.1:9051
Sep 19 11:37:41.199 [notice] Opening OR listener on 193.171.202.146:9001
Sep 19 11:37:41.199 [notice] Opening OR listener on 193.171.202.146:443
Sep 19 11:37:41.199 [notice] Opening Directory listener on 193.171.202.146:9030
...
Sep 19 11:37:51.000 [notice] Now checking whether ORPort 193.171.202.150:9001 and DirPort 193.171.202.150:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
Sep 19 11:38:30.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent.
This is an interesting edge-case: Tor doesn't (and likely can't) check that the ORPort a client thinks it is connecting to is the same as the one it just advertised. So the ORPort reachability check succeeded on your relay, because some clients still had the old address from the old descriptor. And Tor never repeats the check.
https://trac.torproject.org/projects/tor/ticket/20165
...
Sep 19 11:57:50.000 [warn] Your server (193.171.202.150:9030) has not managed to confirm that its DirPort is reachable. Relays do not publish descriptors until their ORPort and DirPort are reachable. Please check your firewalls, ports, address, /etc/hosts file, etc.
But the DirPort check fails, because it uses the address in the descriptor. And since 0.2.8.1-alpha, the DirPort needs to be reachable for relays to publish their descriptor:
o Minor bugfixes (relays):
- Check that both the ORPort and DirPort (if present) are reachable before publishing a relay descriptor. Otherwise, relays publish a descriptor with DirPort 0 when the DirPort reachability test takes longer than the ORPort reachability test. Fixes bug 18050; bugfix on 0.1.0.1-rc. Reported by "starlight", patch by "teor".
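Put as a simplified rule, the changelog entry above amounts to something like the following. This is only an illustrative Python sketch with a made-up function name, not the actual check in Tor's source.

```python
def should_publish_descriptor(orport_reachable, dirport_configured, dirport_reachable):
    """Simplified (hypothetical) form of the rule since 0.2.8.1-alpha."""
    # The ORPort must always be confirmed reachable.
    if not orport_reachable:
        return False
    # A configured DirPort must be confirmed reachable as well.
    if dirport_configured and not dirport_reachable:
        return False
    return True
```

In your 0.2.8.7 log, the ORPort check passed but the DirPort check against 193.171.202.150:9030 never did, so under this rule the relay does not publish a descriptor.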
...
Sep 19 12:01:48.000 [notice] Tor 0.2.5.12 (git-3731dd5c3071dcba) opening log file.
...
Sep 19 12:01:52.000 [notice] Now checking whether ORPort 193.171.202.146:9001 and DirPort 193.171.202.146:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
Sep 19 12:01:53.000 [notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.
Sep 19 12:01:53.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
...
Please note the difference (0.2.8.7):
Sep 19 11:37:51.000 [notice] Now checking whether ORPort 193.171.202.150:9001 and DirPort 193.171.202.150:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
vs. (0.2.5.12):
Sep 19 12:01:52.000 [notice] Now checking whether ORPort 193.171.202.146:9001 and DirPort 193.171.202.146:9030 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
I.e. 0.2.8.7 does not seem to honor the address the socket is bound to when starting the reachability checks from the outside
Tor never used the address the socket was bound to. It just happened to guess the one you wanted in 0.2.5, and it guesses a different one in 0.2.8.
Tor can't actually use the bind address for reachability checks, because there's only one IPv4 address in a relay descriptor, and that's the one that other tor instances will try to connect to. Also, what if the ORPort and DirPort are on different addresses? What if the relay is behind a NAT? There's a discussion of these kinds of issues here: https://trac.torproject.org/projects/tor/ticket/17782
(it seems to use the address that either the default route is associated with or the OutboundBindAddress) - although the socket binding itself is done correctly (i.e. the netstat output is exactly the same for both versions, with tor binding to the specific IP address only for the Dir and both OR ports). Consequently, the node is declared as non-reachable and drops off the globe/atlas...
Has this change been intentional? I have to admit we have not yet checked the source code for further debugging, as we wanted to get the node back up as quickly as possible (after our unfortunately timed holidays, sorry for that).
No, I don't think the change was intentional. It could have been any of the changes below that caused this issue, but I would guess it's probably an unintentional result of commit 31eb486 in 17027, which inadvertently reorders the address list by using SMARTLIST_DEL_CURRENT() rather than SMARTLIST_DEL_CURRENT_KEEPORDER().
I've logged a ticket so we can fix this: https://trac.torproject.org/projects/tor/ticket/20163
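To illustrate why the choice of delete matters: as far as I recall, the non-KEEPORDER variant removes an element by moving the last element of the list into the freed slot, which is fast but changes the order. The Python sketch below only mimics that behaviour on a plain list. The helper names and the example interface list are made up for illustration and are not taken from Tor's source or from your machine.

```python
def del_swap_last(items, idx):
    """Delete items[idx] by moving the last element into its slot.

    Constant time, but the remaining elements may end up reordered.
    """
    items[idx] = items[-1]
    items.pop()

def del_keep_order(items, idx):
    """Delete items[idx] and shift later elements down, preserving order."""
    del items[idx]

def drop_unwanted(addrs, unwanted, delete):
    """Remove every address in `unwanted`, using the given delete strategy."""
    i = 0
    while i < len(addrs):
        if addrs[i] in unwanted:
            # Do not advance: a new element now sits at index i.
            delete(addrs, i)
        else:
            i += 1
    return addrs

# Hypothetical interface list: loopback first, then the two public addresses.
interfaces = ["127.0.0.1", "193.171.202.146", "193.171.202.150"]

swapped = drop_unwanted(list(interfaces), {"127.0.0.1"}, del_swap_last)
ordered = drop_unwanted(list(interfaces), {"127.0.0.1"}, del_keep_order)
```

Here `swapped` starts with 193.171.202.150 while `ordered` starts with 193.171.202.146, so any code that then takes "the first address" picks a different one depending on which delete was used.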
In 0.2.8.1-alpha:
o Minor features (relay, address discovery):
- Add a family argument to get_interface_addresses_raw() and subfunctions to make network interface address interogation more efficient. Now Tor can specifically ask for IPv4, IPv6 or both types of interfaces from the operating system. Resolves ticket 17950.
- When get_interface_address6_list(.,AF_UNSPEC,.) is called and fails to enumerate interface addresses using the platform-specific API, have it rely on the UDP socket fallback technique to try and find out what IP addresses (both IPv4 and IPv6) our machine has. Resolves ticket 17951.
And 0.2.7.1-alpha:
o Minor bugfixes (security, exit policies):
- ExitPolicyRejectPrivate now also rejects the relay's published IPv6 address (if any), and any publicly routable IPv4 or IPv6 addresses on any local interfaces. ticket 17027. Patch by "teor". Fixes bug 17027; bugfix on 0.2.0.11-alpha.
o Minor bugfixes (network):
- When attempting to use fallback technique for network interface lookup, disregard loopback and multicast addresses since they are unsuitable for public communications.
o Code simplification and refactoring:
- Move the hacky fallback code out of get_interface_address6() into separate function and get it covered with unit-tests. Resolves ticket 14710.
And 0.2.6.3-alpha:
o Minor bugfixes (portability):
- Fix the ioctl()-based network interface lookup code so that it will work on systems that have variable-length struct ifreq, for example Mac OS X.
- Refactor the get_interface_addresses_raw() doom-function into multiple smaller and simpler subfunctions. Cover the resulting subfunctions with unit-tests. Fixes a significant portion of issue 12376.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org