Roger Dingledine <arma@mit.edu> wrote:
On Sun, Jun 04, 2017 at 12:30:20AM -0500, Scott Bennett wrote:
Late Wednesday afternoon, I restarted my relay (MYCROFTsOtherChild),
which changed it from 0.3.0.6 to 0.3.0.7. That was the only change I made. It went through a normal startup and published its descriptor. After a few hours, tor noticed that its descriptor was still not in the latest consensus
Interesting mystery! You always have the most exciting mysteries. :)
A lifelong problem, I'm afraid. I started bumping into compiler and library bugs that nobody had reported at 16 or 17. It was all unintentional on my part.

BTW, some time back I noticed that the uptime instability in heartbeat messages had returned, or perhaps it's a new one. It reports the first 3600-second heartbeat period at 0:59 hours, and each one thereafter at n:59 hours. Under particularly heavy system load, I've seen the time stretch until the messages show up as late as n:04 hours.
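One way an n:59 reading can arise is if the uptime is truncated to whole minutes and the heartbeat fires a fraction of a second before each full hour has elapsed. The sketch below only illustrates that arithmetic under those assumptions: `format_uptime` is a hypothetical name, not tor's code, the "D days H:MM hours" layout is copied from the heartbeat line quoted later in this message, and the singular "day" handling is an assumption.

```python
def format_uptime(seconds: float) -> str:
    """Render an uptime in the heartbeat style, e.g. '3 days 4:59 hours'.

    Hypothetical sketch: whole minutes are obtained by truncation, so an
    uptime even slightly short of a full hour is rounded *down* to :59.
    """
    total_minutes = int(seconds) // 60          # truncate, never round up
    days, rem = divmod(total_minutes, 24 * 60)  # split off whole days
    hours, minutes = divmod(rem, 60)            # remaining hours and minutes
    clock = f"{hours}:{minutes:02d} hours"
    if days:
        # Assumption: singular "day" when exactly one day has elapsed.
        return f"{days} day{'s' if days != 1 else ''} {clock}"
    return clock
```

With this rule, an uptime of 3599.9 seconds prints as "0:59 hours" rather than "1:00 hours", while exactly 3600 seconds prints as "1:00 hours". Truncation alone would not explain the later n:04 readings under heavy load.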
I just instrumented moria1 to be more detailed on why it doesn't find each relay reachable, and here's what I found:
Jun 04 18:12:44.147 [info] dirserv_single_reachability_test(): Testing reachability of MYCROFTsOtherChild at 73.246.41.113:32323.
Jun 04 18:13:47.147 [info] connection_handle_write_impl(): in-progress connect to 73.246.41.113:32323 failed. Removing. (Connection timed out)

Jun 04 18:34:04.205 [info] dirserv_single_reachability_test(): Testing reachability of MYCROFTsOtherChild at 73.246.41.113:32323.
Jun 04 18:35:07.205 [info] connection_handle_write_impl(): in-progress connect to 73.246.41.113:32323 failed. Removing. (Connection timed out)
So it would appear that it's trying to make a TCP connection, and after 63 seconds, it decides it's not going to work.
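The difference between a connect that succeeds from telnet and one that times out from moria1 can be checked from any third host with a bare TCP connect. The helper below is a minimal, hypothetical sketch (`probe_orport` is not part of tor): it tests only the TCP handshake, not TLS or the Tor link protocol, and its default 63-second timeout merely mirrors the gap between the paired log lines above (18:12:44.147 to 18:13:47.147); it is not taken from tor's source.

```python
import socket
from typing import Tuple


def probe_orport(host: str, port: int, timeout: float = 63.0) -> Tuple[bool, str]:
    """Try a plain TCP connect to host:port, much as `telnet host port` would.

    Returns (reachable, detail). Only the TCP three-way handshake is
    exercised; no TLS or Tor link handshake is attempted.
    """
    try:
        # create_connection resolves the host and applies the timeout to connect().
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No SYN-ACK within `timeout` seconds: the failure mode moria1 logged.
        return False, "connection timed out"
    except OSError as exc:
        # e.g. "Connection refused" when nothing is listening on the port.
        return False, exc.strerror or str(exc)
```

Run from a host outside the relay's network, `probe_orport("73.246.41.113", 32323)` returns `(True, "connected")` when the handshake completes and `(False, "connection timed out")` in the failure mode moria1 logged. Comparing the result from several vantage points helps separate path filtering from a problem on the relay itself.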
It would seem that 6 of the 8 directory authorities are not voting the Running flag, so I guess they are seeing something similar (or would be if they hacked their logs up to display it).
Which versions are the Running votes coming from versus the non-Running?
This is weird, because when I telnet to your IP:port, it connects easily. And when I set your IP:port as my bridge address, my Tor client bootstraps fine.
So I am left wondering whether there's something different about how Tor asks the system to launch a TCP connection, or whether Comcast or your system is somehow filtering (or unable to handle) certain connection attempts.
I have a few commands in a crontab entry that extract relay IP addresses from the most recently received consensus, sort them, and load them into a table in pf. They run every 15 minutes. Anything coming from the addresses in that table is passed immediately. Anything not passed is checked against a much larger table of probers, attempted intruders, etc., and is blocked if it matches an entry. Anything left over is passed. This setup has been in place for many years without problems. It does leave open the possibility that a relay that started recently, and is not yet listed in the most recently downloaded consensus, could see failures until it shows up, but that would be very temporary. Also, the authorities should basically always be in the relay table that is checked first.

Again, 0.3.0.6 was working fine right up until I shut it down. The restart, as 0.3.0.7, has not "worked" yet, although I'm still using the client functions without problems. Also, my relay had no problem connecting to itself during its reachability and "bandwidth" testing. Have significant numbers of other relays vanished from the consensus after upgrading to 0.3.0.7?

Roger, thanks very much for taking a look at this problem. I've been running this relay for almost ten years, and I would like to continue doing so, even though it doesn't normally get a lot of traffic anymore. If there's anything you would like me to try, please let me know.

Hmmm...a dim memory has blossomed while I've been typing here. Some years ago there was a problem with a version of openssl that couldn't talk to itself. That time, however, I could see lots of connection attempts with none of them becoming established. In that situation, tor was unable to connect to itself during reachability testing, so it never published a descriptor and kept trying to connect until I shut it down.

FWIW, the latest heartbeat messages were:
Jun 04 18:35:21.761 [notice] Heartbeat: It seems like we are not in the cached consensus.
Jun 04 18:35:21.762 [notice] Heartbeat: Tor's uptime is 3 days 4:59 hours, with 2 circuits open. I've sent 19.71 MB and received 181.87 MB.
Jun 04 18:35:21.762 [notice] Average packaged cell fullness: 48.015%. TLS write overhead: 11%
Jun 04 18:35:21.762 [notice] Circuit handshake stats since last time: 0/0 TAP, 0/0 NTor.
Jun 04 18:35:21.762 [notice] Since startup, we have initiated 0 v1 connections, 0 v2 connections, 0 v3 connections, and 140 v4 connections; and received 4 v1 connections, 0 v2 connections, 2 v3 connections, and 1168 v4 connections.
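The crontab/pf arrangement described above can be sketched as follows. The actual commands are not shown in this message, so everything here is a hypothetical reconstruction: `relay_ipv4s` and `pf_verdict` are illustrative names, and the pf evaluation order is modeled in Python rather than written as pf.conf rules. The extractor takes the first field of each consensus "r" line that parses as an IPv4 address, which works for both full and microdescriptor consensuses because nicknames, base64 digests, dates, and times never parse as dotted quads.

```python
import ipaddress
from typing import Iterable, List, Set


def relay_ipv4s(consensus_text: str) -> List[str]:
    """Collect unique relay IPv4 addresses from a consensus's 'r' lines.

    IPv6 'a' lines are ignored. Addresses are sorted numerically, so
    192.0.2.9 comes before 192.0.2.10 (a plain string sort would not do that).
    """
    addrs: Set[ipaddress.IPv4Address] = set()
    for line in consensus_text.splitlines():
        if not line.startswith("r "):
            continue
        for field in line.split()[1:]:
            try:
                addrs.add(ipaddress.IPv4Address(field))
                break  # the first dotted quad on the line is the relay address
            except ValueError:
                continue
    return [str(a) for a in sorted(addrs)]


def pf_verdict(src: str, relays: Set[str], blocklist: Iterable[str]) -> str:
    """Model the rule order: relays pass, then the blocklist blocks, else pass.

    Simplification: relay entries are exact addresses; blocklist entries may
    be single addresses or CIDR networks, as pf tables allow.
    """
    if src in relays:
        return "pass"
    addr = ipaddress.ip_address(src)
    if any(addr in ipaddress.ip_network(net, strict=False) for net in blocklist):
        return "block"
    return "pass"
```

Written one address per line to a file, the output of `relay_ipv4s` can be loaded atomically with `pfctl -t <table> -T replace -f <file>`. The model also makes the failure condition explicit: an authority's probe is dropped only if its source address is absent from the relay table and matches the blocklist.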
Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet: bennett at sdf.org *xor* bennett at freeshell.org        *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
* -- Gov. John Hancock, New York Journal, 28 January 1790            *
**********************************************************************