Re: [tor-relays] published descriptor missing from consensus

4 Jun 2017


      Roger Dingledine <arma@mit.edu> wrote:
...
On Sun, Jun 04, 2017 at 12:30:20AM -0500, Scott Bennett wrote:
...
Late Wednesday afternoon, I restarted my relay (MYCROFTsOtherChild),
which changed it from 0.3.0.6 to 0.3.0.7.  That was the only change I made.
It went through a normal startup and published its descriptor.  After a few
hours, tor noticed that its descriptor was still not in the latest consensus
Interesting mystery! You always have the most exciting mysteries. :)
A lifelong problem, I'm afraid.  I started bumping into compiler and
library bugs at 16 or 17 that nobody had reported.  It was all unintentional
on my part.
     BTW, some time back, I noticed that the uptime instability in heartbeat
messages had returned, or perhaps it's a new one.  It reports the first 3600-second
heartbeat period at 0:59 hours, and each one thereafter at n:59 hours.  When
the system is under particularly heavy load, I've seen the interval stretch
until the messages show up as late as n:04 hours.
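One possible explanation, which nobody in this thread has confirmed, is that the uptime is truncated to whole minutes and the heartbeat timer fires a second or so before the full period has elapsed. The sketch below models only that arithmetic. `format_uptime` is a hypothetical helper, not tor's code, but its output matches the "3 days 4:59 hours" heartbeat quoted further down.

```python
def format_uptime(seconds: int) -> str:
    """Render an uptime as 'D days H:MM hours', dropping leftover seconds.

    Hypothetical model of the heartbeat's formatting, not tor's actual code.
    """
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60  # truncates: 59 min 59 s is reported as :59
    clock = f"{hours}:{minutes:02d} hours"
    if days:
        return f"{days} {'day' if days == 1 else 'days'} {clock}"
    return clock

# A timer that fires just one second short of the full period
# is reported a whole minute early.
print(format_uptime(3600))                      # 1:00 hours
print(format_uptime(3599))                      # 0:59 hours
print(format_uptime(3 * 86400 + 5 * 3600 - 1))  # 3 days 4:59 hours
```

Under this model, the later n:04 reports would be a separate effect (the timer firing late under load), not a formatting issue.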
...
I just instrumented moria1 to be more detailed on why it doesn't find
each relay reachable, and here's what I found:
Jun 04 18:12:44.147 [info] dirserv_single_reachability_test(): Testing reachability of MYCROFTsOtherChild at 73.246.41.113:32323.
Jun 04 18:13:47.147 [info] connection_handle_write_impl(): in-progress connect to 73.246.41.113:32323 failed. Removing. (Connection timed out)
Jun 04 18:34:04.205 [info] dirserv_single_reachability_test(): Testing reachability of MYCROFTsOtherChild at 73.246.41.113:32323.
Jun 04 18:35:07.205 [info] connection_handle_write_impl(): in-progress connect to 73.246.41.113:32323 failed. Removing. (Connection timed out)
So it would appear that it's trying to make a TCP connection, and
after 63 seconds, it decides it's not going to work.
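The 63-second figure can be checked directly against the moria1 log lines quoted above. This is only a sketch: `log_time` is a hypothetical helper, and because tor log lines carry no year, 2017 (the date of this thread) is assumed. The same timestamps also show roughly 21 minutes between the two reachability tests.

```python
from datetime import datetime

def log_time(line: str) -> datetime:
    """Parse the 'Mon DD HH:MM:SS.mmm' prefix of a tor log line (year assumed)."""
    stamp = " ".join(line.split()[:3])  # e.g. "Jun 04 18:12:44.147"
    return datetime.strptime("2017 " + stamp, "%Y %b %d %H:%M:%S.%f")

lines = [
    "Jun 04 18:12:44.147 [info] dirserv_single_reachability_test(): Testing reachability of MYCROFTsOtherChild at 73.246.41.113:32323.",
    "Jun 04 18:13:47.147 [info] connection_handle_write_impl(): in-progress connect to 73.246.41.113:32323 failed. Removing. (Connection timed out)",
    "Jun 04 18:34:04.205 [info] dirserv_single_reachability_test(): Testing reachability of MYCROFTsOtherChild at 73.246.41.113:32323.",
    "Jun 04 18:35:07.205 [info] connection_handle_write_impl(): in-progress connect to 73.246.41.113:32323 failed. Removing. (Connection timed out)",
]
t = [log_time(line) for line in lines]

print((t[1] - t[0]).total_seconds())         # 63.0  first test, start to timeout
print((t[3] - t[2]).total_seconds())         # 63.0  second test, start to timeout
print(round((t[2] - t[0]).total_seconds()))  # 1280  seconds between tests
```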
It would seem that 6 of the 8 directory authorities are not voting the
Running flag, so I guess they are seeing something similar (or would be
if they hacked their logs up to display it).
Which versions are the Running votes coming from versus the non-Running?
...
This is weird, because when I telnet to your IP:port, it connects
easily. And when I set your IP:port as my bridge address, my Tor client
bootstraps fine.
So I am left wondering if there's something different about how Tor
requests that the system launch a TCP connection, or if Comcast or
your system is somehow filtering (or failing to handle) certain
connection attempts.
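The manual telnet test amounts to checking whether a plain TCP handshake completes. Here is a minimal sketch of that check; `tcp_reachable` is a hypothetical helper. It says nothing about the TLS and Tor link handshakes that follow. Because any filtering may depend on the source address, a success from one host does not rule out failures from another.

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port is established within timeout."""
    try:
        # create_connection performs the three-way handshake, like telnet would.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```

Running the same check from several vantage points is one way to see whether the outcome depends on who is connecting.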
I have a few commands in a crontab entry that extract relay IP addresses
from the most recently received consensus, sort them, and load them into a
table in pf.  They run every 15 minutes.  Anything coming from the addresses
in the table is immediately passed.  Anything not passed gets checked against
a much larger table of probers, attempted intruders, etc. and is blocked if it
matches a table entry.  Anything left over gets passed.  This setup has been
in place for many years without problems.  It does leave open the possibility
that a recently started relay that is not yet listed in the most recently
downloaded consensus could see failed connections until it shows up, but
that would be very temporary.  Also, the authorities should basically always be
in the relay table that is checked first.
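A minimal sketch of the setup described above, under stated assumptions. The actual crontab commands are not given here, so `relay_addresses` and `pf_decision` are hypothetical. The parser follows the directory specification's router-status lines, where the IP is the third field from the end of an "r" line in both the full and microdescriptor consensus flavors, and "a" lines carry extra (usually IPv6) addresses. The sorted list would presumably be written to a file and loaded with something like `pfctl -t <table> -T replace -f <file>`. The table name and file are placeholders. The decision function only models the first-match order described, not pf's own rule semantics.

```python
from __future__ import annotations

import ipaddress

def relay_addresses(consensus_text: str) -> list[str]:
    """Collect relay IPs from the 'r' and 'a' lines of a consensus, sorted."""
    addrs = set()
    for line in consensus_text.splitlines():
        parts = line.split()
        if len(parts) >= 8 and parts[0] == "r":
            addrs.add(parts[-3])  # ... <IP> <ORPort> <DirPort>
        elif len(parts) == 2 and parts[0] == "a":
            addrs.add(parts[1].rsplit(":", 1)[0].strip("[]"))  # [IPv6]:port
    # Sort numerically, IPv4 before IPv6, so the table file is stable.
    return sorted(addrs, key=lambda a: (ipaddress.ip_address(a).version,
                                        ipaddress.ip_address(a)))

def pf_decision(src: str, relays: set[str], blocklist: set[str]) -> str:
    """Model the described order: relays pass, listed probers block, rest pass."""
    if src in relays:
        return "pass"   # known relay: passed immediately
    if src in blocklist:
        return "block"  # probers, attempted intruders, etc.
    return "pass"       # anything left over

SAMPLE = """\
r relayA AAAA BBBB 2017-06-04 17:00:00 198.51.100.7 9001 0
a [2001:db8::7]:9001
r relayB CCCC DDDD 2017-06-04 17:00:00 192.0.2.20 443 80
"""
relays = set(relay_addresses(SAMPLE))
print(relay_addresses(SAMPLE))  # ['192.0.2.20', '198.51.100.7', '2001:db8::7']
# A new relay not yet in the consensus falls through to the blocklist check.
print(pf_decision("203.0.113.5", relays, {"203.0.113.5"}))  # block
```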
     Again, 0.3.0.6 was working fine up until I shut it down.  The restart
brought it up as 0.3.0.7, and it has not "worked" yet, although I'm still using
the client functions without problems.  Also, my relay had no problem connecting
to itself during its reachability and "bandwidth" testing.  Have significant
numbers of other relays vanished from the consensus after changing to 0.3.0.7?
     Roger, thanks very much for taking a look at this problem.  I've been
running this relay for almost ten years, and I would like to continue to do
so, even though it doesn't normally get a lot of traffic anymore.  If there's
anything you would like me to try, please let me know.
     Hmmm...a dim memory has blossomed while I've been typing here.  Some
years ago there was a problem with a version of openssl that couldn't talk to
itself.  That time I could see lots of connection attempts with no connections
becoming established, however.  In that situation, tor was unable to connect
to itself during reachability testing, so it never published a descriptor and
continued to try to connect until I shut it down.
     FWIW, the latest heartbeat messages were:

Jun 04 18:35:21.761 [notice] Heartbeat: It seems like we are not in the cached consensus.
Jun 04 18:35:21.762 [notice] Heartbeat: Tor's uptime is 3 days 4:59 hours, with 2 circuits open. I've sent 19.71 MB and received 181.87 MB.
Jun 04 18:35:21.762 [notice] Average packaged cell fullness: 48.015%. TLS write overhead: 11%
Jun 04 18:35:21.762 [notice] Circuit handshake stats since last time: 0/0 TAP, 0/0 NTor.
Jun 04 18:35:21.762 [notice] Since startup, we have initiated 0 v1 connections, 0 v2 connections, 0 v3 connections, and 140 v4 connections; and received 4 v1 connections, 0 v2 connections, 2 v3 connections, and 1168 v4 connections.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************