Scott Bennett bennett@sdf.org wrote on 2021-03-30:
Fabian Keil freebsd-listen@fabiankeil.de wrote:
xplato xplato@protonmail.com wrote on 2021-03-30:
I am a bit of a noob here, so please bear with me. I ran a relay using Ubuntu with very few issues; however, I decided to add an additional relay and decided to use FreeBSD. They will only run for around 18 hours and then they shut down. I have adjusted the torrc file every way I know how and increased the max vnodes, thinking this may have been my issue. I can post the sysrc and torrc if needed. If anyone can help me figure this out, I would be grateful; otherwise I am going to reluctantly move them both back to Ubuntu.
What do you mean by "they" and "shutting down"?
It exits on signal 6. See the tail end of the log excerpt I included in my posting about 0.4.6.1-alpha being very noisy. N.B. 0.4.5.7 also has the same bug.
It's not obvious to me that you are experiencing the same issue.
I have several non-exit relays running 0.4.5.7 on ElectroBSD systems based on FreeBSD 11.4-STABLE and they appear to be stable.
One example:
Mar 30 08:50:48.691 [notice] {HEARTBEAT} Heartbeat: Tor's uptime is 13 days 0:00 hours, with 14270 circuits open. I've sent 11283.88 GB and received 11200.63 GB. I've received 3494368 connections on IPv4 and 0 on IPv6. I've made 653389 connections with IPv4 and 0 with IPv6.
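For anyone who wants to track such heartbeat lines over time, here is a minimal Python sketch that pulls the uptime, circuit count, and traffic totals out of a line like the one above. The regular expression is derived only from this one sample, so the optional "days" part and the unit handling are assumptions; the function name parse_heartbeat is made up for illustration.

```python
import re

# Pattern derived from the single sample line above. Treating the
# "N days" part as optional is an assumption for short uptimes.
HEARTBEAT_RE = re.compile(
    r"Heartbeat: Tor's uptime is (?:(?P<days>\d+) days? )?"
    r"(?P<hours>\d+):(?P<minutes>\d+) hours, "
    r"with (?P<circuits>\d+) circuits open\. "
    r"I've sent (?P<sent>[\d.]+) (?P<sent_unit>\w+) "
    r"and received (?P<received>[\d.]+) (?P<received_unit>\w+)\."
)


def parse_heartbeat(line):
    """Return the heartbeat fields as a dict, or None if the line does not match."""
    m = HEARTBEAT_RE.search(line)
    if m is None:
        return None
    uptime_hours = (
        int(m["days"] or 0) * 24 + int(m["hours"]) + int(m["minutes"]) / 60
    )
    return {
        "uptime_hours": uptime_hours,
        "circuits": int(m["circuits"]),
        "sent": (float(m["sent"]), m["sent_unit"]),
        "received": (float(m["received"]), m["received_unit"]),
    }


if __name__ == "__main__":
    sample = (
        "Mar 30 08:50:48.691 [notice] {HEARTBEAT} Heartbeat: Tor's uptime is "
        "13 days 0:00 hours, with 14270 circuits open. I've sent 11283.88 GB "
        "and received 11200.63 GB."
    )
    print(parse_heartbeat(sample))
```

A gap in the heartbeat series, or an uptime that resets to a small value, would then show roughly when the process went away.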
While I'm also seeing BUG messages, they are different from yours and are probably related to the MaxMemInQueues setting:
Mar 30 07:10:01.071 [notice] {CONTROL} New control connection opened from 127.0.1.5.
Mar 30 07:10:09.457 [notice] {GENERAL} We're low on memory (cell queues total alloc: 952512 buffer total alloc: 2557952, tor compress total alloc: 292862049 (zlib: 0, zstd: 0, lzma: 292862001), rendezvous cache total alloc: 23736810). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Mar 30 07:10:09.531 [notice] {GENERAL} Removed 1147072 bytes by killing 17401 circuits; 0 circuits remain alive. Also killed 2 non-linked directory connections.
Mar 30 07:10:09.532 [warn] {BUG} channel_flush_from_first_active_circuit: Bug: Found a supposedly active circuit with no cells to [...]
Mar 30 07:10:09.532 [warn] {BUG} channel_flush_from_first_active_circuit: Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.5.7 )
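To see which allocation dominates in a low-memory notice like the one above, the four "total alloc" figures can be extracted and compared. This is only a sketch: the field names are taken from this one log excerpt, the function names are made up for illustration, and summing the four figures is a convenience for comparison, not necessarily the exact quantity Tor checks against MaxMemInQueues.

```python
import re

# Field names as they appear in the notice above. Whether other Tor
# versions use the same wording is an assumption.
ALLOC_RE = re.compile(
    r"(cell queues|buffer|tor compress|rendezvous cache) total alloc: (\d+)"
)


def parse_low_memory(line):
    """Map each allocation category found in the line to its size in bytes."""
    return {name: int(value) for name, value in ALLOC_RE.findall(line)}


def largest_consumer(allocs):
    """Return the biggest category and its share of the summed figures."""
    name = max(allocs, key=allocs.get)
    return name, allocs[name] / sum(allocs.values())


if __name__ == "__main__":
    sample = (
        "We're low on memory (cell queues total alloc: 952512 buffer total "
        "alloc: 2557952, tor compress total alloc: 292862049 (zlib: 0, zstd: 0, "
        "lzma: 292862001), rendezvous cache total alloc: 23736810)."
    )
    allocs = parse_low_memory(sample)
    name, share = largest_consumer(allocs)
    print(allocs)
    print(f"largest: {name} ({share:.1%} of the listed allocations)")
```

For the notice quoted above, the compression buffers (almost entirely lzma) account for more than 90% of the listed allocations, while the cell queues themselves are comparatively small.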
I intentionally reduced the MaxMemInQueues value before the upgrade as the server became unresponsive due to a lack of mbuf clusters.
I wrote about it here: https://www.fabiankeil.de/blog-surrogat/2021/03/14/website-ausfall-durch-mbu... (The text is in German but the Munin graphs and the log messages aren't).
I'm using Tor 0.4.6.1-alpha as a client and on a couple of servers to provide onion services and haven't seen any crashes there either.
Please also check the system and tor logs to see if there are any relevant log messages.
I posted such in my previous two postings.
Again, I think it's premature to conclude that you are experiencing the same issue.
If you don't already, you may want to start monitoring tor's and the system's resource usage.
xplato's relay may differ from mine, but mine isn't using much these days because the authorities have a) stopped giving mine an HSDir flag for the last two months or so after giving it seemingly randomly for months before that, and b) been giving or withholding a Fast flag seemingly also at random. This is on a relay that often runs at 300 KB/s to 400+ KB/s, both in and out. The whole mess makes me wonder whether it's worth my bothering to maintain and run a relay anymore.
Can you tell us why the consensus documents on Sunday showed *all* authorities with Bandwidth=20 Unmeasured=1 for some hours? What caused such an alarming situation?
I haven't looked at the consensus documents from Sunday and don't run bandwidth scanners so I have no information on this.
Fabian