[tor-relays] Help with FreeBSD relays

Scott Bennett bennett at sdf.org
Tue Mar 30 18:14:37 UTC 2021

xplato <xplato at protonmail.com> wrote:

> I am running two relays and the error message is the same for both:
> Mar 30 08:13:01 freebsd kernel: pid 1745 (tor), jid 0, uid 256, was killed: out of swap space
     Oh.  Then Fabian may be right.  Assuming that you already have what
should be an adequate amount of swap space available, this could be due to
one of the cluster of memory management bugs introduced into the FreeBSD
kernel in 11.2-RELEASE, which remain in 12.x and very likely will still be
present in the upcoming 13.0.

> If I run
> # dd if=/dev/zero of=/usr/swap0 bs=1m count=512
     Note that, while once upon a time 512 MB was a large amount of swap
space, in modern times it is almost trivial and inconsequential.

> #chmod 0600 /usr/swap0
> #swapon -aL
> Will that fix the error above?

     It might alleviate it for a short time, but if the problem is due to
those bugs, it likely will make little or no difference.  The reason for
that is that the message is partly erroneous; i.e., it is correct that the
OOM killer has killed the process, but it is incorrect that it was out of
swap space.  Having watched those bugs in action for several years now,
what I can tell you is that a lot of pagefixing is going on, but very little
pagefreeing happens later.  Processes being killed with that error message
is just one symptom, and it can be a real problem, for example, if xorg is
running and gets killed, leaving the console inaccessible.  Another symptom
is that, one by one, processes stop doing anything because they get swapped
out due to the shortage of page frames on the free list.  The kernel will
not begin to page processes back in unless there is at least ~410 MB on the
free list, so the system ends up with nothing running, not even shells,
because everything is marked as swapped out.  If that is what is happening to
tor on your system, then increasing swap space likely will have no effect because
swap space is not really where the shortage exists.  The shortage is on the
free list.
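     (Incidentally, the quoted "swapon -aL" only attaches the file if it is
listed as a "late" swap device.  The usual FreeBSD arrangement for file-backed
swap is a vnode-backed md(4) entry in /etc/fstab; the md device number here is
illustrative:

```
md99	none	swap	sw,file=/usr/swap0,late	0	0
```

With that line in place, swapon -aL picks the file up; but, as noted above, if
the real shortage is on the free list, the extra swap will not help.)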
     There are some things that you can do in 11.4 that will minimize the
situations where the memory management problems take over the system.  You
can set a sysctl tunable that may help a bit.  Unfortunately, vm.max_wired
no longer does anything and is a red herring.  You can try to limit kernel
memory by setting vm.kmem_size_max to some value considerably less than the
size of real memory on your system.  Although the system does not honor this
limit either, it may still have a minor influence on how much the kernel
uses.  I think I set mine to 4 GB on an 8 GB machine.  This should be set in
/boot/loader.conf, since it is a boot-time tunable.
     In /etc/sysctl.conf there are several variables that should each help
a little more.  If you use ZFS, you can try limiting the size of the ARC by
setting vfs.zfs.arc_max.  After setting that, you may see the ARC grow to
as much as ~200 MB more than the size you set the limit to, but it doesn't
really go beyond that, so it does work after a fashion.  Just allow for that
extra couple of hundred megabytes or so.  Next is vm.v_free_min, which on my
system defaults to 65536, and I have increased that to 98304.  Then there is
this very important one:  vm.pageout_wakeup_thresh=112640.  Its default
value is only 14124, a far cry from the ~410 MB needed on the free list for
the kernel to begin paging a swapped process back into memory.  (112640 pages
are 440 MB, so it gives a tiny bit of leeway to the pagedaemon to get to work
before the free list gets too low.)  Lastly, set vm.pageout_oom_seq=102400000
to prevent the OOM killer from killing your processes.  This value is the
number of complete passes through memory the pagedaemon must make in its
attempt to free enough memory to satisfy the current demand for free page
frames before it calls the OOM killer.  Setting the value that high means
that the pagedaemon never will get through that many passes, so the OOM killer
never gets called.  After setting this one you may occasionally see the
pagedaemon using all of one core's CPU time for a while, possibly a *long*
while, but it should protect your processes from being killed due to the
collection of memory management bugs.
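     Collected in one place, the settings described above might look like the
fragment below.  The values are the ones suggested above except for the ARC
cap, which is only illustrative (pick one that suits your RAM size); remember
that vm.kmem_size_max is a boot-time tunable and belongs in loader.conf:

```
# /boot/loader.conf (boot-time tunable; 4 GB on an 8 GB machine)
vm.kmem_size_max="4G"

# /etc/sysctl.conf
vfs.zfs.arc_max=2147483648        # ZFS only; expect ~200 MB of overshoot
vm.v_free_min=98304
vm.pageout_wakeup_thresh=112640   # 112640 pages * 4 KB = 440 MB
vm.pageout_oom_seq=102400000      # effectively keeps the OOM killer at bay
```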
     With all of the variables mentioned above set to better values you may
still see the system slowly grind down to an idle state.  This can happen
due to the kernel prioritizing the keeping of unused file system buffers in
memory over swapping processes in to get actual work done.  In such a case,
manual intervention is required to free up page frames.  For example, if you
confine your ccache directory trees to a UFS file system, the system will
quickly accumulate a lot of buffers it doesn't want to let go of.  The same
holds true for portmaster's $WRKDIRPREFIX, where a "portmaster -a" to update
your ports will tie up a large number of buffers.  buildworld and buildkernel
are also culprits.  The file system buffers can be forcibly freed up, thereby
freeing page frames occupied by the file system buffers, by unmounting the
UFS file system.  (It can be remounted after waiting a few seconds to make
sure that the free list has been updated, which you can watch for with top(1).)
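     A sketch of that unmount/remount trick, as a small sh(1) helper.  The
mount point /ccache is illustrative; DRY_RUN=1 only prints the commands so you
can see what would be run:

```shell
#!/bin/sh
# Sketch: free page frames tied up in UFS buffers by unmounting and
# remounting the file system.  Mount point is illustrative.
flush_ufs_buffers() {
    fs="$1"
    if [ "${DRY_RUN:-0}" -eq 1 ]; then
        echo "umount $fs && sleep 5 && mount $fs"
        return 0
    fi
    umount "$fs" &&
    sleep 5 &&          # let the free list update; watch it with top(1)
    mount "$fs"
}

DRY_RUN=1 flush_ufs_buffers /ccache
```

Nothing can be unmounted while a process holds files open on it, so stop
ccache or portmaster jobs on that file system first.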
     Another trick can be used if you have a long-running process that uses at
least several hundred megabytes of memory.  If you can temporarily shut such
a process down or otherwise get it to free up its memory, the system will
begin paging in swapped processes, after which you can restart the one you
halted.  Often this action shakes up memory management enough that things in
your system will go on recovering or at least begin doing some work again.
For example, I run mprime with four worker threads, two of which typically are
using 300 MB to 400 MB each.  I can temporarily stop those two workers to free
up plenty of page frames to get swapped stuff brought back in.  Then I can
restart those worker threads, and usually things will gradually return to
normal.  However, this is a special case that is not available if you don't
have such a process to manipulate in this manner.
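     If your big memory consumer happens to be an rc(8)-managed daemon rather
than something like mprime (which has its own worker menu), a minimal version
of the trick might look like the sketch below.  "bigdaemon" is a hypothetical
service name, and the commands are echoed as a dry run; drop the echoes to run
them for real:

```shell
#!/bin/sh
# Sketch: temporarily stop a memory-hungry service so the kernel can page
# swapped processes back in, then restart it.  "bigdaemon" is hypothetical.
svc="${SVC:-bigdaemon}"
echo "service $svc stop"    # releases its several hundred MB of memory
echo "sleep 120"            # watch top(1) until paging-in is under way
echo "service $svc start"   # restart once the system is moving again
```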

