xplato xplato@protonmail.com wrote:
I am running two relays and the error message is the same for both:
Mar 30 08:13:01 freebsd kernel: pid 1745 (tor), jid 0, uid 256, was killed: out of swap space
Oh. Then Fabian may be right. Assuming that you already have what should be an adequate amount of swap space available, this could be due to one of the cluster of memory management bugs introduced into the FreeBSD kernel in 11.2-RELEASE that remain in 12.x and very likely will be in the upcoming 13.0.
If I run # dd if=/dev/zero of=/usr/swap0 bs=1m count=512
Note that, while once upon a time 512 MB was a large amount of swap space, in modern times it is almost trivial and inconsequential.
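Before adding more, it is also worth checking how much swap you already have and how much of it is actually in use. swapinfo(8) shows that directly, and top(1) shows the same figures on its Swap: line:

   # swapinfo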
# chmod 0600 /usr/swap0
# swapon -aL
Will that fix the error above?
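One side note on those commands first: if I am reading swapon(8) right, "swapon -aL" only turns on swap entries listed in /etc/fstab (the -L covers the ones marked "late"), so the swap file also needs a line there before it will be picked up. If I remember the handbook's recipe for a file-backed swap device correctly, it looks about like this (the md99 unit number is just an example):

   md99    none    swap    sw,file=/usr/swap0,late    0    0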
It might alleviate it for a short time, but if the problem is due to those bugs, it likely will make little or no difference. The reason is that the message is partly erroneous; i.e., it is correct that the OOM killer has killed the process, but it is incorrect that the system was out of swap space. Having watched those bugs in action for several years now, what I can tell you is that a great deal of paging activity goes on, but very little page freeing happens afterward. Processes being killed with that error message is just one symptom, and it can be a real problem, for example, if xorg is running and gets killed, leaving the console inaccessible. Another symptom is that, one by one, processes stop doing anything because they get swapped out due to the shortage of page frames on the free list. The kernel will not begin to page processes back in unless there is at least ~410 MB on the free list, so the system ends up with nothing running, not even shells, because everything is marked as swapped out. If that is what is happening to tor on your system, then increasing swap space likely will have no effect, because swap space is not really where the shortage exists. The shortage is on the free list.

There are some things that you can do in 11.4 that will minimize the situations where the memory management problems take over the system, mainly by setting a handful of loader and sysctl tunables, each of which may help a bit. Unfortunately, vm.max_wired no longer does anything and is a red herring. You can try to limit kernel memory by setting vm.kmem_size_max to some value considerably less than the size of real memory on your system. Although the system does not honor this limit either, it may still have a minor influence on how much memory the kernel uses. I think I set mine to 4 GB on an 8 GB machine. This one should be set in /boot/loader.conf.

In /etc/sysctl.conf there are several variables that should each help a little more. If you use ZFS, you can try limiting the size of the ARC by setting vfs.zfs.arc_max. After setting that, you may see the ARC grow to as much as ~200 MB more than the limit you set, but it doesn't really go beyond that, so it does work after a fashion; just allow for that extra couple of hundred megabytes or so. Next is vm.v_free_min, which on my system defaults to 65536 and which I have increased to 98304. Then there is this very important one: vm.pageout_wakeup_thresh=112640. Its default value is only 14124, a far cry from the ~410 MB needed on the free list for the kernel to begin paging a swapped process back into memory. (112640 4 KB pages is 440 MB, which gives the pagedaemon a tiny bit of leeway to get to work before the free list gets too low.) Lastly, set vm.pageout_oom_seq=102400000 to prevent the OOM killer from killing your processes. This value is the number of complete passes through memory that the pagedaemon must make, in its attempt to free enough memory to satisfy the current demand for free page frames, before it calls the OOM killer. Setting the value that high means the pagedaemon never will get through that many passes, so the OOM killer never gets called. After setting this one you may occasionally see the pagedaemon using all of one core's CPU time for a while, possibly a *long* while, but it should protect your processes from being killed due to this collection of memory management bugs.
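To gather those together in one place, here is roughly what the settings look like in the two files. The numbers are the ones I described above and should be scaled to your amount of RAM; the vfs.zfs.arc_max figure (2 GB) is only a placeholder, and on some releases that particular one may need to go in /boot/loader.conf instead of /etc/sysctl.conf.

   In /boot/loader.conf:

      vm.kmem_size_max="4G"

   In /etc/sysctl.conf:

      # only if you run ZFS; 2 GB is just a placeholder value
      vfs.zfs.arc_max=2147483648
      vm.v_free_min=98304
      vm.pageout_wakeup_thresh=112640
      vm.pageout_oom_seq=102400000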
With all of the variables mentioned above set to better values, you may still see the system slowly grind down to an idle state. This can happen because the kernel prioritizes keeping unused file system buffers in memory over paging processes back in to get actual work done. In such a case, manual intervention is required to free up page frames. For example, if you confine your ccache directory trees to a UFS file system, the system will quickly accumulate a lot of buffers it doesn't want to let go of. The same holds true for portmaster's $WRKDIRPREFIX, where a "portmaster -a" to update your ports will tie up a large number of buffers; buildworld and buildkernel are also culprits. The file system buffers can be forcibly freed, thereby freeing the page frames they occupy, by unmounting the UFS file system (see the short example near the end of this message). It can be remounted after waiting a few seconds to make sure that the free list has been updated, which you can watch with top(1).

Another trick can be used if you have a long-running process that uses at least several hundred megabytes of memory. If you can temporarily shut such a process down, or otherwise get it to free up its memory, the system will begin paging in swapped processes, after which you can restart the one you halted. Often this shakes up memory management enough that the system recovers, or at least begins doing some work again. For example, I run mprime with four worker threads, two of which typically use 300 MB to 400 MB each. I can temporarily stop those two workers to free up plenty of page frames so that swapped-out stuff gets brought back in, then restart the workers, and usually things gradually return to normal. However, this is a special case that is not available if you don't have such a process to manipulate in this manner.
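Coming back to the unmount trick I mentioned above, in practice it is just something like the following, with /ccache standing in for whichever UFS file system is hoarding the buffers. The file system has to be idle at the time, or umount(8) will refuse because files are still open on it.

   # umount /ccache
   (wait a few seconds, watching the free memory figure in top(1) climb)
   # mount /ccache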
Scott