Tor guard relay memory: what actually reduces RSS on Ubuntu 24.04
Hello,

We ran experiments over 10 days on ~100 Tor guard relays (Tor 0.4.8.x) on Ubuntu 24.04 to determine which memory optimizations materially affect long-term, steady-state RSS under sustained load.

Results: allocator choice dominates steady-state RSS outcomes. Using mimalloc 2.x and jemalloc 5.x produced large, sustained reductions in resident memory, on the order of 70–80 percent, compared to the default glibc malloc; tcmalloc showed a smaller but measurable improvement. By contrast, the two commonly cited configuration parameters, MaxMemInQueues and MaxConsensusAgeForDiffs, showed little to no impact on steady-state RSS when adjusted in isolation.

Operationally, these reductions allow higher relay density per host, with guard relay memory footprints decreasing from roughly 5–6 GB to approximately 1–2 GB per relay under sustained load.

Attached are the summary chart and table:

  Approach                   Result                      Viable?
  mimalloc 2.1.2             1.16 GB (80% reduction)     Yes
  jemalloc 5.3.0             1.59 GB (72% reduction)     Yes
  tcmalloc 2.15              3.79 GB (33% reduction)     Partial
  mimalloc 3.0.1             4.39 GB (23% reduction)     Partial
  DirCache 0                 0.29 GB (94% reduction)     No (loses Guard)
  MaxMemInQueues             ~5.0 GB (no change)         No
  MaxConsensusAgeForDiffs    ~5.7 GB (no change)         No
  Periodic restarts          5.0–5.3 GB (minimal)        Workaround only
  glibc 2.39 (control)       5.68 GB                     Baseline

Full methodology, comparisons, and data are here, with links to scripts on GitHub to reproduce:
https://www.1aeo.com/blog/tor-memory-optizations-what-actually-works.html

Tor at 1AEO
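For operators who want to try the allocator swap, a minimal sketch of how it can be wired up with a systemd drop-in follows; the unit name and library paths are assumptions for a stock Ubuntu 24.04 tor package and should be adjusted to whatever your distro actually installs:

  # /etc/systemd/system/tor@default.service.d/allocator.conf  (path assumed)
  [Service]
  # Preload mimalloc 2.x; adjust the path to the library your distro ships
  Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2"
  # ...or jemalloc instead:
  # Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"

  # apply with:
  #   systemctl daemon-reload && systemctl restart tor@default

If you run with Sandbox 1 in torrc, it is worth confirming that tor still starts cleanly afterwards, since the seccomp sandbox can be sensitive to a different allocator's syscall pattern.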
Hello.

I'm surprised that the memory savings are so great.

Note that Debian Trixie does not have mimalloc2, only mimalloc3. Did you find out what caused mimalloc3 to lose so many of the savings that you got with mimalloc2? Perhaps there are some runtime configuration tweaks that could be used to adjust its behavior, if the "regression" is purely a matter of different defaults being used.

For jemalloc, have you considered giving mozjemalloc a try? It has a number of security improvements that jemalloc lacks.

I'm also curious whether there are any glibc tunables that could be used to improve its memory behavior. See "/lib64/ld-linux-x86-64.so.2 --list-tunables" for a list of supported tunables. Perhaps setting the right MALLOC_ARENA_MAX or MALLOC_MMAP_THRESHOLD would help? (An example is sketched below.)

With that said, I'm not a fan of the use of LLMs in that write-up. It makes me constantly question the accuracy of the claims that are made (although I can and will test them myself). I'd much rather read a detailed write-up in English with non-native fluency than fluent AI slop.

Regards,
forest
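To make the tunables idea concrete, something like the following could be tested; the specific values are illustrative starting points, not measured recommendations:

  # list the malloc tunables glibc supports on this host
  /lib64/ld-linux-x86-64.so.2 --list-tunables | grep malloc

  # systemd drop-in for tor trying a single arena and a fixed mmap threshold
  [Service]
  Environment="GLIBC_TUNABLES=glibc.malloc.arena_max=1:glibc.malloc.mmap_threshold=131072"
  # the classic environment variables (MALLOC_ARENA_MAX, MALLOC_MMAP_THRESHOLD_) work as well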
Hello.

Unfortunately, mimalloc2 appears to be useless on stock Alpine Linux, causing repeated segmentation faults that crash Tor after a short time:

[74497.456152] tor[2942]: segfault at 2ba16010000 ip 00007f42d17d7da0 sp 00007f42cbf1b558 error 6 in libmimalloc-secure.so.2.2[15da0,7f42d17c6000+1f000] likely on CPU 0 (core 0, socket 0)
[74497.456168] Code: 49 f7 d9 48 31 f8 48 d3 c0 48 01 c8 49 89 01 48 8d 04 32 48 89 c2 48 8b 4f 38 48 33 47 40 49 89 f1 49 f7 d9 48 d3 c0 48 01 c8 <4a> 89 94 0a 49 39 d0 73 dc 48 8b 47 10 48 8b 4f 38 48 85 c0 48 0f

As this is occurring with mimalloc2 while mimalloc3 is the latest, and considering mimalloc3 does not show such heavy memory savings, I'm not going to be putting any effort into debugging this.

I'm trying out jemalloc now. If this can help reduce some of the high memory pressure on this 512 MiB VPS, that would be very nice.

Regards,
forest
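A quick way to confirm which allocator a running tor process has actually picked up, regardless of distro (this assumes a single tor process; substitute the PID otherwise):

  # lists any preloaded allocator library mapped into the tor process
  grep -E 'mimalloc|jemalloc|tcmalloc' /proc/$(pidof tor)/maps | awk '{print $6}' | sort -u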
Unfortunate about the mimalloc 2.x segfaults on Alpine Linux with Tor. Looking forward to your results with jemalloc, which also worked well for us.

We haven’t researched root causes, allocator parameters, build-time changes, or mozjemalloc yet. It would be great to see more relay operators sharing more experiments and data points.

In addition to the Ubuntu 24.04 experiment, we also ran a 10-day test on Debian 13.2. The outcome was the same: mimalloc 2.0.9 and 2.1.7 both performed very well. We compiled both locally, which was quick and straightforward. Interestingly, mimalloc 3.0.1 performed worse on Debian 13.2 (where it ships by default), with continued memory growth compared to Ubuntu 24.04.

We posted a brief follow-up blog with a chart here:
https://www.1aeo.com/blog/mimalloc-209-tor-relay-deployment.html

We’ve started migrating our guard relays to mimalloc 2.1.2 (Ubuntu 24.04 default) or 2.0.9 (compiled on Debian 13.2). The memory savings have been substantial: roughly 200 relays now consume ~300 GB total instead of ~1000 GB, which is a significant operational cost difference.
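For anyone wanting to reproduce the Debian 13.2 builds, the upstream cmake flow for mimalloc is short; the tag and paths below are illustrative rather than a record of our exact steps:

  git clone --branch v2.0.9 https://github.com/microsoft/mimalloc
  cd mimalloc && mkdir -p out/release && cd out/release
  cmake ../.. && make
  # this should leave a libmimalloc.so.2.x shared library in out/release;
  # point LD_PRELOAD at it (see the drop-in sketch earlier in the thread)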
Hello.

Tor at 1AEO wrote:
> Unfortunate about the mimalloc 2.x segfaults on Alpine Linux with Tor. Looking forward to your results with jemalloc, which also worked well for us.
I'm hesitant to switch to jemalloc on too many systems due to its poor security track record, although a random segfault with mimalloc does not give me the warm fuzzies, either. That is why I suggested giving mozjemalloc a try. It should be a drop-in replacement.
> We haven’t researched root causes, allocator parameters, build-time changes, or mozjemalloc yet. It would be great to see more relay operators sharing more experiments and data points.
There are a number of other memory allocators out there that you may want to test along with various tweaks. I would be interested to see how Daniel Micay's hardened_malloc runs on relays. I believe it is based on OpenBSD's omalloc and the Android Bionic libc's malloc. While it is a security-focused allocator, it actually *reduces* metadata overhead and fragmentation. Depending on the configuration, it may require a LOT of VMAs, thus increasing non-swappable slab pressure.
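If someone does try hardened_malloc on a relay, the usual build-and-preload flow is short; the output filename and the sysctl value below are assumptions to verify against the project's README rather than tested relay settings:

  git clone https://github.com/GrapheneOS/hardened_malloc
  cd hardened_malloc && make
  # the build should produce a preloadable library under out/ (e.g. out/libhardened_malloc.so);
  # preload it for tor the same way as mimalloc/jemalloc (LD_PRELOAD in a systemd drop-in)

  # given the large number of mappings mentioned above, the per-process VMA limit may need raising:
  sysctl -w vm.max_map_count=1048576   # illustrative value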
> The memory savings have been substantial: roughly 200 relays now consume ~300 GB total instead of ~1000 GB, which is a significant operational cost difference.
Most of my relays have 1-2 GB RAM, which seems to be the minimum that does not cause performance loss under memory pressure and can sustain 100 Mbps in each direction.

In my case, the worst situation seems to be when so much non-swappable kernel memory is used (by the connection tracking table, socket structures, etc.) that a GFP_ATOMIC allocation fails, which locks the whole system up. Sadly, this happens even when there is plenty of swap remaining, and while raising vm.min_free_kbytes helps somewhat, it's not a panacea.

I suspect reduced userspace memory fragmentation would not help all that much in my particular case, but I'm sure it's very useful on systems that run multiple relays.

Regards,
forest
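For reference, the kernel-side knobs mentioned above look roughly like this; the numbers are placeholders to tune per host, not recommendations from any measurement:

  # reserve more memory for atomic allocations (value is in KiB and host-dependent)
  sysctl -w vm.min_free_kbytes=65536

  # cap the connection tracking table if the conntrack modules are loaded at all
  # (some operators avoid conntrack for relay traffic entirely with NOTRACK rules)
  sysctl -w net.netfilter.nf_conntrack_max=262144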
On 1/8/26 10:00, Tor at 1AEO via tor-relays wrote:
> Results: allocator choice dominates steady-state RSS outcomes. Using mimalloc 2.x and jemalloc 5.x produced large, sustained reductions in resident memory, on the order of 70–80 percent, compared to the default glibc malloc; tcmalloc showed a smaller but measurable improvement.
On a hardened stable Gentoo Linux bare-metal server, no significant changes were seen for the relays in htop, maybe a few hundred MiB less VIRT in the first few days. On a tiny VPS (1 GiB RAM, 2 GiB swap file) running Debian Trixie and libmimalloc3, about 200-400 MiB less swap file consumption was observed.
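Since VIRT in htop mostly reflects address-space reservations rather than memory actually in use, resident and swapped memory per tor process may be a more comparable number across hosts. One way to read both (assuming a single tor process):

  grep -E '^(VmRSS|VmSwap)' /proc/$(pidof tor)/status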
> By contrast, the two commonly cited configuration parameters, MaxMemInQueues and MaxConsensusAgeForDiffs, showed little to no impact on steady-state RSS when adjusted in isolation.

+1
BTW, I filed [1] a while ago. If you set the common stats parameters to zero, would that have an impact on the memory usage? (The relevant torrc switches are listed below.)

[1] https://gitlab.torproject.org/tpo/core/tor/-/issues/40958

--
Toralf
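For anyone who wants to test that alongside the allocator changes, some of the statistics-related torrc options that can be switched off on a relay are listed here (whether doing so moves steady-state RSS is exactly the open question in [1]):

  # torrc: disable optional relay statistics collection
  DirReqStatistics 0
  EntryStatistics 0
  CellStatistics 0
  ConnDirectionStatistics 0
  PaddingStatistics 0
  HiddenServiceStatistics 0
  ExtraInfoStatistics 0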
participants (3):
- forest-relay-contact@cryptolab.net
- Tor at 1AEO
- Toralf Förster