New constraint - any guidance? Does the math seem right? All relay operators / families are limited to a maximum of ~360 Tor relays: https://gitlab.torproject.org/tpo/core/tor/-/issues/40837 I'll likely create an account to reply on the GitLab ticket too, since it looks like a different audience than those replying here.
Unfortunately, if the Tor network isn't efficient in using the bandwidth across ~4 x 10 Gbps servers, this limit will be reached, hindering known-good operators while not stopping malicious operators who don't follow the rules. Least efficient: ~512 CPU threads/relays for 4 x 10 Gbps (128 threads/relays per 1 x 10 Gbps server). Most efficient: ~320 CPU threads/relays for 4 x 10 Gbps (80 threads/relays per 1 x 10 Gbps server).
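Spelling that arithmetic out against the cap:
Least efficient: 4 servers x 128 threads/relays = 512 relays, over the ~360 cap
Most efficient: 4 servers x 80 threads/relays = 320 relays, under the ~360 cap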
Today, optimize Tor relays for the ~360 constraint:
1) Maximize bandwidth per CPU thread/core/relay with a higher CPU base clock.
2) Other hardware: ensure sufficient RAM per Tor relay (4GB per CPU thread) and a good NIC.
3) Maximize network peering / routing strategies for Tor?
Anything else?
For #3, how best to optimize network routing / peering strategies for Tor relays? This email thread was optimizing around CPU threads and RAM, but plenty of CPU threads and RAM might still be insufficient with a poor network routing/peering strategy for Tor. Is there a reliable way to quickly (in less than a few months of running the relays) get in the correct range of how well the Tor network uses a specific server's available bandwidth? Ex: route hops / ping times to directory / bandwidth authorities, confirming well-known upstream providers (Cogent, etc.), and/or something else? Is the best strategy to rent servers month-to-month rather than signing 5-year contracts and ending up somewhere with poor peering/routing for Tor?
On Tuesday, February 25th, 2025 at 9:57 PM, Tor at 1AEO via tor-relays tor-relays@lists.torproject.org wrote:
Okay - makes sense on up to 2 Tor relays per physical core with the goal of not wasting CPU cycles, given the fluctuations of Tor and the expensive hardware.
No, don't have all the network capacity covered and agree everything is costly.
For 10 Gbps unmetered, there aren't many options under $600/mo that are comfortable with Tor relays, so I'm alternating between colocation and dedicated bare metal servers, depending on location, hardware availability, price, support to bring my own IPv4 and announce my ASN, some semblance of ASN and geographic diversity for Tor, etc.
For 40 Gbps unmetered, not seeing much under $2k/mo.
Open to suggestions / guidance on network capacity, colocation, and bare metal servers. Don't know what I don't know so maybe I should be asking other questions?
When doing colocation, any suggestions on how best to set everything up? A router, only a layer 3 switch, or the compute node directly on the internet connection? Is it worth using a transparent firewall/bridge, a DMZ, or NAT?
On Friday, February 21st, 2025 at 2:40 AM, mail--- via tor-relays tor-relays@lists.torproject.org wrote:
Hi,
Summary from your email - did I miss anything?
Yes, with the general disclaimer (not to sound like a lawyer) that your mileage may vary. For example, we run everything bare metal on FreeBSD and run a mix of guard/middle/exit relays. Running the same workload virtualized or on another operating system may impact the performance/overhead (either positively or negatively). Also, your RAM budget of 4 GB per relay may be a bit on the safe side; I don't think it would hurt to lower it.
What are the primary factors that justify running up to two Tor relays per physical core (leveraging SMT) versus a one-to-one mapping?
Tor relays sadly don't scale well. They fluctuate on a daily basis (the Tor network as a whole does) and even their general utilization is kind of unpredictable. So I think there are two approaches to this:
- Run 1 relay per physical core, accepting that your CPU will idle a large amount of the time (50%+ in our case).
- Run multiple Tor relays per physical core until you saturate 90-95% of your CPU cycles, accepting additional system overhead/congestion.
There is no right or wrong here. In our case we went with running multiple relays per core because we want to utilize the (very expensive) hardware we run on as much as possible. Every CPU cycle not spent on privacy enhancing services is a wasted CPU cycle from our point of view ;).
Is one-to-one mapping of Tor relay to core/thread the most compute- and system-efficient approach?
Yes, this should lower the amount of congestion (interrupts and stuff). In this sense it can also be beneficial to lock your NIC/IRQ threads to specific cores.
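A minimal sketch of what that pinning could look like on Linux (the interface name, IRQ numbers, core masks and PID are illustrative; check /proc/interrupts for your NIC's actual IRQs):

systemctl stop irqbalance             # keep the balancer from moving IRQs around
grep eth0 /proc/interrupts            # find the NIC queue IRQ numbers
echo 2 > /proc/irq/120/smp_affinity   # pin IRQ 120 to core 1 (hex bitmask 0x2)
echo 4 > /proc/irq/121/smp_affinity   # pin IRQ 121 to core 2 (hex bitmask 0x4)
taskset -pc 3 <tor-pid>               # pin a relay process to core 3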
Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0 GHz)?
Base indeed. No CPU is able to consistently maintain its turbo speed on this many cores. When all cores are utilized, the base speed is pretty much the max speed in practice.
From your real world scenario #2 and advice for the "fastest/best hardware", would this type of server work well for a $20k budget?
Looks like a capable server. That CPU looks powerful enough but keep in mind that it has a rather low clockspeed, so you will be running many medium speed relays. Nothing wrong with that since CPUs with this many cores simply don't/can't have high base clocks. Also I think 512 GB of RAM would be enough unless you run a *lot* of relays on it (which may be a viable strategy to utilize your CPU fully).
Just a note: in my experience the Epyc platform (especially when self-built) provides a bit more bang for your buck. For example, an AMD Epyc 9969 with 192 cores/384 threads @ 2.25 GHz base clock will probably outperform the Intel 6980P considerably (for Tor workloads at least), while being much cheaper (at list price at least). But of course this greatly depends on where you buy the server or parts, so your mileage may vary. When I look around here locally, a complete self-built system with the 192-core Epyc, 512 GB RAM and a 100 Gb/s NIC would cost ~12k excluding VAT, before any tax benefits. But your proposed server will work perfectly fine as well, so if you prefer a brand, go for it :).
Assuming one relay per core/thread, would this setup be capable of saturating 40 Gbps, given that 10 Gbps is typically saturated with ~31.5 physical cores + ~31.5 SMT threads at 2 GHz (~63 relays), so 256 relays vs ~63 relays translates roughly to 4 x 10 Gbps = 40 Gbps?
Assuming you get the Tor relays to saturate their cores, yes. That CPU should be able to push 40 Gb/s of Tor traffic. Our ~2019 64 core/128 thread Epyc pushes 10 Gb/s of Tor traffic on a bit less than half its capacity. And your CPU is newer (better IPC, hopefully, if Intel finally stepped up their game since 2019), has much more cache and runs on DDR5, while having a bit lower base clock. So it should perform at least similarly, and probably better than ours.
Do you have the network capacity covered already? If you plan to do 40 Gb/s, then you also need enough peers/upstream capacity. The required networking equipment and connections themselves for this can also be costly.
Cheers,
tornth
Feb 20, 2025, 08:42 by tor@1aeo.com:
Excellent information, especially the real world scenarios! Exactly what I was looking for!
Summary from your email - did I miss anything?
To saturate 10 Gbps connection:
- IPv4 Allocation: Use between 5 and 20. Much lower than 256 in a /24!
- Tor Relay Count: Run roughly ~40 to ~150, depending on CPU clock speed, i.e. faster clock, fewer relays needed.
- CPU Utilization: 1 Tor relay per physical core preferred, but okay to scale to 1 Tor relay per thread/SMT as well, up to 2x Tor relays per core/thread
- RAM requirements: Maintain a 4:1 RAM-to-core/relay ratio (4GB per core/relay), plus an extra 32GB per server to cover DoS, OS, networking, etc. overheads
In general, some ideals but not required:
- CPU clock speed: higher CPU clock speed, better relay performance
- RAM: fewer relays, lower RAM requirements
- RAM: add ~32GB to overall RAM capacity sizing for OS, DNS, networking, DoS, etc.
- IPv4: one IPv4 per relay with common traffic ports
Scaling: Start with 1 Tor relay per physical core, then add 1 Tor relay per thread/SMT, and stop at 2 Tor relays per core/thread.
What are the primary factors that justify running up to two Tor relays per physical core (leveraging SMT) versus a one-to-one mapping? Ex: 37-74 for ~18.5 physical + SMT and 63-126 for ~31.5 physical + SMT.
Is one-to-one mapping of Tor relay to core/thread the most compute- and system-efficient approach?
Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0 GHz)? I'm assuming base. Not sure if anybody has data on how these impact Tor relays?
From your real world scenario #2 and advice for the "fastest/best hardware", would this type of server work well for a $20k budget? Would a single-socket Xeon 6980P (128 physical cores, 256 threads, base clock 2.0 GHz, turbo up to 3.9 GHz) with 1024GB DDR5 (maintaining a 4:1 ratio) and an AIOM Mellanox NIC be optimal? Assuming one relay per core/thread, would this setup be capable of saturating 40 Gbps, given that 10 Gbps is typically saturated with ~31.5 physical cores + ~31.5 SMT threads at 2 GHz (~63 relays), so 256 relays translates roughly to 4 x 10 Gbps = 40 Gbps?
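Spelling out the per-thread arithmetic behind that estimate, using the scenario #2 reference numbers:
10 Gb/s / ~63 threads ≈ 160 Mb/s per relay/thread at 2 GHz
256 threads x ~160 Mb/s ≈ 40 Gb/s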
For those curious, according to ChatGPT o3-mini-high deep research:
- 20% cap on bandwidth contributions for exit relays is roughly 50+ Gbit/s for the largest operators.
- 10% of Tor's consensus weight in terms of bandwidth for 2025 is roughly 90-95 Gbps of sustained bandwidth. In 2022, it would have been ~68 Gbps.
I don't plan to have this issue any time soon, but good to be aware!
Screenshots of the lengthy responses below and attached.
On Tuesday, February 18th, 2025 at 2:23 PM, mail@nothingtohide.nl mail@nothingtohide.nl wrote:
Hi,
Many people already replied, but here are my (late) two cents.
- If a full IPv4 /24 Class C was available to host Tor relays, what are some optimal ways to allocate bandwidth, CPU cores and RAM to maximize utilization of the IPv4 /24 for Tor?
"Optimal" depends on your preferences and goals. Some examples:
- IP address efficiency: run 8 relays per IPv4 address.
- Use the best ports: 256 relays (443) or 512 relays (443+80).
- Lowest kernel/system congestion: 1 locked relay per core/SMT thread combination, ideally on high-clocked CPUs.
- Easiest to manage: as few relays as possible.
- Memory efficiency: only run middle relays on very high-clocked CPUs (4-5 GHz).
- Cost efficiency: run many relays on 1-2 generations old Epyc CPUs with a high core count (64 or more).
There are always constraints. The hardware/CPU/memory and bandwidth/routing capability available to you are probably not infinite. Also, the Tor Project caps bandwidth contributions at 20% of exit and 10% of overall consensus weight, respectively.
With 256 IP addresses on modern hardware, it will be very hard not to run into one of these limitations long before you can make it 'optimal'. Hardware-wise, one modern/current-gen high performance server only running exit relays will easily push enough Tor traffic to do more than half of the total exit bandwidth of the Tor network.
My advice would be:
- Get the fastest/best hardware with current-ish generation CPU IPC capabilities you can get within your budget. To lower the complexity of keeping congestion under control, one socket is easier to deal with than a dual-socket system.
(tip for NIC: if your switch/router has 10 Gb/s or 25 Gb/s ports, get some of the older Mellanox cards. They are very stable (more so than their Intel counterparts, in my experience) and extremely affordable nowadays because of all the organizations that throw away their digital sovereignty and the privacy of their employees/users to move to the cloud).
- Start with 1 Tor relay per physical core (ignoring SMT). When the Tor relays have ramped up (this takes 2-3 months for guard relays) and there still is considerable headroom on the CPU (Tor runs extremely poorly at scale sadly, so this would be my expectation) then move to 1 Tor relay per thread (SMT included).
(tip: already run/'train' some Tor relays with a very limited bandwidth (2 MB/s or something) parallel to your primary ones and pin them all to 1-2 cores to let them ramp up in parallel to your primary ones. This makes it *much* less cumbersome to scale up your Tor contribution when you need/want/can do that in the future).
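A minimal sketch of what such a 'trainer' instance's torrc could look like (the nickname, port and path are illustrative; BandwidthRate/BandwidthBurst are the standard torrc rate-limit options):

Nickname trainer01
ORPort 9201
DataDirectory /var/lib/tor-instances/trainer01
BandwidthRate 2 MBytes     # keep the trainee slow while it ramps up
BandwidthBurst 4 MBytes

Pinning these trainees to 1-2 cores can then be done with taskset or a systemd CPUAffinity= override.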
- Assume at least 1 GB of RAM per relay on modern CPUs + 32 GB additionally for OS, DNS, networking and to have some headroom for DoS attacks. This may sound high, especially considering the advice in the Tor documentation. But on modern CPUs (especially with a high clockspeed) guard relays can use a lot more than 512 MB of RAM, especially when they are getting attacked. Middle and exit relays require less RAM.
Don't skimp on system memory capacity. DDR4 RDIMMs with decent clockspeeds are so cheap nowadays. For reference: we ran our smaller Tor servers (16C@3.4GHz) with 64 GB of RAM and had to upgrade to 128 GB because during attacks RAM usage exceeded the amount available and killed processes.
- If you have the IP space available, use one IPv4 address per relay and use all the good ports such as 443. If IP addresses are more scarce, it's also not bad to run 4 or 8 relays per IP address. Especially for middle and exit relays the port doesn't matter (much). Guard relays should ideally always run on a generally used (and generally unblocked) port.
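As a concrete sketch of the shared-IP case, two instances on one IPv4 using the good ports (192.0.2.10 is a placeholder documentation address; binding ports below 1024 requires CAP_NET_BIND_SERVICE or similar, which packaged Tor systemd units typically grant):

# relay 1 torrc
ORPort 192.0.2.10:443
# relay 2 torrc
ORPort 192.0.2.10:80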
- If a full 10 Gbps connection was available for Tor relays, how many CPU cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps connection?
That greatly depends on the CPU and your configuration. I can offer 3 references based on real world examples. They all run a mix of guard/middle/exit relays.
- Typical low core count (16+SMT) with higher clockspeed (3.4 GHz) saturates a 10 Gb/s connection with ~18.5 physical cores + SMT.
- Typical higher core count (64+SMT) with lower clockspeed (2.25 GHz) saturates a 10 Gb/s connection with ~31.5 physical cores + SMT.
- Typical energy efficient/low performance CPU with low core count (16) and very low clockspeed (2.0 GHz), often used in networking appliances, saturates a 10 Gb/s connection with ~75 physical cores (note: no SMT).
The number of IP addresses required also depends on multiple factors. But I'd say that you would need between one and two times the mentioned core+SMT count in relays in order to saturate 10 Gb/s. This would be 37-74, 63-126 and 75-150 relays respectively. So between 5 and 19 IPv4 addresses would be required at minimum, depending on CPU performance level.
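Checking that address arithmetic at the 8-relays-per-IPv4 maximum mentioned above:
37 relays / 8 ≈ 4.6, rounds up to 5 addresses
150 relays / 8 ≈ 18.8, rounds up to 19 addresses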
RAM-wise, the more relays you run, the more RAM overhead you will have. So in general it's better to run fewer relays at a higher speed each than many at a low clock speed. But since Tor scales so badly you need more relays anyway, so optimizing this isn't easy in practice.
- Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4 addresses are required to saturate?
Double the amount compared to 10 Gb/s.
Good luck with your Tor adventure. And let us know your findings with achieving 10 Gb/s when you get there :-).
Cheers,
tornth
Feb 3, 2025, 18:14 by tor-relays@lists.torproject.org:
Hi All,
Looking for guidance around running high performance Tor relays on Ubuntu.
Few questions:
If a full IPv4 /24 Class C was available to host Tor relays, what are some optimal ways to allocate bandwidth, CPU cores and RAM to maximize utilization of the IPv4 /24 for Tor?
If a full 10 Gbps connection was available for Tor relays, how many CPU cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps connection?
Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4 addresses are required to saturate?
Thanks!
On Sunday, 9 March 2025 22:59 Tor at 1AEO via tor-relays wrote:
New constraint - any guidance? Does the math seem right? All relay operators / families are limited to a maximum of ~360 Tor relays: https://gitlab.torproject.org/tpo/core/tor/-/issues/40837 I'll likely create an account to reply on the GitLab ticket too, since it looks like a different audience than those replying here.
Unfortunately, if the Tor network isn't efficient in using the bandwidth across ~4 x 10 Gbps servers, this limit will be reached, hindering known-good operators while not stopping malicious operators who don't follow the rules. Least efficient: ~512 CPU threads/relays for 4 x 10 Gbps (128 threads/relays per 1 x 10 Gbps server). Most efficient: ~320 CPU threads/relays for 4 x 10 Gbps (80 threads/relays per 1 x 10 Gbps server).
Today, optimize Tor relays for the ~360 constraint:
1) Maximize bandwidth per CPU thread/core/relay with a higher CPU base clock.
2) Other hardware: ensure sufficient RAM per Tor relay (4GB per CPU thread) and a good NIC.
3) Maximize network peering / routing strategies for Tor?
Anything else?
With so many relays per family/operator you also reach the 20% and 10% limits and the /16 limit. And you have to be able to pay the bandwidth costs: a 10G relay does ~100TB/day and several PB per month.
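For reference, the arithmetic behind that figure, per direction of a saturated link:
10 Gb/s ≈ 1.25 GB/s
1.25 GB/s x 86,400 s/day ≈ 108 TB/day
108 TB/day x 30 ≈ 3.2 PB/month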
For #3, how best to optimize network routing / peering strategies for Tor relays? This email thread was optimizing around CPU threads and RAM, but plenty of CPU threads and RAM might still be insufficient with a poor network routing/peering strategy for Tor. Is there a reliable way to quickly (in less than a few months of running the relays) get in the correct range of how well the Tor network uses a specific server's available bandwidth? Ex: route hops / ping times to directory / bandwidth authorities, confirming well-known upstream providers (Cogent, etc.), and/or something else? Is the best strategy to rent servers month-to-month rather than signing 5-year contracts and ending up somewhere with poor peering/routing for Tor?
The Tor network is a massive, dynamic network, and bandwidth contributions and overall consensus weight are constantly changing. When a larger operator (like NTH or RWTH Aachen) goes up or down, everything changes. In addition, the Tor network team and DirAuths may change consensus rules at any time.
Diversity is important, as is that relays and bridges are running at all. How big an operator can be will also be a big issue when Arti and family keys arrive. Because relayon reached 25% of exit cw, their IPs were split between several orgs.
On Monday, 10 March 2025 15:34 boldsuck via tor-relays wrote:
The Tor network is a massive, dynamic network, and bandwidth contributions and overall consensus weight are constantly changing. When a larger operator (like NTH or RWTH Aachen) goes up or down, everything changes. In addition, the Tor network team and DirAuths may change consensus rules at any time.
2 servers, all relays same config & uptime, but still have different advertised bandwidth ;-)
https://metrics.torproject.org/rs.html#search/2a0b:f4c2:2:1:: https://metrics.torproject.org/rs.html#search/2a0b:f4c2:2::
If you are going to run that many relays on the same machine, I'd do the following:
1.) For each relay, take parallelizable Tor operations into account and set "NumCPUs" to at least 2, so that compression/decompression as well as onionskin decryption won't hog the main Tor thread/loop and hurt performance/throughput. I used to have a KVM VM in the Czech Republic which only had one core, and every time Tor tried to compress something, it would use up the entire CPU, severely limiting bandwidth until that operation was finished.
https://2019.www.torproject.org/docs/tor-manual.html.en#NumCPUs
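In each relay's torrc this is a one-liner (2 being the minimum suggested above):

NumCPUs 2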
2.) Make sure that you enable sand-boxing for each relay, and if you are really paranoid, you might also want to create a custom systemd unit override, focused on sand-boxing each process even further, there are a ton of possible options:
Here are all the different systemd sandboxing options:
https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Sa...
And here is an example on how to use them:
https://www.opensourcerers.org/2022/04/25/optimizing-a-systemd-service-for-s...
On my distribution, ArchLinux, the Tor package already comes with a few enabled, for example (taken from the unit file shipped by the distribution):
# Hardening
PrivateTmp=yes
PrivateDevices=yes
ProtectHome=yes
ProtectSystem=full
ReadOnlyDirectories=/
ReadWriteDirectories=-/var/lib/tor
ReadWriteDirectories=-/var/log/tor
NoNewPrivileges=yes
CapabilityBoundingSet=CAP_SETUID CAP_SETGID CAP_NET_BIND_SERVICE CAP_DAC_READ_SEARCH CAP_KILL
This is a good start, especially making non-essential devices inaccessible to Tor, but you can add more sandboxing options using unit overrides.
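A minimal sketch of such an override, assuming a per-instance unit named tor@relay1 (the unit name and the exact option set are illustrative; these are standard systemd.exec hardening options, but test each one, since overly strict sandboxing can break Tor):

# systemctl edit tor@relay1
[Service]
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
RestrictNamespaces=yes
LockPersonality=yes
RestrictRealtime=yes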
3.) "Maximize bandwidth per CPU thread/core/relay with higher CPU base clock"
Anything starting at around 3 to 4 GHz is okay as long as you use hardware AES acceleration:
https://2019.www.torproject.org/docs/tor-manual.html.en#HardwareAccel
If you are going to spend a lot of money on a costly NIC, then maybe look into cryptography accelerator cards too. Some are supported by Tor, but I have never used one or seen one used. Apparently, as long as OpenSSL can detect and use it, you can specify it with:
https://2019.www.torproject.org/docs/tor-manual.html.en#AccelName
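In torrc terms this would look roughly like the following (the engine name is a placeholder; it must match whatever engine your OpenSSL build actually exposes):

HardwareAccel 1
# AccelName <engine>     # optional, only if OpenSSL exposes a named engine
# AccelDir /path/to/dir  # optional, if the engine lives outside the default path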
4.) "Ensure sufficient RAM per Tor relays (4GB to 1 CPU thread) and good NIC."
https://2019.www.torproject.org/docs/tor-manual.html.en#MaxMemInQueues
4GB of RAM is likely overkill. I ran my (rate-limited) 100 MBit/s exit relay in a VM on my colocated server and set MaxMemInQueues to 1024MB. I then removed the rate limit and got around 350 MBit/s on a 2GHz CPU with AES-NI enabled.
Maybe try experimenting with 2GB of RAM, and setting MaxMemInQueues to that.
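In the torrc, that experiment is just:

MaxMemInQueues 2 GB    # cap per-relay queue memory, per the suggestion above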
For now, I have nothing else to add.
Thanks,
-GH
These two, right now, don't appear too unusual. One is 40 relays at 1Gbps and the other is 80 relays at 2Gbps? What am I missing?
It's unfortunate there isn't a website that graphs/charts the aggregate changes by IP address range over time, not just individual relay changes over time and an aggregate at a point in time.
I get the overall point that things change dynamically so maybe these were much more different at a different point in time?
On Monday, March 10th, 2025 at 10:32 AM, boldsuck via tor-relays tor-relays@lists.torproject.org wrote:
On Monday, 10 March 2025 15:34 boldsuck via tor-relays wrote:
The Tor network is a massive, dynamic network, and bandwidth contributions and overall consensus weight are constantly changing. When a larger operator (like NTH or RWTH Aachen) goes up or down, everything changes. In addition, the Tor network team and DirAuths may change consensus rules at any time.
2 servers, all relays same config & uptime, but still have different advertised bandwidth ;-)
https://metrics.torproject.org/rs.html#search/2a0b:f4c2:2:1:: https://metrics.torproject.org/rs.html#search/2a0b:f4c2:2::
Very helpful!
Some clarification questions:
How best to find out these 20% and 10% values at any point, especially as they fluctuate? It seems a waste to negotiate a five-year colo or server contract when the Tor network doesn't want it due to lack of sufficient diversity?
I'm only using unmetered 10Gbps connections to avoid paying per unit of bandwidth used. It hasn't felt worth the effort to meter per unit when usage can swing by large amounts frequently and the pricing options vary widely with different metering methods.
Can you say more about the expected "big issue" of Arti and family keys? When are they expected to arrive?
How did relayon know they were 25% of exit cw? How did they go about splitting IPs? The only place I've found so far to see cw in aggregate is https://nusenu.github.io/OrNetStats/#relay-families-by-consensus-weight . It'd be great for it to update more often than monthly, but it seems the best way to view cw given there's no alternative?
Hi,
Mar 21, 2025, 14:10 by tor-relays@lists.torproject.org:
How best to find out these 20% and 10% values at any point, especially as they fluctuate?
Nusenu's OrNetStats is the best source I have found to check on relay operator/family statistics, operating system statistics, etc. But as you have already found out as well, it refreshes only once per month or so, limiting its usefulness. It used to be pretty much daily in the past, but it seems this isn't maintainable anymore, and with fair reasons.
I suggest contacting Nusenu if you need more frequent updates. Previously Nusenu was looking for use-cases that warranted increasing the update frequency. NTH would also benefit from more frequent updates, but we're hesitant to pressure people who volunteer their spare time for these kinds of (awesome) projects into spending even more time.
Seems a waste to negotiate a five year financial colo or server contract when the Tor network doesn't want it due to lack of sufficient diversity?
The 20% of exit capacity is indeed not a lot, so you will probably reach that cap on exit consensus weight relatively quickly. But the 10% cap on overall consensus weight, on the other hand, provides a bit more headroom for some additional guard/middle relays. So you could just start with exit relays and then check the exit consensus weight every now and then. When you hit 20%, just convert a few relays to guard/middle relays.
When you add tens of gigabits to the network, you will also increase the network size, effectively providing more exit relay headroom for other Tor operators and yourself as well. The more operators, the more everyone can grow their relays before hitting the cap. It's not perfect, but the Tor Project values diversity more than increasing network capacity/speed, so there is not much we can do about this. I'd say: don't go overboard with 5-year contracts. Maybe start with 20 Gb/s and then increase in chunks of 10 Gb/s when there is enough headroom.
Good luck,
tornth
On Friday, 21 March 2025 14:54 mail@nothingtohide.nl wrote:
Mar 21, 2025, 14:10 by Tor at 1AEO tor@1aeo.com:
How best to find out these 20% and 10% values at any point, especially as they fluctuate?
Nusenu's OrNetStats is the best source I have found to check on relay operator/family statistics, operating system statistics, etc. But as you have already found out as well, it refreshes only once per month or so, limiting its usefulness. It used to be pretty much daily in the past, but it seems this isn't maintainable anymore, and with fair reasons.
I can only agree that Nusenu's OrNetStats is the best source. Once you have configured AROI: https://nusenu.github.io/OrNetStats/#authenticated-relay-operator-ids
you will have your personal page: https://nusenu.github.io/OrNetStats/for-privacy.net.html https://nusenu.github.io/OrNetStats/nothingtohide.nl.html
AFAIK, the fact that updates are no longer daily is a question of cost. The millions of database queries cost money. The database was sponsored for Nusenu by someone for a while.
On Friday, 21 March 2025 13:53 Tor at 1AEO wrote:
Some clarification questions:
How best to find out these 20% and 10% values at any point, especially as they fluctuate?
https://nusenu.github.io/OrNetStats
Can you say more about the expected "big issue" of arti and family keys?
Bigger families, and Arti is multicore-aware. With one or a few IPs and instances, you can achieve 10G. https://gitlab.torproject.org/tpo/core/tor/-/issues/40837
When are they expected to arrive?
Surprise :-) suddenly there after many years: https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/321-hap... https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/857
Quetzalcoatl can begin
How did relayon know they were 25% of exit cw?
https://nusenu.github.io/OrNetStats/relayon.org.html https://gitlab.torproject.org/tpo/core/tor/-/issues/40007
How did they go about splitting IPs?
Split into /26 and /27 and routed to the servers.
It'd be great for it to update more often than monthly but seems best way to view cw compared to no alternative? On Monday, March 10th, 2025 at 7:34 AM, boldsuck via tor-relays tor-relays@lists.torproject.org wrote:
On Sunday, 9 March 2025 22:59 Tor at 1AEO via tor-relays wrote:
New constraint - any guidance? Math seem right? All relay operators / families are limited to a maximum of ~360 Tor relays: https://gitlab.torproject.org/tpo/core/tor/-/issues/40837 I'll likely create an account to reply on the gitlab ticket too since looks like different audience than those replying here.
Unfortunately, if the Tor network isn't efficient in using the bandwidth across ~4 x 10 Gbps servers then this limit will be reached, hindering known good operators, while not stopping malicious operators who don't follow the rules. Least efficient, ~512 CPU threads / relays for 4 x 10 Gbps (128 threads/relays per 1 x 10 Gbps server). Most efficient, ~320 CPU threads / relays for 4 x 10 Gbps (80 threads/relays per 1 x 10 Gbps server).
Today, optimize Tor relay for the ~360 constraints:
- Maximize bandwidth per CPU thread/core/relay with higher CPU base
clock 2) Other hardware: Ensure sufficient RAM per Tor relays (4GB to 1 CPu thread) and good NIC. 3) Maximize network peering / routing strategies for Tor? Anything else?
With so many relays per Family/Operator you also reach the 20% and 10% limits and /16. And you have to be able to pay the bandwidth costs. A 10G relay does 100TB/day and several PB per month.
For #3, how best to optimize network routing / peering strategies for Tor relays? This email thread was optimizing around CPU threads and RAM but having plenty of CPU threads and RAM that might be insufficient with a poor network routing/peering strategy for Tor? Is there a reasonable way or some reliable way to quickly (less than a few months of running the relays) get in the correct range of how well the Tor network uses a specific server's available bandwidth? Ex: Route hops / ping times to directory / bandwidth authorities, confirming well known upstream providers (Cogent, etc.), and/or something else? Best strategy is month-to-month renting servers and running relays rather than signing 5 year contracts to end up somewhere with poor peering/routing for Tor?
The Tor network is a dynamic massive network and bandwidth contributions and overall consensus weight are constantly changing. When a larger operator (like NTH or RWTH Aachen) goes up or down everything changes. In addition, the Tor network team and DirAuth's may change consensus rules at any time.
Diversity is important, and so is the fact that relays and bridges are running at all. How big an operator can be will also become a big issue when Arti and family keys arrive. Because relayon reached 25% of exit consensus weight, their IPs were split between several organizations.
-- ╰_╯ Ciao Marco!
Debian GNU/Linux
It's free software and it gives you freedom!
Great list of parameters to modify!
Before starting this journey, I'm looking for more guidance on 1) when to make changes in the relay lifecycle and 2) how to measure the impact of the changes.
1) When is the right time in a relay's lifecycle to start changing the default parameters and measuring the impact? Is it premature to change these values when a relay is less than 2 weeks old, or less than a few months old as a guard relay, because the load can vary very significantly?
Is it a good assumption that Tor on Ubuntu ships with the best defaults to get started, or does everybody modify some set of parameters?
2) What are the best ways to measure the impact of changes to the Tor parameter file? Is this CPU and memory utilization as well as advertised bandwidth? How long should one wait after changing the Tor parameter file to see the impact? Would having metrics collected in Prometheus and Grafana via the Tor metrics port help analyze the impact, or is spot-checking top every few days enough?
On Wednesday, March 12th, 2025 at 8:58 AM, George Hartley hartley_george@proton.me wrote:
If you are going to run that many relays on the same machine, I'd do the following:
1.) For each relay, take parallelizable Tor operations into account and set "NumCPUs" to at least 2, so that compression/decompression as well as onionskin decryption won't hog the main Tor thread / loop, which will affect performance / throughput. I used to have a KVM VM in the Czech Republic which only had one core, and every time Tor tried to compress something, it would use up the entire CPU, severely limiting bandwidth until that operation was finished.
https://2019.www.torproject.org/docs/tor-manual.html.en#NumCPUs
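As a minimal torrc sketch of that suggestion (the option name is from the Tor manual; 2 is the value advised above):

    # torrc: give each relay a second worker thread so compression and
    # onionskin work doesn't block the main event loop
    NumCPUs 2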
2.) Make sure that you enable sandboxing for each relay. If you are really paranoid, you might also want to create a custom systemd unit override focused on sandboxing each process even further; there are a ton of possible options:
Here are all the different systemd sandboxing options:
https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Sa...
And here is an example on how to use them:
https://www.opensourcerers.org/2022/04/25/optimizing-a-systemd-service-for-s...
On my distribution, Arch Linux, the Tor package already comes with a few enabled, for example (taken from the unit file shipped by the distribution):
    # Hardening
    PrivateTmp=yes
    PrivateDevices=yes
    ProtectHome=yes
    ProtectSystem=full
    ReadOnlyDirectories=/
    ReadWriteDirectories=-/var/lib/tor
    ReadWriteDirectories=-/var/log/tor
    NoNewPrivileges=yes
    CapabilityBoundingSet=CAP_SETUID CAP_SETGID CAP_NET_BIND_SERVICE CAP_DAC_READ_SEARCH CAP_KILL
This is a good start, especially making non-essential devices inaccessible to Tor, but you can add more sandboxing options using unit overrides.
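As a sketch, a drop-in override adding a few more of those options could look like this (the instance name tor@relay1 and the exact option set are illustrative; overly strict sandboxing can break Tor, so test each option):

    # systemctl edit tor@relay1 creates:
    # /etc/systemd/system/tor@relay1.service.d/override.conf
    [Service]
    ProtectKernelTunables=yes
    ProtectKernelModules=yes
    ProtectControlGroups=yes
    RestrictNamespaces=yes
    LockPersonality=yes
    SystemCallArchitectures=native
    RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK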
3.) "Maximize bandwidth per CPU thread/core/relay with higher CPU base clock"
Anything starting at around 3 to 4 GHz is okay as long as you use hardware AES acceleration:
https://2019.www.torproject.org/docs/tor-manual.html.en#HardwareAccel
If you are going to spend a lot of money on a costly NIC, then maybe look into cryptography accelerator cards too. Some are supported by Tor, but I have never used one or seen one used. Apparently, as long as OpenSSL can detect and use it, you can specify it with:
https://2019.www.torproject.org/docs/tor-manual.html.en#AccelName
4.) "Ensure sufficient RAM per Tor relays (4GB to 1 CPU thread) and good NIC."
https://2019.www.torproject.org/docs/tor-manual.html.en#MaxMemInQueues
4 GB of RAM is likely overkill. I ran my (rate-limited) 100 Mbit/s exit relay on my colocated server in a VM and set MaxMemInQueues to 1024 MB. I then removed the rate limit and got around 350 Mbit/s on a 2 GHz CPU with AES-NI enabled.
Maybe try experimenting with 2 GB of RAM and setting MaxMemInQueues accordingly.
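Put together, the torrc bits from points 3.) and 4.) might look like this (2 GB is the experiment suggested above; the AccelName value is a placeholder for whatever engine OpenSSL actually reports on your system):

    # torrc: use AES-NI etc. via OpenSSL
    HardwareAccel 1
    # Only for a dedicated accelerator card / OpenSSL engine:
    #AccelName placeholder-engine
    # Cap queued-cell memory instead of budgeting 4 GB per relay:
    MaxMemInQueues 2 GB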
For now, I have nothing else to add.
Thanks, -GH
On Friday, February 21st, 2025 at 2:40 AM, mail--- via tor-relays tor-relays@lists.torproject.org wrote:
Hi,
Summary from your email - did I miss anything?
Yes, with the general disclaimer (not to sound like a lawyer) that your mileage may vary. For example we run everything bare metal on FreeBSD and run a mix of guard/middle/exit relays. Running the same workload virtualized or on another operating system may impact the performance/overhead (either positively or negatively). Also your RAM budget of 4 GB per relay may be a bit on the safe side, I don't think it would hurt to lower this.
What are the primary factors that justify running up to two Tor relays per physical core (leveraging SMT) versus a one-to-one mapping?
Tor relays sadly don't scale well. They fluctuate on a daily basis (the Tor network as a whole does) and even their general utilization is kind of unpredictable. So I think there are two approaches to this:
- Run 1 relay per physical core, accepting that your CPU will idle a large amount of the time (50%+ in our case).
- Run multiple Tor relays per physical core until you saturate 90-95% of your CPU cycles, accepting additional system overhead/congestion.
There is no right or wrong here. In our case we went with running multiple relays per core because we want to utilize the (very expensive) hardware we run on as much as possible. Every CPU cycle not spent on privacy enhancing services is a wasted CPU cycle from our point of view ;).
Is one-to-one mapping of Tor relay to core/thread the most compute- and system-efficient approach?
Yes, this should lower the amount of congestion (interrupts and the like). In this sense it can also be beneficial to lock your NIC/IRQ threads to specific cores.
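A small sketch of that pinning, with illustrative IRQ and core numbers (check /proc/interrupts for the real ones):

    # Find the NIC queue IRQs:
    grep eth0 /proc/interrupts
    # Pin one IRQ (123 is a placeholder) to core 2:
    echo 2 > /proc/irq/123/smp_affinity_list
    # Pin a relay instance to cores 4-5 via a systemd drop-in:
    # [Service]
    # CPUAffinity=4 5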
Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0 Ghz)?
Base indeed. No CPU is able to consistently maintain its turbo speed on this many cores. When all cores are utilized, the base speed pretty much is the max speed in practice.
From your real world scenario #2 and advice for the "fastest/best hardware", would this type of server work well for a $20k budget?
Looks like a capable server. That CPU looks powerful enough but keep in mind that it has a rather low clockspeed, so you will be running many medium speed relays. Nothing wrong with that since CPUs with this many cores simply don't/can't have high base clocks. Also I think 512 GB of RAM would be enough unless you run a *lot* of relays on it (which may be a viable strategy to utilize your CPU fully).
Just a note: in my experience the Epyc platform (especially when self-built) provides a bit more bang for your buck. For example an AMD Epyc 9965 with 192 cores/384 threads @ 2.25 GHz base clock will probably outperform the Intel 6980P considerably (for Tor workloads at least), while being much cheaper (list price at least). But of course this greatly depends on where you buy the server or parts, so your mileage may vary. When I look around here locally, a complete self-built system with the 192-core Epyc, 512 GB RAM and a 100 Gb/s NIC would cost ~12k excluding VAT, before any tax benefits. But your proposed server will work perfectly fine as well, so if you prefer a brand, go for it :).
Assuming one relay per core/thread, would this setup be capable of saturating 40 Gbps, given that 10 Gbps typically saturated with ~31.5 physical cores + ~31.5 SMT threads at 2 Ghz and thus 256 relays vs ~63 relays translates roughly to 4×10 Gbps = 40 Gbps?
Assuming you get the Tor relays to saturate their cores, yes. That CPU should be able to push 40 Gb/s of Tor traffic. Our ~2019 64 core/128 thread Epyc pushes 10 Gb/s of Tor traffic at a bit less than half its capacity. And your CPU is newer (better IPC, hopefully, if Intel finally stepped up their game since 2019), has much more cache and runs on DDR5, while having a bit lower base clock. So it should perform at least similarly to ours, and probably better.
Do you have the network capacity covered already? If you plan to do 40 Gb/s, then you also need enough peers/upstream capacity. The required networking equipment and connections themselves for this can also be costly.
Cheers,
tornth
Feb 20, 2025, 08:42 by tor@1aeo.com:
Excellent information, especially the real world scenarios! Exactly what I was looking for!
Summary from your email - did I miss anything?
To saturate 10 Gbps connection:
- IPv4 Allocation: Use between 5 and 20. Much lower than 256 in a /24!
- Tor Relay Count: Run roughly ~40 to ~150, depending on CPU clock speed, i.e. faster clock, fewer relays needed.
- CPU Utilization: 1 Tor relay per physical core preferred but okay to scale to 1 Tor relay per threads/SMT as well, up to 2x Tor relays per core/thread
- RAM requirements: Maintain a 4:1 RAM-to-core/relay ratio (4GB per core/relay), including extra 32GB per server to cover DoS, OS, networking, etc. overheads
In general, some ideals but not required:
- CPU clock speed: higher CPU clock speed, better relay performance
- RAM: fewer relays, lower RAM requirements
- RAM: add ~32GB to overall RAM capacity sizing for OS, DNS, networking, DoS, etc.
- IPv4: one IPv4 per relay with common traffic ports
Scaling: Start with 1 Tor relay per physical core, then add 1 Tor relay per thread/SMT and stop at 2 Tor relays per each core / thread.
What are the primary factors that justify running up to two Tor relays per physical core (leveraging SMT) versus a one-to-one mapping? Ex: 37-74 for ~18.5 physical + SMT and 63-126 for ~31.5 physical + SMT.
Is one-to-one mapping of Tor relay to core/thread the most compute- and system-efficient approach?
Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0 Ghz)? I'm assuming base. Not sure if anybody has data on how these impact Tor relays?
From your real world scenario #2 and advice for the "fastest/best hardware", would this type of server work well for a $20k budget? Would a single-socket Xeon 6980P (128 physical cores, 256 threads, base clock 2.0 GHz, turbo up to 3.9 GHz) with 1024 GB DDR5 (maintaining a 4:1 ratio) and an AIOM Mellanox NIC be optimal? Assuming one relay per core/thread, would this setup be capable of saturating 40 Gbps, given that 10 Gbps is typically saturated with ~31.5 physical cores + ~31.5 SMT threads at 2 GHz, and thus 256 relays vs ~63 relays translates roughly to 4 x 10 Gbps = 40 Gbps?
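For reference, the scaling arithmetic behind that question works out as: ~31.5 physical cores + ~31.5 SMT threads is ~63 threads (so ~63 relays at one relay per thread) per 10 Gbps at the 2 GHz class, and 4 x 63 = ~252 relays/threads for 40 Gbps, which the 256-thread 6980P just covers.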
For those curious, according to ChatGPT o3-mini-high deep research:
- 20% cap on bandwidth contributions for exit relays is roughly 50+ Gbit/s for the largest operators.
- 10% of Tor's consensus weight in terms of bandwidth for 2025 is roughly 90-95 Gbps of sustained bandwidth. In 2022, it would have been ~68 Gbps.
I don't plan to have this issue any time soon, but good to be aware!
Screenshots of the lengthy responses below and attached.
On Tuesday, February 18th, 2025 at 2:23 PM, mail@nothingtohide.nl mail@nothingtohide.nl wrote:
Hi,
Many people already replied, but here are my (late) two cents.
> 1) If a full IPv4 /24 Class C was available to host Tor relays, what are some optimal ways to allocate bandwidth, CPU cores and RAM to maximize utilization of the IPv4 /24 for Tor?
"Optimal" depends on your preferences and goals. Some examples:
- IP address efficiency: run 8 relays per IPv4 address.
- Use the best ports: 256 relays (443) or 512 relays (443+80).
- Lowest kernel/system congestion: 1 locked relay per core/SMT thread combination, ideally on high clocked CPUs.
- Easiest to manage: as few relays as possible.
- Memory efficiency: only run middle relays on very high clocked CPUs (4-5 Ghz).
- Cost efficiency: run many relays on 1-2 generations old Epyc CPUs with a high core count (64 or more).
There are always constraints. The hardware/CPU/memory and bandwidth/routing capability available to you are probably not infinite. Also, the Tor Project caps bandwidth contributions at 20% and 10% for exit and overall consensus weight respectively.
With 256 IP addresses on modern hardware, it will be very hard not to run into one of these limitations long before you can make it 'optimal'. Hardware-wise, one modern/current-gen high performance server only running exit relays will easily push enough Tor traffic to do more than half of the total exit bandwidth of the Tor network.
My advice would be:
- Get the fastest/best hardware with current-ish generation CPU IPC capabilities you can get within your budget. To keep congestion under control with less complexity, one socket is easier to deal with than a dual-socket system.
(tip for NIC: if your switch/router has 10 Gb/s or 25 Gb/s ports, get some of the older Mellanox cards. They are very stable (more so than their Intel counterparts in my experience) and extremely affordable nowadays because of all the organizations that throw away their digital sovereignty and privacy of their employees/users to move to the cloud).
- Start with 1 Tor relay per physical core (ignoring SMT). When the Tor relays have ramped up (this takes 2-3 months for guard relays) and there still is considerable headroom on the CPU (Tor runs extremely poorly at scale sadly, so this would be my expectation) then move to 1 Tor relay per thread (SMT included).
(tip: already run/'train' some Tor relays with a very limited bandwidth (2 MB/s or something) parallel to your primary ones and pin them all to 1-2 cores to let them ramp up in parallel to your primary ones. This makes it *much* less cumbersome to scale up your Tor contribution when you need/want/can do that in the future).
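A sketch of such a 'training' relay under the tip above (the rate values come from the tip; the core pinning is illustrative):

    # torrc for a parked/training relay:
    BandwidthRate 2 MBytes
    BandwidthBurst 2 MBytes
    # ...and pin all training instances to cores 0-1 via a systemd drop-in:
    # [Service]
    # CPUAffinity=0-1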
- Assume at least 1 GB of RAM per relay on modern CPUs + 32 GB additionally for OS, DNS, networking and to have some headroom for DoS attacks. This may sound high, especially considering the advice in the Tor documentation. But on modern CPUs (especially with a high clockspeed) guard relays can use a lot more than 512 MB of RAM, especially when they are getting attacked. Middle and exit relays require less RAM.
Don't skimp out on system memory capacity. DDR4 RDIMMs with decent clockspeeds are so cheap nowadays. For reference: we ran our smaller Tor servers (16C@3.4Ghz) with 64 GB of RAM and had to upgrade it to 128 GB because during attacks RAM usage exceeded the amount available and killed processes.
- If you have the IP space available, use one IPv4 address per relay and use all the good ports such as 443. If IP addresses are more scarce, it's also not bad to run 4 or 8 relays per IP address. Especially for middle and exit relays the port doesn't matter (much). Guard relays should ideally always run on a generally used (and generally unblocked) port.
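For example (address from a documentation range, purely illustrative):

    # Relay with its own IPv4 on a good port:
    ORPort 203.0.113.10:443
    # A second relay sharing that address would instead use e.g.:
    # ORPort 203.0.113.10:80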
> 2) If a full 10 Gbps connection was available for Tor relays, how many CPU cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps connection?
That greatly depends on the CPU and your configuration. I can offer 3 references based on real world examples. They all run a mix of guard/middle/exit relays.
- Typical low core count (16+SMT) with higher clockspeed (3.4 Ghz) saturates a 10 Gb/s connection with ~18.5 physical cores + SMT.
- Typical higher core count (64+SMT) with lower clockspeed (2.25 Ghz) saturates a 10 Gb/s connection with ~31.5 physical cores + SMT.
- Typical energy efficient/low performance CPU with low core count (16) with very low clockspeed (2.0 Ghz) used often in networking appliances saturates a 10 Gb/s connection with ~75 physical cores (note: no SMT).
The amount of IP addresses required also depends on multiple factors. But I'd say that you would need between the amount and double the amount of relays of the mentioned core+SMT count in order to saturate 10 Gb/s. This would be 37-74, 63-126 and 75-150 relays respectively. So between 5 and 19 IPv4 addresses would be required at minimum, depending on CPU performance level.
RAM-wise, the more relays you run, the more RAM overhead you will have. So in general it's better to run fewer relays at a higher speed each than to run many at a low clock speed. But since Tor scales so badly you need more relays anyway, so optimizing this isn't easy in practice.
> 3) Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4 addresses are required to saturate?
Double the amount compared to 10 Gb/s.
Good luck with your Tor adventure. And let us know your findings with achieving 10 Gb/s when you get there :-).
Cheers,
tornth
Feb 3, 2025, 18:14 by tor-relays@lists.torproject.org:
> Hi All,
>
> Looking for guidance around running high performance Tor relays on Ubuntu.
>
> Few questions:
> 1) If a full IPv4 /24 Class C was available to host Tor relays, what are some optimal ways to allocate bandwidth, CPU cores and RAM to maximize utilization of the IPv4 /24 for Tor?
>
> 2) If a full 10 Gbps connection was available for Tor relays, how many CPU cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps connection?
>
> 3) Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4 addresses are required to saturate?
>
> Thanks!
Glad to see Nusenu's statistics pages are as useful for others as they are for me! And there's interest in higher frequency updates.
How best to contact Nusenu? Rough ideas on the range of costs?
Seems like the effort required to comply with AROI is worth having a dedicated statistics page. I'll keep working on the AROI support.
On Saturday, March 22nd, 2025 at 12:29 PM, boldsuck via tor-relays tor-relays@lists.torproject.org wrote:
On Friday, 21 March 2025 14:54 mail@nothingtohide.nl wrote:
Mar 21, 2025, 14:10 by Tor at 1AEO tor@1aeo.com:
How best to find out these 20% and 10% values at any point, especially as they fluctuate?
Nusenu's OrNetStats is the best source I have found to check on relay operator/family statistics, operating system statistics, etc. But as you have already found out as well, it refreshes only once per month or so, limiting its usefulness. It used to be pretty much daily in the past, but it seems this isn't maintainable anymore. And with fair reasons.
I can only agree that Nusenu's OrNetStats is the best source. Once you have configured AROI: https://nusenu.github.io/OrNetStats/#authenticated-relay-operator-ids
you will have your personal page: https://nusenu.github.io/OrNetStats/for-privacy.net.html https://nusenu.github.io/OrNetStats/nothingtohide.nl.html
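For reference, configuring AROI boils down to a ContactInfo line plus a published proof file; a simplified sketch per nusenu's ContactInfo sharing spec (the domain is a placeholder):

    # torrc on every relay in the family:
    ContactInfo url:example.com proof:uri-rsa
    # ...and publish all relay RSA fingerprints at:
    # https://example.com/.well-known/tor-relay/rsa-fingerprint.txt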
AFAIK, the fact that updates are no longer daily is a question of cost. The millions of database queries cost money. The database was sponsored for nusenu by someone for a while.
I suggest contacting Nusenu if you need more frequent updates. Previously Nusenu was looking for use-cases that warranted increasing the update frequency. NTH would also benefit from more frequent updates, but we're hesitant to pressure people who volunteer their spare time for these kinds of (awesome) projects into spending even more time.
Seems a waste to negotiate a five-year colo or server contract when the Tor network doesn't want the extra capacity due to lack of sufficient diversity?
The 20% of exit capacity is indeed not a lot so you will probably reach the required amount of exit consensus weight relatively easy/fast. But the 10% cap on overall consensus weight on the other hand provides a bit more headroom for some additional guard/middle relays. So you could just start with exit relays and then check the exit consensus weight every now and then. When you hit 20%, just convert a few relays to guard/middle relays.
When you add tens of gigabits to the network, you will also increase the network size, effectively providing more exit relay headroom for other Tor operators and yourself as well. The more operators, the more everyone can grow their relays before hitting the cap. It's not perfect, but the Tor project values diversity more than increasing network capacity/speed so there is not much we can do about this. I'd say: don't go overboard with 5 year contracts. Maybe start with 20 Gb/s and then increase by chunks of 10 Gb/s when there is enough headroom. Good luck,
tornth
Hi,
Mar 24, 2025, 10:53 by tor-relays@lists.torproject.org:
Great list of parameters to modify!
Before starting this journey, looking for more guidance on 1) when to make changes in relay lifecycle and 2) how to measure the impact of the changes
- When is the right time in a relay lifecycle to start changing the default parameters and measuring the impact?
Is it premature to change these values when a relay is less than 2 weeks old or less than a few months old as a guard relay because the load can vary very significantly?
I'd say ramping up until the point of being stable depends on the type of relay, but at least always more than 2 weeks. In my personal experience:
- Middle and exit relays need 4-5 or so weeks before I'd consider them stable enough.
- Guard relays need 8-12 or so weeks before I'd consider them stable enough.
But this doesn't mean you can't tinker with torrc settings before, it's just harder to monitor the effects of the changes.
Is it a good assumption that Tor on Ubuntu ships with the best defaults to get started or does everybody modify some set of parameters?
This greatly depends on what you want to run. The default settings aren't great for running multiple relays on a system. It's a good idea to read the Tor relay docs to get an idea about considerations. They are far from perfect and the more relays you run, the less fitting/relevant the recommendations are. But they are a decent start. You can find them here: https://community.torproject.org/relay/.
- What are the best ways to measure the impact of changes to the Tor parameter file?
Is this CPU and memory utilization as well as advertised bandwidth? How long to wait after Tor parameter file changes to see the impact? Would having metrics collected in Prometheus and grafana via Tor metrics port help analyze the impact or just spot check top every few days?
This is a problem I haven't solved yet. Changes to relays and the Tor network in general take effect extremely slowly and often inconsistently. In many cases it's hard to draw firm conclusions unless you have enough scale and test with A/B setups.
Having MetricsPort/node_exporter/Prometheus/Grafana, htop, vnstat, dtrace, vmstat etc. helps a lot though. My strategy is to work with pairs of 8 identically configured relays (16 in total) on the same hardware. When I make changes to one pair (8 relays) I compare the differences in behavior/bandwidth/CPU usage/memory footprint/interrupts to the other pair over a longer period of time (2-4 weeks).
But it's really cumbersome. Throughout the years I've heard from other operators as well that these 2-4 week delays significantly slow down optimization, troubleshooting and debugging efforts. There is just nothing we can do about it (as far as I am aware at least, please let me know otherwise ;-).
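For the MetricsPort route mentioned above, Tor's built-in Prometheus exporter is enabled per instance in torrc; a minimal sketch (bind address and port are illustrative):

    # torrc: expose internal metrics in Prometheus format.
    # Keep this on localhost or a trusted scrape network only!
    MetricsPort 127.0.0.1:9035 prometheus
    MetricsPortPolicy accept 127.0.0.1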
Cheers,
tornth
Regarding 1AEO: If you allocate the entire /27 subnet to your interface(s), then you can use something like vnstatd to create very detailed statistics, both in the CLI and as image files.
Here is how that would look (inline images might not work on the mailing list; check the attached summary.png if you cannot see an image here):
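(As a rough sketch of the vnstat commands involved, in case the inline image is missing; the interface name is illustrative:)

    systemctl enable --now vnstat      # daemon collects per-interface counters
    vnstat -i eth0 -d                  # daily totals in the CLI
    vnstat -i eth0 -m                  # monthly totals
    vnstati -i eth0 -s -o summary.png  # PNG summary image like the attached one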
These stats are from my personal workstation, which is also running a Snowflake proxy, with about 3-10 Tor users connecting per day.
Best wishes, -GH
On Friday, March 21st, 2025 at 1:24 PM, Tor at 1AEO via tor-relays tor-relays@lists.torproject.org wrote:
It's unfortunate there isn't a website that graphs/charts aggregate changes by IP address range over time; existing sites only show individual relay changes over time, or an aggregate at a single point in time.
On Monday, March 10th, 2025 at 10:32 AM, boldsuck via tor-relays tor-relays@lists.torproject.org wrote:
On Monday, 10 March 2025 15:34 boldsuck via tor-relays wrote:
The Tor network is a massive, dynamic network: bandwidth contributions and overall consensus weight are constantly changing. When a larger operator (like NTH or RWTH Aachen) goes up or down, everything changes. In addition, the Tor network team and DirAuths may change consensus rules at any time.
2 servers, all relays same config & uptime, but still have different advertised bandwidth ;-)
https://metrics.torproject.org/rs.html#search/2a0b:f4c2:2:1:: https://metrics.torproject.org/rs.html#search/2a0b:f4c2:2::
-- ╰_╯ Ciao Marco!
Debian GNU/Linux
It's free software and it gives you freedom!