Hi,

> Summary from your email - did I miss anything?

Yes, with the general disclaimer (not to sound like a lawyer) that your mileage may vary. For example, we run everything on bare metal on FreeBSD, with a mix of guard/middle/exit relays. Running the same workload virtualized or on another operating system may affect performance/overhead (either positively or negatively). Also, your RAM budget of 4 GB per relay may be a bit on the safe side; I don't think it would hurt to lower it.

> What are the primary factors that justify running up to two Tor relays per physical core (leveraging SMT) versus a one-to-one mapping?

Tor relays sadly don't scale well. Their load fluctuates on a daily basis (as the Tor network as a whole does) and even their general utilization is kind of unpredictable. So I think there are two approaches to this:

1) Run 1 relay per physical core, accepting that your CPU will idle a large amount of the time (50%+ in our case).

2) Run multiple Tor relays per physical core until you saturate 90-95% of your CPU cycles, accepting additional system overhead/congestion.

There is no right or wrong here. In our case we went with running multiple relays per core because we want to utilize the (very expensive) hardware we run on as much as possible. Every CPU cycle not spent on privacy enhancing services is a wasted CPU cycle from our point of view ;).
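As a rough illustration of the trade-off between the two approaches: if a single relay keeps its core about 50% busy, you can estimate how many relays per core it takes to reach a target utilization. This is only a sketch. The function name is made up for this example, and it assumes utilization adds up linearly, which ignores the extra system overhead/congestion mentioned above.

```python
import math


def relays_per_core(target_utilization: float, per_relay_utilization: float) -> int:
    """Estimate relays per physical core needed to reach a target utilization.

    Assumption: CPU usage of relays on the same core simply adds up.
    In practice overhead and congestion make this optimistic.
    """
    if not 0 < per_relay_utilization <= 1:
        raise ValueError("per_relay_utilization must be in (0, 1]")
    return math.ceil(target_utilization / per_relay_utilization)


# One relay idles its core ~50% of the time; we aim for ~90% utilization.
print(relays_per_core(0.90, 0.50))
```

With these numbers the estimate lands on two relays per core, which is where the "up to 2x" rule of thumb comes from.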

> Is one-to-one mapping of Tor relay to core/thread the most compute- and system-efficient approach?

Yes, this should lower the amount of congestion (interrupts and stuff). In this sense it can also be beneficial to lock your NIC/irq threads to specific cores.
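A one-to-one mapping can be planned with something like the sketch below. The relay names and the CPU layout (CPUs 0-1 kept free for NIC/IRQ handling) are hypothetical, and the function only builds the plan; it does not pin anything by itself.

```python
def plan_pinning(relay_names, relay_cpus):
    """Give every relay its own CPU; refuse if there are not enough CPUs."""
    if len(relay_names) > len(relay_cpus):
        raise ValueError("not enough CPUs for a one-to-one mapping")
    return dict(zip(relay_names, relay_cpus))


# Hypothetical layout: CPUs 0-1 reserved for NIC/IRQ threads, relays on 2-5.
relays = [f"relay{i}" for i in range(4)]
plan = plan_pinning(relays, [2, 3, 4, 5])

for name, cpu in plan.items():
    # On Linux the plan could be applied with os.sched_setaffinity(pid, {cpu});
    # other operating systems have their own tools for this.
    print(f"{name} -> CPU {cpu}")
```

Which CPU ids are physical cores and which are SMT siblings differs per system, so check the topology before choosing the list.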

> Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0 Ghz)?

Base indeed. No CPU is able to consistently maintain its turbo speed on this many cores. When all cores are utilized, the base speed pretty much is the max speed in practice.

> From your real world scenario #2 and advice for the "fastest/best hardware", would this type of server work well for a $20k budget?

Looks like a capable server. That CPU looks powerful enough but keep in mind that it has a rather low clockspeed, so you will be running many medium speed relays. Nothing wrong with that since CPUs with this many cores simply don't/can't have high base clocks. Also I think 512 GB of RAM would be enough unless you run a *lot* of relays on it (which may be a viable strategy to utilize your CPU fully).

Just a note: in my experience the Epyc platform (especially when self-built) provides a bit more bang for your buck. For example, an AMD Epyc 9965 with 192 cores/384 threads at a 2.25 GHz base clock will probably outperform the Intel 6980P considerably (for Tor workloads at least), while being much cheaper (list price at least). But of course this greatly depends on where you buy the server or parts, so your mileage may vary. When I look around here locally, a complete self-built system with the 192 core Epyc, 512 GB RAM and a 100 Gb/s NIC would cost ~12k excluding VAT before any tax benefits. But your proposed server will work perfectly fine as well, so if you prefer a brand, go for it :).

> Assuming one relay per core/thread, would this setup be capable of saturating 40 Gbps, given that 10 Gbps typically saturated with ~31.5 physical cores + ~31.5 SMT threads at 2 Ghz and thus 256 relays vs ~63 relays translates roughly to 4×10 Gbps = 40 Gbps?

Assuming you get the Tor relays to saturate their cores, yes. That CPU should be able to push 40 Gb/s of Tor traffic. Our ~2019 64 core/128 thread Epyc pushes 10 Gb/s of Tor traffic on a bit less than half its capacity. And your CPU is newer (better IPC hopefully, if Intel finally stepped up their game since 2019), has much more cache and runs on DDR5, while having a slightly lower base clock. So it should perform at least similarly to, and probably better than, ours.
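The back-of-the-envelope math behind that answer, as a small sketch. It assumes throughput scales linearly with the number of cores+SMT threads, using our ~63 threads for 10 Gb/s as the reference point; the function name and defaults are just for illustration, and differences in IPC, cache and clock speed are ignored.

```python
def estimated_gbps(threads: int, ref_threads: float = 63.0, ref_gbps: float = 10.0) -> float:
    """Linear extrapolation from a reference server.

    Assumption: every extra core/SMT thread adds the same throughput
    as on the reference system (~63 threads for 10 Gb/s).
    """
    return threads / ref_threads * ref_gbps


# 128 cores / 256 threads, one relay per thread.
print(round(estimated_gbps(256), 1))
```

That works out to roughly 4x the reference, so about 40 Gb/s, provided the relays actually get enough traffic to saturate their cores.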


Do you have the network capacity covered already? If you plan to do 40 Gb/s, then you also need enough peers/upstream capacity. The required networking equipment and connections themselves for this can also be costly.

Cheers,

tornth


Feb 20, 2025, 08:42 by tor@1aeo.com:
Excellent information, especially the real world scenarios! Exactly what I was looking for!

Summary from your email - did I miss anything?

To saturate 10 Gbps connection:
1) IPv4 Allocation: Use between 5 and 20. Much lower than 256 in a /24!
2) Tor Relay Count:  Run roughly ~40 to ~150, depending on CPU clock speed, i.e. faster clock, fewer relays needed.
3) CPU Utilization: 1 Tor relay per physical core preferred but okay to scale to 1 Tor relay per thread/SMT as well, up to 2x Tor relays per core/thread
4) RAM requirements: Maintain a 4:1 RAM-to-core/relay ratio (4GB per core/relay), including extra 32GB per server to cover DoS, OS, networking, etc. overheads

In general, some ideals but not required:
CPU clock speed: Higher CPU clock speed, better relay performance
RAM: Fewer relays, lower RAM requirements
RAM: Add ~32GB to overall RAM capacity sizing for OS, DNS, networking, DoS, etc.
IPv4: One IPv4 per relay with common traffic ports

Scaling: Start with 1 Tor relay per physical core, then add 1 Tor relay per thread/SMT and stop at 2 Tor relays per each core / thread.

What are the primary factors that justify running up to two Tor relays per physical core (leveraging SMT) versus a one-to-one mapping?
Ex: 37-74 for ~18.5 physical + SMT and 63-126 for ~31.5 physical + SMT.

Is one-to-one mapping of Tor relay to core/thread the most compute- and system-efficient approach?

Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0 Ghz)? I'm assuming base. Not sure if anybody has data on how these impact Tor relays?


From your real world scenario #2 and advice for the "fastest/best hardware", would this type of server work well for a $20k budget?
Would a single-socket Xeon 6980P (128 physical cores, 256 threads, base clock 2.0 GHz, turbo up to 3.9 GHz) with 1024 GB DDR5 (maintaining a 4:1 ratio) and an AIOM Mellanox NIC be optimal? Assuming one relay per core/thread, would this setup be capable of saturating 40 Gbps, given that 10 Gbps is typically saturated with ~31.5 physical cores + ~31.5 SMT threads at 2 GHz, and thus 256 relays vs ~63 relays translates roughly to 4×10 Gbps = 40 Gbps?

For those curious, according to ChatGPT o3-mini-high deep research:
1) 20% cap on bandwidth contributions for exit relays is roughly 50+ Gbit/s for the largest operators.
2) 10% of Tor's consensus weight in terms of bandwidth for 2025 is roughly 90-95 Gbps of sustained bandwidth. In 2022, it would have been ~68 Gbps.
I don't plan to have this issue any time soon, but good to be aware!

Screenshots of the lengthy responses below and attached.

Sent with Proton Mail secure email.

On Tuesday, February 18th, 2025 at 2:23 PM, mail@nothingtohide.nl <mail@nothingtohide.nl> wrote:
Hi,

Many people already replied, but here are my (late) two cents.

> 1) If a full IPv4 /24 Class C was available to host Tor relays, what are some optimal ways to allocate bandwidth, CPU cores and RAM to maximize utilization of the IPv4 /24 for Tor?

"Optimal" depends on your preferences and goals. Some examples:

- IP address efficiency: run 8 relays per IPv4 address.
- Use the best ports: 256 relays (443) or 512 relays (443+80).
- Lowest kernel/system congestion: 1 locked relay per core/SMT thread combination, ideally on high clocked CPUs.
- Easiest to manage: as few relays as possible.
- Memory efficiency: only run middle relays on very high clocked CPUs (4-5 Ghz).
- Cost efficiency: run many relays on 1-2 generations old Epyc CPUs with a high core count (64 or more).
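To put numbers on the first two options, here is a small sketch of how many relays a /24 can hold under each strategy. It counts all 256 addresses as usable, like the figures above do; the names are made up for this example.

```python
ADDRESSES_IN_SLASH_24 = 256  # as counted above; network/broadcast not subtracted


def max_relays(addresses: int, relays_per_address: int) -> int:
    """Upper bound on relays for a given number of relays per IPv4 address."""
    return addresses * relays_per_address


strategies = {
    "8 relays per address": 8,
    "port 443 only": 1,
    "ports 443 and 80": 2,
}

for label, per_address in strategies.items():
    print(f"{label}: {max_relays(ADDRESSES_IN_SLASH_24, per_address)} relays")
```

The first strategy gives far more relays than any single server will realistically run, which is why hardware and bandwidth limits matter more than the address count.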

There are always constraints. The hardware/CPU/memory and bandwidth/routing capability available to you are probably not infinite. Also the Tor Project caps bandwidth contributions at 20% of exit capacity and 10% of overall consensus weight.

With 256 IP addresses on modern hardware, it will be very hard to not run into one of these limitations long before you can make it 'optimal'. Hardware wise, one modern/current gen high performance server only running exit relays will easily push enough Tor traffic to do more than half of the total exit bandwidth of the Tor network.

My advice would be:
1) Get the fastest/best hardware with current-ish generation CPU IPC capabilities you can get within your budget. To lower complexity with keeping congestion in control, one socket is easier to deal with than a dual socket system.

(tip for NIC: if your switch/router has 10 Gb/s or 25 Gb/s ports, get some of the older Mellanox cards. They are very stable (more so than their Intel counterparts in my experience) and extremely affordable nowadays because of all the organizations that throw away their digital sovereignty and privacy of their employees/users to move to the cloud).

3) Start with 1 Tor relay per physical core (ignoring SMT). When the Tor relays have ramped up (this takes 2-3 months for guard relays) and there still is considerable headroom on the CPU (Tor runs extremely poorly at scale sadly, so this would be my expectation) then move to 1 Tor relay per thread (SMT included).

(tip: already run/'train' some Tor relays with a very limited bandwidth (2 MB/s or something) parallel to your primary ones and pin them all to 1-2 cores to let them ramp up in parallel to your primary ones. This makes it *much* less cumbersome to scale up your Tor contribution when you need/want/can do that in the future).

4) Assume at least 1 GB of RAM per relay on modern CPUs + 32 GB additionally for OS, DNS, networking and to have some headroom for DoS attacks. This may sound high, especially considering the advice in the Tor documentation. But on modern CPUs (especially with a high clockspeed) guard relays can use a lot more than 512 MB of RAM, especially when they are getting attacked. Middle and exit relays require less RAM.

Don't skimp on system memory capacity. DDR4 RDIMMs with decent clockspeeds are so cheap nowadays. For reference: we ran our smaller Tor servers (16C@3.4 GHz) with 64 GB of RAM and had to upgrade them to 128 GB because during attacks RAM usage exceeded the amount available and processes got killed.

5) If you have the IP space available, use one IPv4 address per relay and use all the good ports such as 443. If IP addresses are more scarce, it's also not bad to run 4 or 8 relays per IP address. Especially for middle and exit relays the port doesn't matter (much). Guard relays should ideally always run on a generally used (and generally unblocked) port.
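The RAM rule of thumb from step 4 can be written down as a small sketch. The function name and defaults are just for illustration; 1 GB per relay is the minimum mentioned above, so guard-heavy setups may want a higher per-relay value.

```python
def ram_budget_gb(relays: int, gb_per_relay: float = 1.0, overhead_gb: float = 32.0) -> float:
    """Minimum RAM estimate: per-relay memory plus fixed headroom.

    The 32 GB covers OS, DNS, networking and some room for DoS attacks.
    """
    return relays * gb_per_relay + overhead_gb


for relay_count in (32, 64, 128):
    print(f"{relay_count} relays: at least {ram_budget_gb(relay_count):.0f} GB")
```

Treat the result as a floor rather than a target, since memory usage can spike well above the average during attacks.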


> 2) If a full 10 Gbps connection was available for Tor relays, how many CPU cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps connection?

That greatly depends on the CPU and your configuration. I can offer 3 references based on real world examples. They all run a mix of guard/middle/exit relays.

1) Typical low core count (16+SMT) with higher clockspeed (3.4 Ghz) saturates a 10 Gb/s connection with ~18.5 physical cores + SMT.
2) Typical higher core count (64+SMT) with lower clockspeed (2.25 Ghz) saturates a 10 Gb/s connection with ~31.5 physical cores + SMT.
3) Typical energy efficient/low performance CPU with low core count (16) with very low clockspeed (2.0 Ghz) used often in networking appliances saturates a 10 Gb/s connection with ~75 physical cores (note: no SMT).

The number of IP addresses required also depends on multiple factors. But I'd say that you would need between one and two relays per core/SMT thread listed above in order to saturate 10 Gb/s. This would be 37-74, 63-126 and 75-150 relays respectively. So between 5 and 19 IPv4 addresses would be required at minimum, depending on CPU performance level.
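For those who want to redo the arithmetic, a small sketch. It assumes 8 relays per IPv4 address, which is what the 5 and 19 correspond to; the labels and function names are made up for this example.

```python
import math

# Cores + SMT threads needed for 10 Gb/s in the three reference systems.
REFERENCE_THREADS = {
    "16C+SMT @ 3.4 GHz": 37,   # ~18.5 physical cores + SMT
    "64C+SMT @ 2.25 GHz": 63,  # ~31.5 physical cores + SMT
    "16C @ 2.0 GHz": 75,       # ~75 physical cores, no SMT
}


def relay_range(threads: int) -> tuple[int, int]:
    """Between one and two relays per core/SMT thread."""
    return threads, 2 * threads


def ipv4_needed(relays: int, relays_per_ip: int = 8) -> int:
    """Minimum number of IPv4 addresses, assuming 8 relays per address."""
    return math.ceil(relays / relays_per_ip)


for label, threads in REFERENCE_THREADS.items():
    low, high = relay_range(threads)
    print(f"{label}: {low}-{high} relays, "
          f"{ipv4_needed(low)}-{ipv4_needed(high)} IPv4 addresses")
```

With one address per relay instead, the address count simply equals the relay count.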

RAM wise, the more relays you run, the more RAM overhead you will have. So in general it's better to run fewer relays at a higher speed each than to run many at a low clock speed. But since Tor scales so badly you need more relays anyway, so optimizing this isn't easy in practice.


> 3) Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4 addresses are required to saturate?

Double the amount compared to 10 Gb/s.


Good luck with your Tor adventure. And let us know your findings with achieving 10 Gb/s when you get there :-).

Cheers,

tornth


Feb 3, 2025, 18:14 by tor-relays@lists.torproject.org:
Hi All,

Looking for guidance around running high performance Tor relays on Ubuntu.

Few questions:
1) If a full IPv4 /24 Class C was available to host Tor relays, what are some optimal ways to allocate bandwidth, CPU cores and RAM to maximize utilization of the IPv4 /24 for Tor?

2) If a full 10 Gbps connection was available for Tor relays, how many CPU cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps connection?

3) Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4 addresses are required to saturate?

Thanks!

