On 9 Jul 2017, at 01:36, Clodo clodo@clodo.it wrote:
Tor uses multithreaded crypto already: depending on the speed of your processor, you can get up to 400 Mbps per instance (250 Mbps is typical).
Here I see a pending project: https://trac.torproject.org/projects/tor/ticket/1749 and plans for it: https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/Multithreaded... Both have been stalled for years because the implementation is complex (and, of course, the current option of running multiple daemons lowers the priority).
Maybe Tor uses multithreading for some activities, but from what I understand it's not a real, full multithreaded implementation. Otherwise, why do so many people run multiple daemons on the same high-capacity 1 Gbit/s server?
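Taking teor's per-instance figures (250 Mbps typical, up to 400 Mbps) at face value, the number of daemons needed to fill a port is simple division. This is a minimal sketch that assumes each instance tops out at a fixed rate and that instances add up linearly; real relays also hit other limits (CPU, memory, consensus weight), so treat the result as a lower bound.

```python
import math

def instances_needed(port_mbps: float, per_instance_mbps: float) -> int:
    """Daemons needed to fill a port, assuming each tor instance tops out at
    per_instance_mbps and that instances add up linearly (both assumptions)."""
    return math.ceil(port_mbps / per_instance_mbps)

# Per-instance figures quoted in the thread: 250 Mbps typical, 400 Mbps best case.
for port_mbps in (1_000, 10_000):  # 1 Gbit/s and 10 Gbit/s ports
    for per_instance in (250, 400):
        n = instances_needed(port_mbps, per_instance)
        print(f"{port_mbps} Mbps port at {per_instance} Mbps/instance: {n} instances")
```

Under these assumptions that is 3-4 daemons for a 1 Gbit/s port and 25-40 for a 10 Gbit/s port.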
If I have a 10 Gbit/s unmetered port and a 24-core CPU, I'm forced to run multiple daemons on multiple IP addresses to use it fully. And the world sees lots of relays (all in the same Family, of course). I think this is the wrong approach:
- overhead for OONION/Atlas. I know someone who runs over 30 relays on
3 or 4 physical machines (so OONION/Atlas have 30 relays to track, collect stats for, and so on).
Yes, I have a similar issue where I have 8 relays across 2 machines.
- Tor relay volunteers need to obtain many IP addresses to run
high-capacity servers.
Yes, this can be a problem.
- configuration issues, for example running multiple daemons with
systemd.
tor-instance-create is your friend here, at least on Debian and Ubuntu. It works really well.
Or you can use a tool like ansible-relayor for multiple servers.
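To show what "multiple daemons" means in configuration terms, here is a minimal Python sketch that writes one torrc per independent relay (each with its own identity, which avoids the shared-key problems discussed later in this thread). It is not what tor-instance-create or ansible-relayor generate. The nickname scheme, the DataDirectory path, and the `$FINGERPRINTn` placeholders are hypothetical; real MyFamily entries are the relays' actual fingerprints, known only after each instance has generated its keys. It also assumes at most two relays per IPv4 address (the directory authorities' AuthDirMaxServersPerAddr default, treated here as an assumption), which is why high-capacity operators end up needing many addresses.

```python
from typing import List

# Assumption: authorities accept at most this many relays per IPv4 address.
MAX_RELAYS_PER_IP = 2

def torrc_for(index: int, ip: str, or_port: int, family: List[str]) -> str:
    """Return a minimal torrc for one independent relay instance."""
    nickname = f"examplerelay{index}"  # hypothetical naming scheme
    lines = [
        f"Nickname {nickname}",
        f"Address {ip}",
        f"ORPort {ip}:{or_port}",
        f"OutboundBindAddress {ip}",
        f"DataDirectory /var/lib/tor-example/{nickname}",  # placeholder path
    ]
    if family:
        # Every relay in the group lists the whole family.
        lines.append("MyFamily " + ",".join(family))
    return "\n".join(lines) + "\n"

def plan_instances(ips: List[str], count: int, base_port: int = 9001) -> List[str]:
    """Spread `count` relays over `ips`, at most MAX_RELAYS_PER_IP per address."""
    if count > len(ips) * MAX_RELAYS_PER_IP:
        raise ValueError("not enough IP addresses for that many relays")
    # Placeholders: substitute the real fingerprints once keys exist.
    family = [f"$FINGERPRINT{i}" for i in range(count)]
    configs = []
    for i in range(count):
        ip = ips[i // MAX_RELAYS_PER_IP]           # fill each address before moving on
        port = base_port + i % MAX_RELAYS_PER_IP   # distinct ORPort per relay on one IP
        configs.append(torrc_for(i, ip, port, family))
    return configs

if __name__ == "__main__":
    # Documentation addresses (RFC 5737), not real relays.
    for cfg in plan_instances(["192.0.2.1", "192.0.2.2"], 4):
        print(cfg)
```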
So I'm wondering whether a better, easier solution could exist. I just want to be constructive; I'm an open-source tools developer in my spare time.
We would welcome development help with the multithreaded crypto. But it's a complicated part of the code, and a large patch, so it might be a good idea to start with something small first, to learn our processes and coding standards.
Here's some background information:
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam#Becoming...
https://gitweb.torproject.org/tor.git/tree/doc/HACKING/README.1st.md
https://gitweb.torproject.org/user/nickm/torguts.git/tree/
And here are some smaller tickets; you'll want the ones under "Core Tor/Tor" (they might not all be "easy"):
https://trac.torproject.org/projects/tor/report/30
These things will break:
- if multiple tor daemons update the same onion keys at the same time,
the key files may get corrupted or the cross-certification may not refer to the keys being used. This would make all the Tor instances unusable for any circuits after a week or a month (depending on the tor version).
- your relays will place additional load on the directory authorities
by uploading multiple identical descriptors
- if these descriptors ever get out of sync, they will replace each
other, causing unpredictable behaviour
Because clients expect to access the same process with the same identity:
- your relay will not be usable as an HSDir
- your relay will not be usable as an Introduction Point
- your relay will not be usable as a Rendezvous Point
Honestly, I don't know these kinds of details well; that's the reason for this discussion with people like you. Maybe it's possible to simply develop or patch specific options with the goal of making a high-capacity server easy to run. That's probably easier than the work linked above on Parallelizingcellcrypto. For example, "uploading multiple identical descriptors" might simply be avoided with an option that identifies the 'master' daemon, so that the other daemons skip the descriptor upload phase.
PublishServerDescriptor is an existing option that does this.
Or some kind of synchronization channel between daemons, which currently do not communicate with each other.
I think this would be a very complicated patch, with some security, reliability, and performance drawbacks. It would also be hard to test.
For example, in the rendezvous point case, you would have to pass high-volume circuit traffic across this link.
On 9 Jul 2017, at 02:10, Roman Mamedov rm@romanrm.net wrote:
On Sat, 8 Jul 2017 09:54:20 +1000 teor teor2345@gmail.com wrote:
Tor uses multithreaded crypto already: depending on the speed of your processor, you can get up to 400 Mbps per instance (250 Mbps is typical).
In practice I don't remember seeing much more than 120-130% CPU use per process, and even that, only in brief peaks. Maybe crypto is not actually the bottleneck, but some other non-parallel operation instead.
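One way to read the 120-130% figure is through Amdahl's law. Suppose (a hypothesis, not a measurement) that the main thread is saturated at one core and the worker threads add the remaining 20-30%. Then the parallel share of the work is roughly (observed - 1) / observed, and no number of extra cores can push throughput past about 1.2-1.3x a single-threaded process. The sketch below only works through that arithmetic; it says nothing about which operation in tor is actually serial.

```python
def inferred_parallel_fraction(observed_cores: float) -> float:
    """If the main thread is saturated (1 core) and worker threads add the rest,
    the parallel share of the work is (observed - 1) / observed."""
    return 1.0 - 1.0 / observed_cores

def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: throughput relative to a fully serial process."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)

for observed in (1.2, 1.3):  # the 120-130% CPU use reported above
    p = inferred_parallel_fraction(observed)
    print(f"{observed:.0%} CPU -> parallel fraction {p:.2f}, "
          f"24-thread ceiling {amdahl_speedup(p, 24):.2f}x, "
          f"limit {1 / (1 - p):.2f}x")
```

Under this model, speeding up the parallel crypto would not help much; the serial part would have to shrink.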
Speaking of CPU use, is there any roadmap to phase out TAP circuits? IIRC those are very CPU-expensive compared to NTor. Even though TAP handshakes are now only 10-20% as numerous as NTor handshakes, could they actually be responsible for something like 50%+ of total CPU usage?
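The question can be framed as a break-even calculation. If r is the number of TAP handshakes per ntor handshake and k is the CPU cost of one TAP handshake relative to one ntor handshake, TAP's share of handshake CPU is rk / (rk + 1), which reaches 50% when k = 1/r. The sketch below treats k as a hypothetical parameter (no benchmark figure is given in this thread) and covers handshake CPU only, not relay cell crypto.

```python
def tap_cpu_share(tap_per_ntor: float, cost_ratio: float) -> float:
    """Fraction of handshake CPU spent on TAP.

    tap_per_ntor: TAP handshakes per ntor handshake (0.1-0.2 in the thread).
    cost_ratio: hypothetical cost of one TAP handshake relative to one ntor.
    """
    tap_cost = tap_per_ntor * cost_ratio
    return tap_cost / (tap_cost + 1.0)

def break_even_cost_ratio(tap_per_ntor: float) -> float:
    """Cost ratio at which TAP uses exactly half of the handshake CPU."""
    return 1.0 / tap_per_ntor

for r in (0.1, 0.2):
    print(f"TAP/ntor = {r}: TAP must cost {break_even_cost_ratio(r):.1f}x "
          f"an ntor handshake to use half the handshake CPU")
```

So at 10-20% of handshakes, TAP would need to be 5-10 times as expensive as ntor per handshake to account for half of the handshake CPU.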
ntor handshakes have been preferred since 0.2.4.17-rc (September 2013). We made them mandatory in 0.2.9.3-alpha (September 2016).
But the legacy hidden service protocol still requires TAP, so it can only be phased out when there are no longer any legacy hidden services on the network. I think that would be January 1, 2020 at the earliest, because we promised to support 0.2.9 until then:
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorR...
Unless, of course, we choose to disable legacy hidden services earlier than that.
T
-- Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B ricochet:ekmygaiu4rzgsk6n xmpp: teor at torproject dot org