On Tue, 2016-10-25 at 22:52 +1100, teor wrote:
On 25 Oct. 2016, at 22:26, D.S. Ljungmark <ljungmark@modio.se> wrote:
So, now I've taken some steps to adjust the state of the relay and to try to balance this.
To reiterate an earlier point: before I start adding more tor daemons or servers to this, I want to know how to scale and optimise what is already there.
- Set up unbound in cache mode rather than use our local network unbound
- Disabled the on-machine firewall (stateful)
- Ensured AES acceleration worked
- Raised the limit on open files even further
- Stopped doing regular reboots and only reboot on kernel change
- Bound Tor to a single core
Tor is multi-process, so I wouldn't recommend binding it and its cpuworkers to the same core. That could degrade performance.
Acknowledged, but it does allow me to bind other things (unbound, interrupts) to other cpus, which was part of the reasoning here.
The exit is still this one: https://atlas.torproject.org/#details/5989521A85C94EE101E88B8DB2E68321673F9405
CPU utilization of a single core on the machine never goes > 22%
Thus while it may be CPU bound, it's never maximising the CPU usage.
CPU and network are still scaling together with each other.
Load (not CPU usage) is fairly stable, and load1 hasn't gone > 0.2
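
A per-core figure like the 22% above can be cross-checked by sampling /proc/stat twice and comparing the tick counters. The following is a minimal sketch, assuming a Linux host; the function names are made up for illustration, and it takes the two snapshots as text so that it can be tried without touching the relay.

```python
def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like: cpu0 user nice system idle iowait irq softirq steal ...
        # The aggregate line is named just "cpu" and is skipped here.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Only the first 8 counters are summed; guest time is already
        # included in user/nice, so adding it would double count.
        ticks = [int(x) for x in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_busy_percent(before_text, after_text):
    """Busy percentage per core between two /proc/stat snapshots."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    result = {}
    for name, (busy0, total0) in before.items():
        if name not in after:
            continue
        busy1, total1 = after[name]
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result
```

To use it, read /proc/stat, wait a few seconds, read it again, and pass both strings to per_core_busy_percent. Interrupt and softirq time counts as busy here, so a core that handles network interrupts would show up even if Tor itself runs elsewhere.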
It's holding between 5k and 16k sockets in use,
Having connections to 6000 relays is normal, and then there are more sockets for Exit traffic.
Is 6k normal/high/low for an exit? I'm trying to find the cause of the low performance here.
and ~3.5k sockets in TIME_WAIT state. (Fairly high amount?)
Quite normal for an Exit.
check.
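
For tracking the socket numbers over time, the per-state counts can be taken from /proc/net/tcp (and /proc/net/tcp6 for IPv6). This is a rough sketch, assuming Linux; count_tcp_states is a made-up helper name, and the state codes are the kernel's hexadecimal TCP state values.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # Columns: sl, local_address, rem_address, st, ...
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts
```

Passing the contents of /proc/net/tcp gives a Counter, so counts["TIME_WAIT"] and counts["ESTABLISHED"] can be logged next to the bandwidth graphs to see whether the socket count moves when the throughput plateaus.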
So far, I'm not sure _why_ it's capping itself on bandwidth, and that's the one thing that I want to figure out before I start scaling out horizontally.
If you hover over the Advertised Bandwidth in atlas, your relay's advertised bandwidth is equal to its observed bandwidth.
Your relay's observed bandwidth is listed as 19.98 MByte / second in its descriptor: http://193.15.16.4:9030/tor/server/authority
The bandwidth authorities seem to think your relay can handle twice that, nominally 38100 KByte / second: https://consensus-health.torproject.org/consensus-health-2016-10-25-10-00.html#5989521A85C94EE101E88B8DB2E68321673F9405 (This is a large page)
Last time we emailed, your relay's observed bandwidth was 19.83 MByte / second. This is suspiciously stable. Your observed bandwidth should vary a lot more. But it seems capped at 20 MByte / second.
That's exactly the behaviour I see too, which is why I'm spending the time trying to figure this out (and asking incessant questions).
Normally, I don't see that kind of limitation, so I don't _think_ it's the line, but I can't be sure, of course.
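
To put the figures quoted above side by side, here is the arithmetic as a small sketch. It assumes decimal units (1 MByte = 1000 KByte) and takes the advertised bandwidth to be the minimum of the configured rate, the configured burst, and the observed bandwidth, which is how Onionoo describes that field. The constant and function names are made up for illustration.

```python
def mbyte_to_mbit(mbyte_per_s):
    """Convert MByte/s to Mbit/s."""
    return mbyte_per_s * 8.0


def relative_change(old, new):
    """Fractional change from old to new."""
    return (new - old) / old


def advertised_bandwidth(rate, burst, observed):
    """Advertised bandwidth as the minimum of rate, burst, and observed."""
    return min(rate, burst, observed)


OBSERVED_NOW = 19.98      # MByte/s, from the current descriptor
OBSERVED_EARLIER = 19.83  # MByte/s, from the previous email
CONSENSUS_KBYTE = 38100   # nominal KByte/s from the bandwidth authorities
CONFIGURED = 92.0         # MByte/s, BandwidthRate and BandwidthBurst

link_mbit = mbyte_to_mbit(OBSERVED_NOW)                     # about 160 Mbit/s
drift = relative_change(OBSERVED_EARLIER, OBSERVED_NOW)     # under 1%
headroom = (CONSENSUS_KBYTE / 1000.0) / OBSERVED_NOW        # about 1.9x
advertised = advertised_bandwidth(CONFIGURED, CONFIGURED, OBSERVED_NOW)
```

The observed value moved by less than one percent between the two emails, and it sits at roughly 160 Mbit/s, which is the number to compare against any limit the provider may apply. With rate and burst at 92 MByte/s, the observed value is the one that determines the advertised bandwidth.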
Perhaps your network link throttles traffic.
Possible, would be good to find out.
Or, the throttling is happening via CPU limiting.
Or, you have an option set that is limiting Tor's bandwidth usage directly.
Not as far as I'm aware, the only ones I've set on purpose are BandwidthBurst / BandwidthRate, both to 92MB.
Did you ever try using chutney to measure your local bandwidth? That will tell you what your CPU is capable of. (Leaving you to distinguish between config and network.)
No, will do that now to see.
Alternately, set up a relay with the same config at another provider.
Or, set up a relay with the same config on the same machine.
Or, set up a relay with a minimal config on the same machine. (Try commenting-out lines in the config one at a time. Start with RelayBandwidthRate and RelayBandwidthBurst.)
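
Before commenting lines out one at a time, it may help to list every active bandwidth-related line in the torrc. A minimal sketch follows; find_bandwidth_options is a made-up helper, the option list is the set of torrc options I know of that can limit throughput and may not be complete, and it assumes option names are matched case-insensitively and that "#" starts a comment.

```python
# torrc options that can cap or shape a relay's bandwidth (possibly incomplete).
BANDWIDTH_OPTIONS = {
    "bandwidthrate", "bandwidthburst",
    "relaybandwidthrate", "relaybandwidthburst",
    "maxadvertisedbandwidth",
    "perconnbwrate", "perconnbwburst",
    "accountingmax",
}


def find_bandwidth_options(torrc_text):
    """Return (line_number, option, value) for each active bandwidth option."""
    found = []
    for lineno, raw in enumerate(torrc_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key.lower() in BANDWIDTH_OPTIONS:
            found.append((lineno, key, value))
    return found
```

Run it over the text of the torrc and of any file it includes. Anything in the result other than the intended BandwidthRate and BandwidthBurst lines is a candidate for commenting out first.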
But other relays achieve much faster speeds, so it's likely something unique to your situation.
That's what I'm afraid of, I'll go play with chutney now then.
//D.S.