On Fri, Jan 2, 2015 at 12:24 PM, Konstantin Belousov kostikbel@gmail.com wrote:
On Fri, Jan 02, 2015 at 09:09:34AM -0500, grarpamp wrote:
Some recent FreeBSD related questions in this app area.
What is the question?
As background, I can repeat that FreeBSD implements syscall-less gettimeofday() and clock_gettime() on x86 machines that have a usable RDTSC. The selected timecounter can be verified with sysctl kern.timecounter.hardware, and whether the fast gettimeofday(2) path (enabled by default) is active can be checked with sysctl kern.timecounter.fast_gettime.
On a Nehalem machine, I see ~30M calls/sec with fast_gettime enabled and ~6.25M calls/sec with it disabled. This was measured on a 2.8GHz Core i7 930 with src/tools/tools/syscall_timing.
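To put the two throughput figures above in per-call terms, a small conversion helps. The sketch below only rearranges the quoted numbers (~30M and ~6.25M calls/sec on a 2.8GHz CPU); the function names are illustrative, and it assumes a single core spends all of its time in the call loop.

```python
def ns_per_call(calls_per_sec: float) -> float:
    """Average wall-clock time of one call, in nanoseconds."""
    return 1e9 / calls_per_sec


def cycles_per_call(calls_per_sec: float, cpu_hz: float) -> float:
    """Average CPU cycles per call, assuming a constant clock rate."""
    return cpu_hz / calls_per_sec


CPU_HZ = 2.8e9  # 2.8GHz Core i7 930, as quoted above

for label, rate in (("fast_gettime enabled", 30e6),
                    ("fast_gettime disabled", 6.25e6)):
    print(f"{label}: {ns_per_call(rate):.1f} ns/call, "
          f"{cycles_per_call(rate, CPU_HZ):.0f} cycles/call")
```

That works out to roughly 33 ns per call on the fast path versus 160 ns through the syscall, a ratio of about 4.8.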
Check your timecounter hardware. Since it was noted that the tests were done in a VM, check the quality of RDTSC emulation in your hypervisor.
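The two sysctl checks mentioned above could be scripted along these lines. This is a minimal sketch, not an official tool: the helper names are made up, `sysctl -n` is used to print only the value, and the rule that a TSC-backed timecounter has a name starting with "TSC" (for example "TSC" or "TSC-low") is an assumption, as are the sample names used in the comments.

```python
import subprocess


def read_sysctl(name: str) -> str:
    """Return a sysctl value as text by running `sysctl -n <name>`."""
    result = subprocess.run(["sysctl", "-n", name],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def describe_time_path(hardware: str, fast_gettime: str) -> str:
    """Classify the settings; assumes TSC counters are named 'TSC...'."""
    uses_tsc = hardware.strip().upper().startswith("TSC")
    fast_enabled = fast_gettime.strip() == "1"
    if uses_tsc and fast_enabled:
        return "fast path plausible"
    if not uses_tsc:
        return "non-TSC timecounter"   # e.g. "HPET" (assumed example name)
    return "fast_gettime disabled"


def check_timecounter() -> str:
    """Query both sysctls named in the thread; run this on FreeBSD only."""
    return describe_time_path(
        read_sysctl("kern.timecounter.hardware"),
        read_sysctl("kern.timecounter.fast_gettime"),
    )
```

Calling `check_timecounter()` on the FreeBSD guest gives a quick first answer; it says nothing about how well the hypervisor emulates RDTSC, which still has to be measured.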
https://lists.torproject.org/pipermail/tor-dev/2015-January/thread.html
http://docs.freebsd.org/mail/current/freebsd-performance.html
Maybe I can just refer non-subscribers to the two lists above; that way, if anyone sees anything interesting, they can join or comment as desired.
The background might be that Tor operators have some large relays on *BSD and were looking to validate performance there and find ways to improve it.
Cheers.
https://lists.torproject.org/pipermail/tor-relays/
https://lists.torproject.org/pipermail/tor-talk/
On Fri, 2 Jan 2015 18:15:06 -0500 grarpamp grarpamp@gmail.com wrote:
On Fri, Jan 2, 2015 at 12:24 PM, Konstantin Belousov kostikbel@gmail.com wrote:
On Fri, Jan 02, 2015 at 09:09:34AM -0500, grarpamp wrote:
Some recent FreeBSD related questions in this app area.
What is the question?
As background, I can repeat that FreeBSD implements syscall-less gettimeofday() and clock_gettime() on x86 machines that have a usable RDTSC. The selected timecounter can be verified with sysctl kern.timecounter.hardware, and whether the fast gettimeofday(2) path (enabled by default) is active can be checked with sysctl kern.timecounter.fast_gettime.
On a Nehalem machine, I see ~30M calls/sec with fast_gettime enabled and ~6.25M calls/sec with it disabled. This was measured on a 2.8GHz Core i7 930 with src/tools/tools/syscall_timing.
Check your timecounter hardware. Since it was noted that the tests were done in a VM, check the quality of RDTSC emulation in your hypervisor.
This is all kind of a moot point, because even if the relevant time calls did take ~2 usec, that still wouldn't explain the performance issues, and my curiosity is close to being exhausted. But, for what it's worth:
Forcing the timecounter hardware source to "TSC" in my VM results in a saner value (~45 ns). That said, I'm not sure if the clock source is actually sane. A quick skim through the code suggests that there's a decent number of things that would keep the TSC from being used, though VirtualBox supports the P-state invariant TSC cpuid bit (Linux picks it up), so why I'm seeing this behavior eludes me.
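A per-call figure like the ~45 ns above can be compared before and after switching the timecounter with a small loop. The following is a rough sketch, not the tool used in this thread: it times Python's `time.clock_gettime`, so interpreter overhead is included and the absolute number will be well above what a C loop reports. It is only useful for comparing the same host under different timecounter settings.

```python
import time


def mean_ns_per_call(iterations: int = 200_000) -> float:
    """Mean cost of one clock_gettime(CLOCK_MONOTONIC) call, in ns.

    Includes Python loop and call overhead, so treat the result as a
    relative figure, not as the raw cost of the underlying call.
    """
    clock_gettime = time.clock_gettime   # local alias avoids attribute lookup
    clock_id = time.CLOCK_MONOTONIC
    start = time.perf_counter_ns()
    for _ in range(iterations):
        clock_gettime(clock_id)
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


print(f"~{mean_ns_per_call():.0f} ns per call (including interpreter overhead)")
```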
Curiosity exhausted at this point,
On 01/03/2015 02:36 AM, Yawning Angel wrote:
This is all kind of a moot point, because even if the relevant time calls did take ~2 usec, that still wouldn't explain the performance issues, and my curiosity is close to being exhausted. But, for what it's worth:
Forcing the timecounter hardware source to "TSC" in my VM results in a saner value (~45 ns). That said, I'm not sure if the clock source is actually sane. A quick skim through the code suggests that there's a decent number of things that would keep the TSC from being used, though VirtualBox supports the P-state invariant TSC cpuid bit (Linux picks it up), so why I'm seeing this behavior eludes me.
Curiosity exhausted at this point,
Fair enough. I agree that this was less fruitful than we had originally hoped.
Do you have any other suggestions for what the issue might be, or what profiling tools I could use to find it? I'm eager to keep working on this, but I don't know which direction I should take.