How to check if the system supports a monotonic clock?
Per the letter of POSIX, you may in fact need a runtime test even if the constant CLOCK_MONOTONIC
is defined. The official way to handle this is with the _POSIX_MONOTONIC_CLOCK
"feature-test macro", but those macros have really complicated semantics: quoting http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/unistd.h.html ,
If a symbolic constant is not defined or is defined with the value -1, the option is not supported for compilation. If it is defined with a value greater than zero, the option shall always be supported when the application is executed. If it is defined with the value zero, the option shall be supported for compilation and might or might not be supported at runtime.
Translating that three-way distinction into code would give you something like this:
#if !defined _POSIX_MONOTONIC_CLOCK || _POSIX_MONOTONIC_CLOCK < 0
    clock_gettime(CLOCK_REALTIME, &spec);
#elif _POSIX_MONOTONIC_CLOCK > 0
    clock_gettime(CLOCK_MONOTONIC, &spec);
#else
    if (clock_gettime(CLOCK_MONOTONIC, &spec))
        clock_gettime(CLOCK_REALTIME, &spec);
#endif
But it's simpler and more readable if you just always do the runtime test when CLOCK_MONOTONIC itself is defined:
#ifdef CLOCK_MONOTONIC
if (clock_gettime(CLOCK_MONOTONIC, &spec))
#endif
clock_gettime(CLOCK_REALTIME, &spec);
This increases the size of your code by some trivial amount on current-generation OSes that do support CLOCK_MONOTONIC, but the readability benefits are worth it in my opinion.
There is also a pretty strong argument for using CLOCK_MONOTONIC unconditionally; you're more likely to find an OS that doesn't support clock_gettime at all (e.g. MacOS X still doesn't have it as far as I know) than an OS that has clock_gettime but not CLOCK_MONOTONIC.
Difference between CLOCK_REALTIME and CLOCK_MONOTONIC?
CLOCK_REALTIME represents the machine's best guess as to the current wall-clock, time-of-day time. As Ignacio and MarkR say, this means that CLOCK_REALTIME can jump forwards and backwards as the system time-of-day clock is changed, including by NTP.
CLOCK_MONOTONIC represents the absolute elapsed wall-clock time since some arbitrary, fixed point in the past. It isn't affected by changes in the system time-of-day clock.
If you want to compute the elapsed time between two events observed on the one machine without an intervening reboot, CLOCK_MONOTONIC is the best option.
Note that on Linux, CLOCK_MONOTONIC does not measure time spent in suspend, although by the POSIX definition it should. You can use the Linux-specific CLOCK_BOOTTIME for a monotonic clock that keeps running during suspend.
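Computing the elapsed time between two CLOCK_MONOTONIC samples means subtracting timespecs. A minimal sketch (timespec_diff_ns is an illustrative name, not a library function):

```c
#include <time.h>

/* Sketch: elapsed nanoseconds between two timespecs taken from the same
 * clock (e.g. CLOCK_MONOTONIC). Assumes the interval fits in 64 bits,
 * which covers ~292 years of nanoseconds. */
static long long timespec_diff_ns(const struct timespec *start,
                                  const struct timespec *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}
```

A possibly negative tv_nsec difference is absorbed by the signed arithmetic, so no explicit borrow step is needed.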
Is clock_gettime() adequate for submicrosecond timing?
No. You'll have to use platform-specific code to do it. On x86 and x86-64, you can use 'rdtsc' to read the Time Stamp Counter.
Just port the rdtsc assembly you're already using, for example:
#include <stdint.h>

__inline__ uint64_t rdtsc(void) {
    uint32_t lo, hi;
    /* Serialize with CPUID so earlier instructions can't be reordered
       past the timestamp read. */
    __asm__ __volatile__ (
        "xorl %%eax,%%eax \n cpuid"
        ::: "%rax", "%rbx", "%rcx", "%rdx");
    /* We cannot use "=A", since this would use %rax on x86_64 and
       return only the lower 32 bits of the TSC. */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return (uint64_t)hi << 32 | lo;
}
Timing a process in C using clock(), time(), clock_gettimes() and the rdtsc() intrinsic returning confusing values
For anyone else who comes across this: I found this Intel white paper and am using its method to time my code: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
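As I understand the paper, its core idea is to bracket the measured region with serializing timestamp reads: CPUID+RDTSC at the start, and RDTSCP followed by CPUID at the end, so the measured code cannot be reordered around either read. A sketch of that pattern in GCC-style inline assembly for x86-64 (function names are mine):

```c
#include <stdint.h>

/* Sketch of the measurement pattern from the Intel white paper.
 * tsc_start/tsc_end are illustrative names; x86-64 GCC/Clang assumed. */
static inline uint64_t tsc_start(void)
{
    uint32_t lo, hi;
    /* CPUID serializes; RDTSC then reads the counter. */
    __asm__ __volatile__("cpuid\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi)
                         : "a"(0)
                         : "%rbx", "%rcx");
    return (uint64_t)hi << 32 | lo;
}

static inline uint64_t tsc_end(void)
{
    uint32_t lo, hi, eax = 0;
    /* RDTSCP waits for prior instructions to complete before reading;
       the trailing CPUID stops later instructions from moving up. */
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi) : : "%rcx");
    __asm__ __volatile__("cpuid" : "+a"(eax) : : "%rbx", "%rcx", "%rdx");
    return (uint64_t)hi << 32 | lo;
}
```

The asymmetry (RDTSC at the start, RDTSCP at the end) is deliberate: RDTSCP's built-in wait only helps on the trailing edge of the measured region.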
clock_gettime takes longer to execute when program run from terminal
Just add more iterations to give the CPU time to ramp up to max clock speed. Your "slow" times are with the CPU at low-power idle clockspeed.
QtCreator apparently uses enough CPU time to make this happen before your program runs, or else you're compiling + running and the compilation process serves as a warm-up (vs. bash's fork/execve being lighter weight).
See Idiomatic way of performance evaluation? for more about doing warm-up runs when benchmarking, and also Why does this delay-loop start to run faster after several iterations with no sleep?
On my i7-6700k (Skylake) running Linux, increasing the loop iteration count to 1000 is sufficient to get the final iterations running at full clock speed, even after the first couple iterations handling page faults, warming up the iTLB, uop cache, data caches, and so on.
$ ./a.out
It took 244 ns
It took 150 ns
It took 73 ns
It took 76 ns
It took 75 ns
It took 71 ns
It took 72 ns
It took 72 ns
It took 69 ns
It took 75 ns
...
It took 74 ns
It took 68 ns
It took 69 ns
It took 72 ns
It took 72 ns # 382 "slow" iterations in this test run (copy/paste into wc to check)
It took 15 ns
It took 15 ns
It took 15 ns
It took 15 ns
It took 16 ns
It took 16 ns
It took 15 ns
It took 15 ns
It took 15 ns
It took 15 ns
It took 14 ns
It took 16 ns
...
On my system, energy_performance_preference is set to balance_performance, so the hardware P-state governor isn't as aggressive as with performance. Use grep . /sys/devices/system/cpu/cpufreq/policy[0-9]*/energy_performance_preference to check, and use sudo to change it:
sudo sh -c 'for i in /sys/devices/system/cpu/cpufreq/policy[0-9]*/energy_performance_preference;do echo balance_performance > "$i";done'
Even running it under perf stat ./a.out is enough to ramp up to max clock speed very quickly, though; it really doesn't take much. But bash's command parsing after you press return is very cheap, with not much CPU work done before it calls execve and reaches main in your new process.
The printf with line-buffered output is what takes most of the CPU time in your program, BTW. That's why it takes so few iterations to ramp up to speed. E.g. if you run perf stat --all-user -r10 ./a.out, you'll see the user-space core clock cycles per second are only about 0.4 GHz, with the rest of the time spent in the kernel in write system calls.