How to Measure CPU Time and Wall Clock Time on Both Linux and Windows

How can I measure CPU time and wall clock time on both Linux/Windows?

Here's a copy-paste solution that works on both Windows and Linux, in both C and C++.

As mentioned in the comments, there's a boost library that does this. But if you can't use boost, this should work:

//  Windows
#ifdef _WIN32
#include <Windows.h>
double get_wall_time(){
    LARGE_INTEGER time, freq;
    if (!QueryPerformanceFrequency(&freq)){
        //  Handle error
        return 0;
    }
    if (!QueryPerformanceCounter(&time)){
        //  Handle error
        return 0;
    }
    return (double)time.QuadPart / freq.QuadPart;
}
double get_cpu_time(){
    FILETIME a, b, c, d;
    if (GetProcessTimes(GetCurrentProcess(), &a, &b, &c, &d) != 0){
        //  Returns total user time.
        //  Can be tweaked to include kernel times as well.
        return
            (double)(d.dwLowDateTime |
            ((unsigned long long)d.dwHighDateTime << 32)) * 0.0000001;
    } else {
        //  Handle error
        return 0;
    }
}

//  POSIX/Linux
#else
#include <time.h>
#include <sys/time.h>
double get_wall_time(){
    struct timeval time;
    if (gettimeofday(&time, NULL)){
        //  Handle error
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}
double get_cpu_time(){
    return (double)clock() / CLOCKS_PER_SEC;
}
#endif

There are a number of ways to implement these clocks; here's what the snippet above uses:

For Windows:

  • Wall Time: Performance Counters
  • CPU Time: GetProcessTimes()

For Linux:

  • Wall Time: gettimeofday()
  • CPU Time: clock()

And here's a small demonstration:

#include <math.h>
#include <iostream>
using namespace std;

int main(){

// Start Timers
double wall0 = get_wall_time();
double cpu0 = get_cpu_time();

// Perform some computation.
double sum = 0;
#pragma omp parallel for reduction(+ : sum)
for (long long i = 1; i < 10000000000; i++){
sum += log((double)i);
}

// Stop timers
double wall1 = get_wall_time();
double cpu1 = get_cpu_time();

cout << "Wall Time = " << wall1 - wall0 << endl;
cout << "CPU Time = " << cpu1 - cpu0 << endl;

// Prevent Code Elimination
cout << endl;
cout << "Sum = " << sum << endl;

}

Output (12 threads):

Wall Time = 15.7586
CPU Time = 178.719

Sum = 2.20259e+011

How to measure CPU time and wall clock time?

My manual page for clock() says:

POSIX requires that CLOCKS_PER_SEC equals 1000000 independent of the actual resolution.

When increasing the number of iterations on my computer, the measured CPU time only starts showing up at around 100,000 iterations. From the returned figures, it seems the actual resolution is 10 milliseconds.

Beware that when you optimize your code, the whole loop may disappear because sum is a dead value. There is also nothing to stop the compiler from moving the clock statements across the loop, as there are no real dependencies on the code in between.

Let me elaborate a bit more on micro-measurements of code performance. The naive and tempting way to measure performance is indeed by adding clock statements, as you have done. However, since time is not a concept or side effect in C, compilers can often move these clock calls at will.

To remedy this, it is tempting to give the clock calls side effects, for example by having them access volatile variables. However, this still doesn't prohibit the compiler from moving side-effect-free code across the calls — think, for example, of accesses to regular local variables. Worse, by making the clock calls look scary to the compiler, you will actually inhibit optimizations. As a result, merely measuring the performance impacts that performance in a negative and undesirable way.

If you use profiling, as already mentioned by someone, you can get a pretty good assessment of the performance of even optimized code, although the overall time of course is increased.

Another good way to measure performance is just asking the compiler to report the number of cycles some code will take. For a lot of architectures the compiler has a very accurate estimate of this. However most notably for a Pentium architecture it doesn't because the hardware does a lot of scheduling that is hard to predict.

Although it is not standard practice, I think compilers should support a pragma that marks a function to be measured. The compiler then can include high-precision, non-intrusive measuring points in the prologue and epilogue of the function and prohibit any inlining of it. Depending on the architecture, it can choose a high-precision clock to measure time, preferably with support from the OS to only measure time of the current process.

Measure CPU time and wall clock time of a program in C++

From the documentation:

std::clock time may advance faster or slower than the wall clock, depending on the execution resources given to the program by the operating system.

What specifically are wall-clock-time, user-cpu-time, and system-cpu-time in UNIX?

Wall-clock time is the time that a clock on the wall (or a stopwatch in hand) would measure as having elapsed between the start of the process and 'now'.

The user-cpu time and system-cpu time are pretty much as you said - the amount of time spent in user code and the amount of time spent in kernel code.

The units are seconds (and subseconds, which might be microseconds or nanoseconds).

The wall-clock time is not the number of seconds that the process has spent on the CPU; it is the elapsed time, including time spent waiting for its turn on the CPU (while other processes get to run).

Does the CPU clock time returned by clock() have to be exactly the same among runs?

Of course we assume that our program is stable and doesn't have any random behaviour. So, is it possible?

If your program is running on a desktop, this variability is typical and, I would say, unavoidable. Interrupts, I/O channel activity, and Ethernet itself consume CPU time, often in surprisingly large blocks of time (see TCP/IP SAR, cache misses, etc.), most of which is beyond your program's control and not in sync with your timing efforts.

I have seen only one example of software running in the 'stable' way you hint at. That computer was an SBC (single-board computer) with one CPU (not Intel or AMD), all static RAM (so no dynamic RAM and no refresh activity), no Ethernet, but two channels of I/O at a fixed rate, and it ran a single program on a scaled-down OS (not Linux, not a desktop OS) ... the precision was as if the behaviour were simple hardware logic.

As team lead, I recognized the unusual, so I asked the developer if she had time to attach a logic analyzer and scope ... she demonstrated that neither tool showed any variance in time, edge to edge, message to message. Her software logic was, to me, impressively straightforward. In that system, if you did not need an interrupt, you simply did not enable it.

A desktop is a remarkably different beast ... so many things going on at the same time, most of which cannot be stifled.


Yes. It is not only possible but unavoidable that a desktop has the kinds of variance (in timing) you are seeing.

And yet it is possible to achieve the stability you have hinted at, just not on a desktop. It takes special hardware, and careful coding.

How does the clock function work behind the scene in a multithreading program?

Since the clock function only records the time consumption of each individual CPU that executes the current program, in a multi-threaded program the returned value would be lower than the wall-clock time.

I can't find a definitive statement in the C Standard but, according to cppreference (which is generally very reliable), your assumption is wrong: the clock() function returns the total (for all CPUs) processor time used by the program (bold emphasis mine):

… For example, if the CPU is shared by other processes, clock time may advance slower than wall clock. On the other hand, if the current process is multithreaded and more than one execution core is available, clock time may advance faster than wall clock.

How to capture wall-clock time and CPU time in a Python variable using bash builtin 'time'?

From the Python documentation, regarding wall time:

... On Windows, time.clock() has microsecond granularity, but time.time()’s granularity is 1/60th of a second. On Unix, time.clock() has 1/100th of a second granularity, and time.time() is much more precise. On either platform, default_timer() measures wall clock time, not the CPU time. This means that other processes running on the same computer may interfere with the timing.

For wall time you can use timeit.default_timer(), which picks the timer with the best granularity described above.

From Python 3.3 onwards you can use time.process_time() or time.process_time_ns(). Below is the documentation entry for the process_time method:

Return the value (in fractional seconds) of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep. It is process-wide by definition. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.
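Putting the two recommendations together, a small sketch for Python 3.3+ using time.perf_counter() for wall time and time.process_time() for CPU time:

```python
import math
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, wall_seconds, cpu_seconds)."""
    w0 = time.perf_counter()   # wall clock, best available resolution
    c0 = time.process_time()   # user + system CPU time, process-wide
    result = fn(*args)
    return result, time.perf_counter() - w0, time.process_time() - c0

def work(n):
    return sum(math.log(i) for i in range(1, n))

result, wall, cpu = timed(work, 200000)
print(f"wall = {wall:.4f} s, cpu = {cpu:.4f} s")

# time.sleep does not consume CPU, so CPU time stays near zero while wall grows:
_, wall_sleep, cpu_sleep = timed(time.sleep, 0.1)
print(f"sleep: wall = {wall_sleep:.4f} s, cpu = {cpu_sleep:.4f} s")
```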


