How to Analyze Program Running Time

How to analyze program running time

As Öö Tiib suggested, just break the program in a debugger. The way I do it is get the program running, switch to the output window, type Ctrl-C to interrupt the program, switch back to the GDB window, type "thread 1" so as to be in the context of the main program, and type "bt" to see the stack trace.

Now, look at the stack trace and understand it, because while the instruction at the program counter is responsible for that particular cycle being spent, so is every call on the stack.

If you do this a few times, you're going to see exactly what line is responsible for the bottleneck.
As soon as you see it on two (2) samples, you've nailed it.
Then fix it and do it all again, finding the next bottleneck, and so on.
You can easily get an enormous speedup this way.

<flame>

Some people say this is exactly what profilers do, only they do it better.
That's what you hear in lecture halls and on blogs, but here's the deal:
there are ways to speed up your code that do not reveal themselves as "slow functions" or "hot paths" - for example, reorganizing the data structure.
Every function looks more-or-less innocent, even when it has a high inclusive time percentage.

They do reveal themselves if you actually look at stack samples.
So the problem with good profilers is not in the collection of samples; it is in the presentation of results. Statistics and measurements cannot tell you what a small selection of samples, examined carefully, does tell you.

What about the issue of small vs. large number of samples? Aren't more better?
OK, suppose you have an infinite loop, or one that, while not infinite, runs far longer than you know it should. Would 1000 stack samples find it any better than a single sample? (No.) If you look at it under a debugger, you know you're in the loop because it takes basically 100% of the time. It's on the stack somewhere - just scan up the stack until you find it.
Even if the loop only takes 50% or 20% of the time, that's the probability each sample will see it.
So, if you see something you could get rid of on as few as two samples, it's worth doing it.
So, what do the 1000 samples buy you?

Maybe one thinks: "So what if we miss a problem or two? Maybe it's good enough." Well, is it?
Suppose the code has three problems: P taking 50% of the time, Q taking 25%, and R taking 12.5%. The good (necessary) work is called A.
The table below shows the speedup you get if you fix one of them, two of them, or all three.

PRPQPQPAPQPAPRPQ   original time, with avoidable code P, Q, and R all mixed together
RQQAQARQ           fix P            - 2 x speedup
PRPPPAPPAPRP       fix Q            - 1.3 x speedup
PPQPQPAPQPAPPQ     fix R            - 1.14 x speedup
RAAR               fix P and Q      - 4 x speedup
QQAQAQ             fix P and R      - 2.7 x speedup
PPPPAPPAPP         fix Q and R      - 1.6 x speedup
AA                 fix P, Q, and R  - 8 x speedup

Does this make it clear why the ones that "get away" really hurt?
If you miss even one of them, the best you can end up with is a program twice as slow as it could have been.
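
A quick way to check the table: removing a fraction f of the running time gives a speedup of 1 / (1 - f). A minimal sketch of that arithmetic (the speedup() helper here is just for illustration):

#include <stdio.h>

/* Speedup obtained by removing a given fraction of the total running time. */
static double speedup(double fraction_removed)
{
    return 1.0 / (1.0 - fraction_removed);
}

int main(void)
{
    printf("fix P:           %.2fx\n", speedup(0.5));                  /* 2x */
    printf("fix P and Q:     %.2fx\n", speedup(0.5 + 0.25));           /* 4x */
    printf("fix P, Q, and R: %.2fx\n", speedup(0.5 + 0.25 + 0.125));   /* 8x */
    return 0;
}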

They are easy to find if you examine samples. P is on half the samples.
If you fix P and do it again, Q is on half the samples. Once you fix Q, R is on half the samples.
Fix R and you've got your 8x speedup.
You don't have to stop there. You can keep going until you truly can't find anything to fix.

The more problems there are, the higher the potential speedup,
but you can't afford to miss any.
The problem with profilers (even good ones) is that, by denying you the chance to see and study individual samples, they hide problems that you need to find.

There are good profilers.
The best are wall-time stack samplers that report inclusive percent at individual lines, letting you turn sampling on and off with a hot-key.
Zoom is one such profiler.

But even those make the mistake of assuming you need lots of samples.
You don't, and the price you pay for them is that you can't actually see any individual samples, so you can't see why the time is being spent, so you can't easily tell whether it's necessary - and you can't get rid of something unless you know you don't need it.
The result is you miss bottlenecks, and they end up stunting your speedup.

</flame>

How do I analyze the running time of an algorithm in C?

Is there a function to use or a built-in tool in IDEs?

Not that I'm aware of. Human brain required. :)

However, here are some tentatively-educational pointers (in addition to the very helpful comment of Eric Postpischil):

Running Time

To estimate running time you will likely start by using asymptotic notation - a summarized estimate of how long the algorithm will take in the best/worst/average case.

  • Worst case scenario: Big O Notation
  • Best case scenario: Omega Notation
  • Average case scenario: Theta Notation

(Strictly speaking, O, Ω, and Θ denote upper, lower, and tight bounds rather than worst/best/average cases, but this mapping is the usual shorthand at an introductory level.)

https://en.wikipedia.org/wiki/Big_O_notation
https://en.wikipedia.org/wiki/Time_complexity

If you're just starting, you're probably being asked the best/worst cases.
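
As a quick illustration (a separate toy example, not the code from the question), take linear search: in the worst case it has to look at every element, in the best case it finds the key immediately:

#include <stdio.h>

/* Linear search: worst case O(N) (key absent or last), best case Ω(1) (key is first). */
static int linear_search(const int *a, int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;   /* best case: one comparison when the key is at index 0 */
    return -1;          /* worst case: all n elements were compared */
}

int main(void)
{
    int a[] = {4, 8, 15, 16, 23, 42};
    printf("%d\n", linear_search(a, 6, 4));    /* best case: found at index 0 */
    printf("%d\n", linear_search(a, 6, 99));   /* worst case: not found, -1   */
    return 0;
}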

Quick side note for motivation: why would this be useful in real life? There's plenty of applications that require sorting, such as get the top chess players from Russia, or sort chess Grandmasters by their age. If your website gets thousands of views per second, doing these things fast and efficiently will result in a better experience for your users, as well as reduced costs for you.

It's good to start analysing what your algorithm is doing. For asymptotic analysis, the most important parts are loops (or function calls that themselves may contain loops/function calls, without forgetting about recursive calls…).

- Q1: What is your algorithm doing?

The for loops each appear to be going through N/2 elements (for an array of 10 elements, each loop runs 5 times). In Big-O notation, this would be O(N/2), but we don't care about constants, so it's correct to state O(N).

Their combined time complexity is O(N) + O(N) = 2*O(N) = O(N) (because, again, constants disappear). So, the time complexity of an iteration of the outer loop is O(N).

This leaves the outer loop - how many times will it run? This will be the main problem, and will depend on the array. The time complexity of your algorithm will be the number of times the outer loop runs multiplied by the time complexity of each of those iterations.
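
The original code isn't reproduced here, but a hypothetical sketch with the shape described above might look like this (not_done() and do_work() are placeholders, purely for illustration):

#include <stdbool.h>
#include <stddef.h>

/* Placeholders so the sketch compiles; they stand in for the real work. */
static bool not_done(const int *a, size_t n) { (void)a; (void)n; return false; }
static void do_work(int *a, size_t i) { a[i]++; }

void algorithm(int *a, size_t n)
{
    while (not_done(a, n)) {                  /* outer loop: iteration count depends on the data */
        for (size_t i = 0; i < n / 2; i++)    /* first inner loop:  ~N/2 steps -> O(N) */
            do_work(a, i);
        for (size_t i = n / 2; i < n; i++)    /* second inner loop: ~N/2 steps -> O(N) */
            do_work(a, i);
        /* one outer iteration: O(N) + O(N) = O(N) */
    }
}

int main(void)
{
    int a[10] = {0};
    algorithm(a, 10);
    return 0;
}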

- Q2: How many times will the outer loop run?

For example, if the array is already sorted (e.g. [1, 2, 3]), then the outer loop will run just once. This is the best case scenario: Ω(N).

What is the worst possible array you can imagine? That's where human ingenuity comes to play. :-)

- Q3: Average case scenario?

Not sure if this is still part of the scope of your assignment. It's more tricky, as the average may vary from application to application. Usually, at a learning level, it's OK-ish to assume a uniform distribution (i.e. any input is as likely as any other).

Then you start to play with probabilities. What's the probability of getting an array that does 1 iteration of the outer loop? What about 2? What about N?

Space Complexity

This is a similar analysis, but instead of looking at the number of instructions, usually we look at the memory used (e.g. mallocs within loops).

This is an in-place algorithm, so not the most interesting… :)
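
For contrast, here is a hypothetical sketch (again, not the code in question) of something that is not in-place: allocating a buffer whose size grows with N gives O(N) extra space, where an in-place version would use O(1):

#include <stdlib.h>

/* Reversing into a new buffer uses O(N) additional memory;
 * reversing in place would use O(1).                       */
int *reversed_copy(const int *a, size_t n)
{
    int *out = malloc(n * sizeof *out);   /* O(N) extra space */
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = a[n - 1 - i];
    return out;                           /* caller must free() */
}

int main(void)
{
    int a[] = {1, 2, 3};
    int *r = reversed_copy(a, 3);
    free(r);
    return 0;
}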

Execution time of C program

CLOCKS_PER_SEC is a constant which is declared in <time.h>. To get the CPU time used by a task within a C application, use:

#include <time.h>

clock_t begin = clock();

/* here, do your time-consuming job */

clock_t end = clock();
double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;

Note that this returns the time as a floating point type. This can be more precise than a second (e.g. you measure 4.52 seconds). Precision depends on the architecture; on modern systems you easily get 10ms or lower, but on older Windows machines (from the Win98 era) it was closer to 60ms.

clock() is standard C; it works "everywhere". There are system-specific functions, such as getrusage() on Unix-like systems.
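
For instance, a minimal sketch using getrusage() on a POSIX system (not portable everywhere) might look like this:

#include <stdio.h>
#include <sys/resource.h>   /* POSIX only */

int main(void)
{
    struct rusage ru;

    /* ... the time-consuming job would go here ... */

    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        double user_sec = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
        double sys_sec  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
        printf("user CPU: %.3f s, system CPU: %.3f s\n", user_sec, sys_sec);
    }
    return 0;
}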

Java's System.currentTimeMillis() does not measure the same thing. It is a "wall clock": it can help you measure how much time it took for the program to execute, but it does not tell you how much CPU time was used. On a multitasking system (i.e. all of them), these can be widely different.
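
To see the difference, here is a small C sketch (using the POSIX sleep() just to idle the process): while the process sleeps, wall-clock time advances but CPU time barely moves:

#include <stdio.h>
#include <time.h>
#include <unistd.h>   /* sleep(); POSIX, for this sketch only */

int main(void)
{
    clock_t cpu_start  = clock();
    time_t  wall_start = time(NULL);

    sleep(2);   /* the process is idle: wall time passes, CPU time does not */

    double cpu_sec  = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
    double wall_sec = difftime(time(NULL), wall_start);

    printf("CPU time:  %.3f s\n", cpu_sec);    /* close to 0 */
    printf("wall time: %.0f s\n", wall_sec);   /* roughly 2  */
    return 0;
}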

Difference between Running time and Execution time in algorithm?

What is the real meaning or definition of running time and execution time? Are they the same or different?

The definition of "running time" in 'Introduction to Algorithms' by Cormen, Leiserson, Rivest, and Stein (CLRS) is actually not a time, but a number of steps. This is not what you would intuitively use as a definition. Most would agree that "running" and "executing" are the same concept, and that "time" is expressed in a unit of time (like milliseconds). So while we would normally consider these two terms to have the same meaning, CLRS have deviated from that and given "running time" a different meaning.

Does running time describe the number of steps executed or not?

It does mean that in CLRS. But the definition that CLRS uses for "running time" is particular, and not the same as you might encounter in other resources.

CLRS assumes here that a primitive operation (i.e. a step) takes O(1) time.
This is typically true for CPU instructions, which take up to a fixed maximum number of cycles (where each cycle represents a unit of time), but it may not be true in higher level languages. For instance, some languages have a sort instruction. Counting that as a single "step" would give useless results in an analysis.

Breaking an algorithm down into its O(1) steps does help to analyse its complexity. Counting the steps for different inputs may only give a hint about the complexity, though. Ultimately, the complexity of an algorithm requires a (mathematical) proof, based on the loops and the known complexity of the steps it uses.
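
To make "counting steps" concrete, here is an illustrative sketch (insertion sort, chosen arbitrarily) that counts element shifts as its steps; the same N produces very different counts for different inputs, which is why counting alone only hints at the complexity:

#include <stdio.h>

/* Insertion sort that counts each element shift as one "step". */
static long insertion_sort_steps(int *a, int n)
{
    long steps = 0;
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            steps++;            /* one primitive shift */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
    return steps;
}

int main(void)
{
    int sorted[]   = {1, 2, 3, 4, 5, 6, 7, 8};
    int reversed[] = {8, 7, 6, 5, 4, 3, 2, 1};
    printf("already sorted: %ld steps\n", insertion_sort_steps(sorted, 8));    /* 0  */
    printf("reverse sorted: %ld steps\n", insertion_sort_steps(reversed, 8));  /* 28 */
    return 0;
}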

Does running time depend on the computer or not?

Certainly the execution time may differ. This is one of the reasons we want to buy a new computer once in a while.

The number of steps may depend on the computer. If both support the same programming language, and you count steps in that language, then: yes. But if you were to do the counting more thoroughly and count the CPU instructions that are actually run by the compiled program, then it might be different. For instance, a C compiler on one computer may generate different machine code than a different C compiler on another computer, so the number of CPU instructions may be smaller on one than on the other, even though they result from the same C program code.

Practically however, this counting at CPU instruction level is not relevant for determining the complexity of an algorithm. We generally know the time complexity of each instruction in the higher level language, and that is what counts for determining the overall complexity of an algorithm.

Measuring execution time of a function in C++

There is a very easy-to-use method in C++11: std::chrono::high_resolution_clock from the <chrono> header.

Use it like so:

#include <chrono>

/* Only needed for the sake of this example. */
#include <iostream>
#include <thread>

void long_operation()
{
    /* Simulating a long, heavy operation. */
    using namespace std::chrono_literals;
    std::this_thread::sleep_for(150ms);
}

int main()
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;

    auto t1 = high_resolution_clock::now();
    long_operation();
    auto t2 = high_resolution_clock::now();

    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);

    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;

    std::cout << ms_int.count() << "ms\n";
    std::cout << ms_double.count() << "ms\n";
    return 0;
}

This will measure the duration of the function long_operation.

Possible output:

150ms
150.068ms

Working example: https://godbolt.org/z/oe5cMd

how to measure running time of algorithms in python

I am not 100% sure what is meant by "running times of my algorithms written in python", so I thought I might try to offer a broader look at some of the potential answers.

  • Algorithms don't have running times; implementations can be timed, but an algorithm is an abstract approach to doing something. The most common and often the most valuable part of optimizing a program is analyzing the algorithm, usually using asymptotic analysis and computing the big O complexity in time, space, disk use and so forth.

    A computer cannot really do this step for you. This requires doing the math to figure out how something works. Optimizing this side of things is the main component to having scalable performance.

  • You can time your specific implementation. The nicest way to do this in Python is to use timeit. The way it is most commonly used is to put the code you want to measure in a function in a module, and then call it from the command line with python -m timeit ....

    timeit is useful for comparing multiple snippets when doing micro-optimisation, but it often isn't the right tool for comparing two different algorithms. Usually what you want there is asymptotic analysis, though it's possible you want more complicated types of analysis.

  • You have to know what to time. Most snippets aren't worth improving. You need to make changes where they actually count, especially when you're doing micro-optimisation and not improving the asymptotic complexity of your algorithm.

    If you quadruple the speed of a function in which your code spends 1% of the time, that's not a real speedup. If you make a 20% speed increase on a function in which your program spends 50% of the time, you have a real gain.

    To find out where a real Python program is spending its time, use the stdlib profiling utilities (such as cProfile). These will tell you which parts of your program the time is going to.


