Parallel Loops in C++

How can I execute two for loops in parallel in C++?

As already mentioned in the comments, OpenMP may not be the best solution for this, but if you wish to do it with OpenMP, I suggest the following:

Use sections to start two threads, and communicate between the threads through shared variables. The important thing is to use atomic operations to read (#pragma omp atomic read seq_cst) and to write (#pragma omp atomic write seq_cst) these variables. Here is an example:

// Shared state between the two threads; only ever accessed
// through the atomic pragmas below.
int shared_motor_state = 0;
int shared_motor_command = 0;

#pragma omp parallel num_threads(2)
#pragma omp sections
{
    #pragma omp section
    {
        // This is the sensor-controlling part
        int sensor_state, motor_state;

        while (exit_condition)
        {
            sensor_state = read_sensor();

            // Read the current state of the motor from the other thread
            #pragma omp atomic read seq_cst
            motor_state = shared_motor_state;

            // Based on the motor state and the sensor state, send a
            // command to the other thread to control the motor, or
            // wait for the motor to be ready in a loop, etc.

            #pragma omp atomic write seq_cst
            shared_motor_command = /* whatever you wish */;
        }
    }

    #pragma omp section
    {
        // This is the motor-controlling part
        int motor_command;

        while (exit_condition)
        {
            // Read the motor command from the other thread
            #pragma omp atomic read seq_cst
            motor_command = shared_motor_command;

            // Do whatever you have to do based on the motor command;
            // you can set the state of the motor with the following line

            #pragma omp atomic write seq_cst
            shared_motor_state = /* whatever you need to pass to the other thread */;
        }
    }
}
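If you try this, remember to compile with OpenMP enabled (for example g++ -fopenmp yourfile.cpp). Note that the seq_cst clause on the atomic construct needs a compiler with reasonably recent OpenMP support (OpenMP 4.0 or later, if I recall correctly).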

Parallel Loops in C++

With the parallel algorithms in C++17 we can now use:

#include <algorithm>
#include <execution>
#include <string>
#include <vector>

std::vector<std::string> foo;
std::for_each(
    std::execution::par,
    foo.begin(),
    foo.end(),
    [](auto&& item)
    {
        // do stuff with item
    });

to compute loops in parallel. The first parameter specifies the execution policy.
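The other standard policies work the same way. For instance, here is a minimal sketch of a parallel reduction with std::reduce (with GCC's libstdc++ the parallel policies are backed by TBB, so you may need to link with -ltbb):

#include <execution>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> values(1'000'000, 1);

    // std::execution::par allows the implementation to split the
    // reduction across threads; std::execution::seq forces sequential
    // execution, and par_unseq additionally permits vectorization.
    long long sum = std::reduce(std::execution::par,
                                values.begin(), values.end(), 0LL);

    return sum == 1'000'000 ? 0 : 1;
}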

What is a parallel for loop, and how/when should it be used?

What are parallel for loops, and how do they work?

A parallel for loop is a for loop in which the iterations of the loop can be run in parallel: on separate cores, processors, or threads.

Let us take this summing code:

unsigned int numbers[] = { 1, 2, 3, 4, 5, 6 };
unsigned int sum = 0;
const unsigned int quantity = sizeof(numbers) / sizeof(numbers[0]);
for (unsigned int i = 0; i < quantity; ++i)
{
    sum = sum + numbers[i];
}

Calculating a sum does not depend on the order in which the numbers are added; the sum only cares that all the numbers have been added.

The loop could be split into two loops that are executed by separate threads or processors:

// Even summation loop:
unsigned int even_sum = 0;
for (unsigned int e = 0; e < quantity; e += 2)
{
    even_sum += numbers[e];
}

// Odd summation loop:
unsigned int odd_sum = 0;
for (unsigned int odd = 1; odd < quantity; odd += 2)
{
    odd_sum += numbers[odd];
}

// Combine the partial sums
sum = even_sum + odd_sum;

The even and odd summing loops are independent of each other. They do not access any of the same memory locations.

The summing for loop can therefore be considered a parallel for loop, because its iterations can be run in parallel by separate threads, for example on separate CPU cores.
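To make this concrete, here is a minimal sketch of the same even/odd split actually run on two threads with std::thread. Each thread writes only its own partial sum, so no locking is needed beyond the joins:

#include <iostream>
#include <thread>

int main()
{
    unsigned int numbers[] = { 1, 2, 3, 4, 5, 6 };
    const unsigned int quantity = sizeof(numbers) / sizeof(numbers[0]);

    unsigned int even_sum = 0;
    unsigned int odd_sum  = 0;

    // Each thread touches only its own accumulator; join() guarantees
    // both results are visible before they are combined.
    std::thread even_thread([&] {
        for (unsigned int e = 0; e < quantity; e += 2)
            even_sum += numbers[e];
    });
    std::thread odd_thread([&] {
        for (unsigned int o = 1; o < quantity; o += 2)
            odd_sum += numbers[o];
    });

    even_thread.join();
    odd_thread.join();

    std::cout << (even_sum + odd_sum) << '\n';  // prints 21
}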

Somebody else can supply a more detailed definition, but this is the general example.

Edit 1:

Can any for loop be made parallel?

No, not every loop can be made parallel. The iterations of the loop must be independent of each other. That is, one CPU core should be able to run one iteration without any side effects on another CPU core running a different iteration.
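For example, the following loop cannot be split up the way the summing loop was, because every iteration depends on the result of the previous one:

// Running (prefix) sum: iteration i reads what iteration i-1 wrote,
// a loop-carried dependency, so the iterations cannot be handed to
// different cores independently.
for (unsigned int i = 1; i < quantity; ++i)
{
    numbers[i] = numbers[i] + numbers[i - 1];
}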

What are the uses for them?

Performance?

In general, the reason is performance. However, the overhead of setting up the parallel loop must be less than the execution time of the iterations. There is also the overhead of waiting for the parallel execution to finish and joining the results together.

Usually, data movement and matrix operations are good candidates for parallelism: for example, moving a bitmap or applying a transformation to it. Huge quantities of data need all the help they can get.

Other functionality?

Yes, there are other possible uses of parallel for loops, such as updating more than one hardware device at the same time. However, the general case is for improving data processing performance.

How to parallelize two serial for loops such that the work of the two loops is distributed over the threads?

You can declare the loops nowait and move the reduction to the end of the parallel region. Something like this:

#pragma omp parallel private(prod) reduction(+: sum)
{
    #pragma omp for nowait
    for (i = 0; i < 50; i++) {
        prod = arr[i] + 1;
        sum += prod;
    }

    #pragma omp for nowait
    for (i = 50; i < SIZE; i++) {
        prod = arr[i] + 1;
        sum += prod;
    }
}
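In case it helps, here is a self-contained, compilable version of that sketch (the array contents and SIZE are made up for illustration); build with e.g. g++ -fopenmp:

#include <cstdio>

#define SIZE 100

int main()
{
    int arr[SIZE];
    int sum = 0;

    for (int i = 0; i < SIZE; i++)
        arr[i] = i;

    #pragma omp parallel reduction(+: sum)
    {
        // nowait drops the implicit barrier after each worksharing
        // loop, so a thread that finishes the first loop can start on
        // the second immediately. The per-thread partial sums are
        // combined once, at the end of the parallel region.
        int prod;

        #pragma omp for nowait
        for (int i = 0; i < 50; i++) {
            prod = arr[i] + 1;
            sum += prod;
        }

        #pragma omp for nowait
        for (int i = 50; i < SIZE; i++) {
            prod = arr[i] + 1;
            sum += prod;
        }
    }

    printf("%d\n", sum);  // 1 + 2 + ... + 100 = 5050
    return 0;
}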

How to parallelise two independent for loops in OpenMP?

As long as N is much bigger than the number of physical CPU cores at your disposal, you should not parallelize the inner loop as well. In parallel lingo, one would say that the outer loop provides enough parallelism, so trying to parallelize the inner loop will in the best case add more overhead due to thread creation, and in the worst case oversubscribe your system.

Only if N is regularly of the same order of magnitude as the number of cores should you start to think about further parallelizing the loop nest, which in this case may be far from trivial.
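If N ever does drop to the order of the core count, one option worth knowing about is OpenMP's collapse clause, which fuses the iteration spaces of a perfectly nested loop pair into one parallel loop. A minimal sketch, where N, M, and process are placeholders for your own loop bounds and loop body:

// collapse(2) turns the N*M iteration space into a single loop of
// N*M iterations, which is then divided among the threads. This
// helps when N alone is too small to keep all cores busy.
#pragma omp parallel for collapse(2)
for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
        process(i, j);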

C OMP for loop in parallel region. Not work-shared

You're probably trying to access an out-of-range position in the dynamic array b_local.

Note that sizeof(b) will return the size in bytes of float* (the size of a float pointer), not the size of the array it points to.

If you want to know the size of the array that you are passing to the function, I would suggest adding it to the function's parameters:

void parallelCSC_SpMV(float *x, float *b, int b_size)
{
    ...
    float *b_local = (float*) malloc(sizeof(float) * b_size);
    ...
}

Also, if the size of colptrs is numcols, I would be careful with colptrs[i+1], since when i = numcols - 1 you will have another out-of-range access.
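To see the pitfall, compare sizeof applied to the array itself with sizeof applied to the pointer parameter. A minimal sketch (the sizes shown are typical for a 64-bit platform):

#include <cstdio>

void f(float *b)
{
    // b has decayed to a pointer, so this is sizeof(float*),
    // typically 8 on a 64-bit platform -- not the array size.
    printf("inside f: %zu\n", sizeof(b));
}

int main()
{
    float arr[100];
    printf("in main:  %zu\n", sizeof(arr));  // 100 * sizeof(float) = 400
    f(arr);
}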


