Reducing on an Array in OpenMP

Reducing on array in OpenMP

Yes, it is possible to do an array reduction with OpenMP. Fortran even has a construct for this; in C/C++ you have to do it yourself. Here are two ways to do it.

The first method makes a private version of S for each thread, fills them in parallel, and then merges them into S in a critical section (see the code below). The second method makes an array with dimensions 10*nthreads, fills this array in parallel, and then merges it into S without using a critical section. The second method is much more complicated and can have cache issues, especially on multi-socket systems, if you are not careful. For more details, see Fill histograms (array reduction) in parallel with OpenMP without using a critical section.

First method

int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
#pragma omp parallel
{
    int S_private[10] = {0};
    #pragma omp for
    for (int n = 0; n < 10; ++n) {
        for (int m = 0; m <= n; ++m) {
            S_private[n] += A[m];
        }
    }
    #pragma omp critical
    {
        for (int n = 0; n < 10; ++n) {
            S[n] += S_private[n];
        }
    }
}

Second method

int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
int *S_private;
#pragma omp parallel
{
    const int nthreads = omp_get_num_threads();
    const int ithread  = omp_get_thread_num();

    #pragma omp single
    {
        S_private = new int[10*nthreads];
        for (int i = 0; i < (10*nthreads); i++) S_private[i] = 0;
    }
    #pragma omp for
    for (int n = 0; n < 10; ++n) {
        for (int m = 0; m <= n; ++m) {
            S_private[ithread*10+n] += A[m];
        }
    }
    #pragma omp for
    for (int i = 0; i < 10; i++) {
        for (int t = 0; t < nthreads; t++) {
            S[i] += S_private[10*t + i];
        }
    }
}
delete[] S_private;
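
If your compiler implements the OpenMP 4.5 array-section reduction mentioned further down this page, neither manual method is needed. Here is a minimal sketch of the same computation using the built-in feature (this is an illustration assuming a compiler with OpenMP 4.5 support, e.g. a recent GCC or Clang):

#include <stdio.h>

int main() {
    int A[] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S[10] = {0};

    /* OpenMP 4.5+: each thread gets a private copy of the section S[0:10],
       and the private copies are summed back into S when the loop ends. */
    #pragma omp parallel for reduction(+: S[0:10])
    for (int n = 0; n < 10; ++n) {
        for (int m = 0; m <= n; ++m) {
            S[n] += A[m];
        }
    }

    for (int n = 0; n < 10; ++n) printf("%d ", S[n]);
    printf("\n");
}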

Is it possible to do a reduction on an array with OpenMP?

Only in Fortran in OpenMP 3.0, and probably only with certain compilers.

See the last example (Example 3) on:

http://wikis.sun.com/display/openmp/Fortran+Allocatable+Arrays

OpenMP in C array reduction / parallelize the code

First, you have a typo in the name:

#pragma omp parallel for reduction (+: reasult[:i])

It should actually be "result", not "reasult".

Nonetheless, why are you sectioning the array with result[:i]? Based on your code, it seems that you wanted to reduce the entire array, namely:

#pragma omp parallel for reduction (+: result)
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
        if (numbers[j] == i)
            result[i]++;

When one's compiler does not support the OpenMP 4.5 array reduction feature, one can alternatively implement the reduction explicitly (check this SO thread to see how).
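
For reference, such an explicit reduction follows the same pattern as the first method at the top of this page: each thread fills a private copy of the array and the partial results are merged in a critical section. A minimal sketch, reusing the names from the snippet above (the helper function name is made up for illustration):

#include <stdlib.h>

/* Explicit array reduction without OpenMP 4.5 support: per-thread private
   copies of the result array, merged under a critical section. */
void count_occurrences(const int *numbers, int N, int *result, int M)
{
    #pragma omp parallel
    {
        int *result_private = calloc(M, sizeof(int));  /* per-thread partial result */

        #pragma omp for nowait
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++)
                if (numbers[j] == i)
                    result_private[i]++;

        #pragma omp critical  /* merge the partial results into result */
        for (int i = 0; i < M; i++)
            result[i] += result_private[i];

        free(result_private);
    }
}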

As pointed out by @Hristo Iliev in the comments:

Provided that M * sizeof(result[0]) / #threads is a multiple of the cache line size, and even if it isn't when the value of M is large enough, there is absolutely no need to involve reduction in the process. Unless the program is running on a NUMA system, that is.

Assuming that the aforementioned conditions are met, and if you analyze it carefully, the iterations of the outermost loop (i.e., over the variable i) are assigned to the threads, and since the variable i is used to access the result array, each thread will be updating a different position of the result array. Therefore, you can simplify your code to:

#pragma omp parallel for
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
        if (numbers[j] == i)
            result[i]++;

OpenMP reduce array without thread-local copies

You could use OpenMP's atomic update construct, which is already present in OpenMP 3.1:

void myfunc(double* small, double* large, int lLen /* , other args */) {

    #pragma omp parallel for
    for (int i = 0; i < lLen; i++) {

        int sInd; /* determined by other args */
        #pragma omp atomic update
        small[sInd] += large[i];
    }
}

This should be faster than using locks.

OpenMP: Is array reduction always needed for updating an array in parallel?

You don't need a reduction here. Reduction is a convenience feature that saves you from writing the same boilerplate over and over for recurring problems (try to think of how you would implement a sum reduction without OpenMP).

What you are doing right now is working on parallel data (V[i]) which should not overlap at any iteration (as you state in the question), because you divide by i itself. Furthermore, the writes to F[...] shouldn't overlap either, because they only depend on i and k.
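
A minimal sketch of that kind of situation (V, F, and the loop bounds here are stand-ins for the code in the question, not the actual code):

#include <stdio.h>

int main() {
    enum { N = 8, K = 3 };
    double V[N], F[N * K];
    for (int i = 0; i < N; i++) V[i] = i + 1.0;

    /* Each iteration i writes only V[i] and F positions derived from i
       (and k), so no two threads ever touch the same element: a plain
       parallel for is enough, with no reduction, atomics, or critical
       sections required. */
    #pragma omp parallel for
    for (int i = 1; i < N; i++) {
        V[i] = V[i] / i;                 /* independent per-i update */
        for (int k = 0; k < K; k++)
            F[i * K + k] = V[i] * k;     /* distinct index for every (i, k) */
    }

    printf("%f\n", F[N * K - 1]);
}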

OpenMP SIMD reduction in array: error: reduction variable must be shared on entry to this OpenMP pragma

The problem is that GCC 7.3.0 does not support the use of reduction over array sections in this context:

#pragma omp for simd collapse(4) reduction(+:next[0:width*height])

This feature is supported from GCC 9 onwards:

Since GCC 9, there is initial OpenMP 5 support (essentially C/C++ only). GCC 10 added some more features, mainly for C/C++ but also for Fortran.
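
A minimal stand-alone example of that kind of array-section reduction, which per the note above should compile with GCC 9 or later (the names and sizes here are made up for illustration):

#include <stdio.h>

int main() {
    enum { width = 4, height = 4 };
    double next[width * height] = {0};

    /* Reduction over the array section next[0:width*height]: each thread
       privatizes the section and the private copies are summed back into
       next at the end of the loop. */
    #pragma omp parallel for simd collapse(2) reduction(+: next[0:width*height])
    for (int i = 0; i < height; i++)
        for (int j = 0; j < width; j++)
            next[i * width + j] += i + j;

    for (int k = 0; k < width * height; k++)
        printf("%.0f ", next[k]);
    printf("\n");
}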

Can I use OpenMP reduction when the variable is an array element?

You're misreading the error you're getting. It doesn't mean that you can't do the reduction on alternating elements; it means that you can't do the reduction into an element of an array.

That is, you can't do reduction(+:sum[0]). But you can do the reduction into another scalar variable and then copy into an element of an array afterwards:

void sum_int(const int input[], int num, int *sum)
{
    int sum_even = 0, sum_odd = 0;

    #pragma omp parallel for reduction(+:sum_even) reduction(+:sum_odd)
    for (int i = 0; i < num; i++) {
        if (i % 2 == 0)
            sum_even += input[i];
        else
            sum_odd += input[i];
    }

    sum[0] = sum_even;
    sum[1] = sum_odd;
}
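
If a compiler with OpenMP 4.5 support is available, an alternative sketch is to reduce over the two-element array directly via an array section and skip the scalar temporaries (verify that your compiler accepts array sections in the reduction clause):

void sum_int(const int input[], int num, int *sum)
{
    sum[0] = 0;
    sum[1] = 0;

    /* OpenMP 4.5+: reduce over the array section sum[0:2] directly. */
    #pragma omp parallel for reduction(+: sum[0:2])
    for (int i = 0; i < num; i++)
        sum[i % 2] += input[i];
}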

