Does the Restrict Keyword Provide Significant Benefits in Gcc/G++

Does the restrict keyword provide significant benefits in gcc/g++?

The restrict keyword does a difference.

I've seen improvements of factor 2 and more in some situations (image processing). Most of the time the difference is not that large though. About 10%.

Here is a little example that illustrate the difference. I've written a very basic 4x4 vector * matrix transform as a test. Note that I have to force the function not to be inlined. Otherwise GCC detects that there aren't any aliasing pointers in my benchmark code and restrict wouldn't make a difference due to inlining.

I could have moved the transform function to a different file as well.

#include <math.h>

#ifdef USE_RESTRICT
#else
#define __restrict
#endif

void transform (float * __restrict dest, float * __restrict src,
float * __restrict matrix, int n) __attribute__ ((noinline));

void transform (float * __restrict dest, float * __restrict src,
float * __restrict matrix, int n)
{
int i;

// simple transform loop.

// written with aliasing in mind. dest, src and matrix
// are potentially aliasing, so the compiler is forced to reload
// the values of matrix and src for each iteration.

for (i=0; i<n; i++)
{
dest[0] = src[0] * matrix[0] + src[1] * matrix[1] +
src[2] * matrix[2] + src[3] * matrix[3];

dest[1] = src[0] * matrix[4] + src[1] * matrix[5] +
src[2] * matrix[6] + src[3] * matrix[7];

dest[2] = src[0] * matrix[8] + src[1] * matrix[9] +
src[2] * matrix[10] + src[3] * matrix[11];

dest[3] = src[0] * matrix[12] + src[1] * matrix[13] +
src[2] * matrix[14] + src[3] * matrix[15];

src += 4;
dest += 4;
}
}

float srcdata[4*10000];
float dstdata[4*10000];

int main (int argc, char**args)
{
int i,j;
float matrix[16];

// init all source-data, so we don't get NANs
for (i=0; i<16; i++) matrix[i] = 1;
for (i=0; i<4*10000; i++) srcdata[i] = i;

// do a bunch of tests for benchmarking.
for (j=0; j<10000; j++)
transform (dstdata, srcdata, matrix, 10000);
}

Results: (on my 2 Ghz Core Duo)

nils@doofnase:~$ gcc -O3 test.c
nils@doofnase:~$ time ./a.out

real 0m2.517s
user 0m2.516s
sys 0m0.004s

nils@doofnase:~$ gcc -O3 -DUSE_RESTRICT test.c
nils@doofnase:~$ time ./a.out

real 0m2.034s
user 0m2.028s
sys 0m0.000s

Over the thumb 20% faster execution, on that system.

To show how much it depends on the architecture I've let the same code run on a Cortex-A8 embedded CPU (adjusted the loop count a bit cause I don't want to wait that long):

root@beagleboard:~# gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp test.c
root@beagleboard:~# time ./a.out

real 0m 7.64s
user 0m 7.62s
sys 0m 0.00s

root@beagleboard:~# gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -DUSE_RESTRICT test.c
root@beagleboard:~# time ./a.out

real 0m 7.00s
user 0m 6.98s
sys 0m 0.00s

Here the difference is just 9% (same compiler btw.)

Realistic usage of the C99 'restrict' keyword?

restrict says that the pointer is the only thing that accesses the underlying object. It eliminates the potential for pointer aliasing, enabling better optimization by the compiler.

For instance, suppose I have a machine with specialized instructions that can multiply vectors of numbers in memory, and I have the following code:

void MultiplyArrays(int* dest, int* src1, int* src2, int n)
{
for(int i = 0; i < n; i++)
{
dest[i] = src1[i]*src2[i];
}
}

The compiler needs to properly handle if dest, src1, and src2 overlap, meaning it must do one multiplication at a time, from start to the end. By having restrict, the compiler is free to optimize this code by using the vector instructions.

Wikipedia has an entry on restrict, with another example, here.



Related Topics



Leave a reply



Submit