Inlining failed in call to always_inline '__m256d _mm256_broadcast_sd(const double*)'

error: inlining failed to call always_inline

GCC only lets you use intrinsics for instruction sets that the compiler itself has been told it may use. For example, see a related question about an AVX1 intrinsic: inlining failed in call to always_inline '__m256d _mm256_broadcast_sd(const double*)'

The intrinsics in your code are _mask_ versions of 256-bit intrinsics, and those require AVX512VL.

(My comments under the question about -mavx were wrong, I didn't notice the _mask in the name or args, just the _mm256.)

You're probably compiling on KNL (Knights Landing / Xeon Phi) on your server, which has AVX512F but not AVX512VL, so -march=native enables -mavx512f but not -mavx512vl. (Unlike Skylake-AVX512, which does have AVX512VL, allowing use of cool new AVX512 stuff like masked instructions with narrower vectors.)
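One way to confirm what -march=native actually enabled is to look at the feature macros the compiler predefines. The following is a minimal sketch, not part of the original code: the function name enabled_avx_features is made up for illustration. It reports what the compiler was allowed to use for this translation unit, which is not the same as what the CPU running the binary supports.

```cpp
#include <string>

// Hypothetical helper: lists the AVX-family feature macros that the compiler
// defined for this translation unit. They follow the -m... / -march=...
// options given at compile time, not the CPU the program later runs on.
std::string enabled_avx_features() {
    std::string out;
#ifdef __AVX__
    out += "AVX ";
#endif
#ifdef __AVX2__
    out += "AVX2 ";
#endif
#ifdef __AVX512F__
    out += "AVX512F ";
#endif
#ifdef __AVX512VL__
    out += "AVX512VL ";
#endif
    if (out.empty()) {
        out = "none";
    }
    return out;
}
```

On a KNL build with -march=native you would expect AVX512F to appear in the result without AVX512VL. The same information is available without writing any code by running gcc -march=native -dM -E - < /dev/null and searching the output for AVX512.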

And you've found a bug in your tensor.hpp, where you use AVX512VL intrinsics after only checking for __AVX512F__ instead of __AVX512VL__. Every AVX512 extension implies AVX512F, so the code doesn't need to check both macros.

#ifdef __AVX512F__    // should be __AVX512VL__
Tensor<T> Tensor::addAVX512(_param_){
   res = _mm256_mask_add_pd(tmp, 0xFF, _mm256_mask_loadu_pd(tmp, 0xFF, &elements[i]), _mm256_mask_loadu_pd(tmp, 0xFF, &a.elements[i]));
}
#endif

This is just pointless: you don't need the masked versions of these intrinsics if you're going to use constant all-ones masks. Use _mm256_add_pd like a normal person and only check for __AVX__. Or use _mm512_add_pd.

I thought at first this was from TensorFlow, but (from your comments) that doesn't make sense, and TensorFlow can't be that badly written. Merge-masking into 3 copies of the same tmp with an all-true mask just makes no sense; it looks like a silly way to introduce a false dependency if the compiler can't optimize the all-ones mask away into an unmasked load.

It is also terrible C++ style: you have a __m256d variable called tmp as a global or class member?? It's not even a local dummy variable, so it may exist somewhere the compiler can't fully optimize it away.
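As a rough sketch of the unmasked alternative, assuming a free function over std::vector<double> rather than the question's Tensor class (the name add_elements is hypothetical), the AVX path can be guarded by __AVX__ with a scalar loop as fallback and tail handler:

```cpp
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical stand-in for the question's Tensor::addAVX512.
// Adds a and b element-wise over their common length.
std::vector<double> add_elements(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<double> res(n);
    std::size_t i = 0;
#if defined(__AVX__)
    // Plain AVX: four doubles per iteration, no masks and no dummy operand.
    for (; i + 4 <= n; i += 4) {
        const __m256d va = _mm256_loadu_pd(&a[i]);
        const __m256d vb = _mm256_loadu_pd(&b[i]);
        _mm256_storeu_pd(&res[i], _mm256_add_pd(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or everything when AVX is not enabled.
    for (; i < n; ++i) {
        res[i] = a[i] + b[i];
    }
    return res;
}
```

Because the check is on __AVX__ only, this builds the vector path with -mavx, with -march=native on KNL, and on Skylake-AVX512 alike, and still compiles (as scalar code) when no AVX option is given.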

gcc compilation error: inlining failed in call to always_inline even after setting cflags environment variable

The inlining failure reason "target specific option mismatch" means that inlining failed because the program calls an always_inline function that has a specific target attribute from another function that does not support this target. This is really something that cannot be supported: the compiler cannot both compile a function to use certain micro-architecture features (the always_inline function) and not use them (the function into which it is inlined).

In this particular case, the cause seems to be that the DPDK sources use compiler intrinsics, but you do not compile with the necessary subtarget option. You may have set the CFLAGS variable, but it does not seem to have any effect on the compilation (it is not part of the command line you quoted). Furthermore, tmmintrin.h needs -mssse3, not -msse4.1. The DPDK makefiles should take care of all these details.
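The mismatch described above can also be avoided per function instead of per file. The sketch below assumes GCC or Clang on x86 and uses hypothetical names (add_avx, add_scalar, add_dispatch): the function that calls the AVX intrinsics carries a matching target attribute, so the always_inline intrinsics have a compatible caller, and a run-time check decides whether it is safe to call it.

```cpp
#include <cstddef>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_TARGET_ATTR 1  // illustrative macro, not a standard one
#endif

#ifdef HAVE_X86_TARGET_ATTR
// The target attribute enables AVX for this one function, so the
// always_inline AVX intrinsics can be inlined here even without -mavx.
__attribute__((target("avx")))
static void add_avx(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        _mm256_storeu_pd(out + i,
                         _mm256_add_pd(_mm256_loadu_pd(a + i), _mm256_loadu_pd(b + i)));
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
#endif

// Baseline version compiled with the file's default target options.
static void add_scalar(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Picks the AVX version only when the running CPU reports AVX support.
void add_dispatch(const double* a, const double* b, double* out, std::size_t n) {
#ifdef HAVE_X86_TARGET_ATTR
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}
```

Calling _mm256_add_pd directly from add_dispatch, which has no target attribute, would reproduce the "target specific option mismatch" error unless the whole file is compiled with -mavx.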
