error: inlining failed in call to always_inline
GCC will only let you use intrinsics for instruction sets that are enabled for the compiler to use (e.g. via -mavx, -mavx512vl or a suitable -march=). See, for example, a related question about an AVX1 intrinsic: inlining failed in call to always_inline '__m256d _mm256_broadcast_sd(const double*)'
These are _mask_ versions of 256-bit intrinsics, so they require AVX512VL.
(My comments under the question about -mavx were wrong; I didn't notice the _mask in the name or args, just the _mm256.)
You're probably compiling on KNL (Knights Landing / Xeon Phi) on your server, which has AVX512F but not AVX512VL. So -march=native will set -mavx512f but not -mavx512vl. (Unlike Skylake-AVX512, which does have AVX512VL, allowing use of cool new AVX512 stuff like masked instructions with narrower vectors.)
And you've found a bug in your tensor.hpp, where you use AVX512VL intrinsics after only checking for __AVX512F__ instead of __AVX512VL__. AVX512-anything implies AVX512F, so it doesn't need to check both.
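A quick way to see which of these macros a given set of flags actually defines is to compile a tiny translation unit with the same options as tensor.hpp (for example -march=native on the server). The names kHasAvx512F, kHasAvx512VL and print_avx512_macros are hypothetical; only the predefined __AVX512F__ and __AVX512VL__ macros come from the compiler. The static_assert encodes the point above that enabling AVX512VL also enables AVX512F.

```cpp
#include <cassert>
#include <cstdio>

// Compile-time view of what the -march / -m flags enabled for this file.
// Expected: KNL with -march=native defines __AVX512F__ but not __AVX512VL__;
// Skylake-AVX512 defines both.
#ifdef __AVX512F__
constexpr bool kHasAvx512F = true;
#else
constexpr bool kHasAvx512F = false;
#endif

#ifdef __AVX512VL__
constexpr bool kHasAvx512VL = true;   // AVX-512 (incl. masked) ops on 128/256-bit vectors
#else
constexpr bool kHasAvx512VL = false;
#endif

// AVX512VL implies AVX512F, so guarding on __AVX512VL__ alone is sufficient.
static_assert(!kHasAvx512VL || kHasAvx512F, "AVX512VL implies AVX512F");

void print_avx512_macros() {
    std::printf("__AVX512F__: %d, __AVX512VL__: %d\n", kHasAvx512F, kHasAvx512VL);
}
```

Without writing any code, gcc -march=native -dM -E - </dev/null | grep AVX512 lists the same predefined macros.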
#ifdef __AVX512F__ // should be __AVX512VL__
Tensor<T> Tensor::addAVX512(_param_){
res = _mm256_mask_add_pd(tmp, 0xFF, _mm256_mask_loadu_pd(tmp, 0xFF, &elements[i]), _mm256_mask_loadu_pd(tmp, 0xFF, &a.elements[i]));
}
#endif
This is just pointless: you don't need to use the masked versions of these intrinsics if you're going to use constant all-ones masks. Use _mm256_add_pd like a normal person and only check for __AVX__. Or use _mm512_add_pd.
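A minimal sketch of the unmasked approach, assuming the goal is a plain element-wise add of two double arrays (add_arrays is a hypothetical stand-in for the Tensor method, not the asker's code). Each vector loop is guarded only by the feature it actually needs (__AVX512F__ for _mm512_add_pd, __AVX__ for _mm256_add_pd), and leftover elements are handled by a scalar loop instead of masking.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__) || defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
// Uses plain (unmasked) intrinsics for whatever the compiler has enabled,
// then a scalar loop for the remainder (or everything, if no AVX is enabled).
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#ifdef __AVX512F__
    // 8 doubles per iteration; needs only AVX512F, not AVX512VL.
    for (; i + 8 <= n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);
        __m512d vb = _mm512_loadu_pd(b + i);
        _mm512_storeu_pd(out + i, _mm512_add_pd(va, vb));
    }
#endif
#ifdef __AVX__
    // 4 doubles per iteration; needs only AVX.
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }
#endif
    // Scalar tail.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The same source compiles with or without -mavx / -mavx512f; the flags only decide which loops exist.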
I thought at first this was from TensorFlow, but (from your comments) that doesn't make sense. And it can't be that badly written. Merge-masking into 3 copies of the same tmp with an all-true mask just makes no sense; it looks like a silly way to introduce a false dependency if the compiler can't optimize away the mask=all-ones into an unmasked load.
And also terrible C++ style: you have a variable called __m256d tmp as a global or class member?? It's not even a local dummy variable, so it may live somewhere the compiler can't fully optimize it away.
gcc compilation error: inlining failed in call to always_inline even after setting cflags environment variable
The inlining failure reason "target specific option mismatch" means that inlining failed because the program calls an always-inline function that has a specific target attribute from another function that does not support this target. This is really something that is not supportable: the compiler cannot both compile a function to use certain micro-architecture features (the always-inline function) and not use them (the function into which it is inlined).
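One way to make caller and callee targets match without changing global flags is GCC's per-function target attribute (also accepted by Clang). This is a sketch for GCC/Clang on x86 only, not the DPDK approach; broadcast_avx and broadcast4 are hypothetical names. The AVX code lives in a function compiled for AVX, and the baseline caller only calls it after a runtime CPU check.

```cpp
#include <cassert>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_TARGET_ATTR 1

// Compiled for AVX even if the file is built without -mavx, so inlining the
// always_inline AVX intrinsics into it is allowed. Without this attribute
// (and without -mavx) GCC reports "inlining failed in call to always_inline
// ...: target specific option mismatch".
__attribute__((target("avx")))
static void broadcast_avx(double x, double* out4) {
    __m256d v = _mm256_broadcast_sd(&x);
    _mm256_storeu_pd(out4, v);
}
#endif

// Baseline-target caller: only uses the AVX version if the CPU supports AVX.
void broadcast4(double x, double* out4) {
#ifdef HAVE_X86_TARGET_ATTR
    if (__builtin_cpu_supports("avx")) {
        broadcast_avx(x, out4);
        return;
    }
#endif
    for (int i = 0; i < 4; ++i) out4[i] = x;
}
```

The plain alternative is to compile the whole file with the needed option (e.g. -mavx), which is what the build system is expected to do.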
In this particular case, the cause seems to be that the DPDK sources use compiler intrinsics, but you do not compile with the necessary subtarget option. You may have set the CFLAGS variable, but it does not seem to have any effect on the compilation (it is not part of the command line you quoted). Furthermore, tmmintrin.h corresponds to -mssse3, not -msse4.1 (although in GCC -msse4.1 also implies -mssse3). The DPDK makefiles should take care of all these details.
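To illustrate the tmmintrin.h point, here is a sketch that uses one SSSE3 intrinsic, _mm_abs_epi32 (PABSD), only when the compiler defines __SSSE3__ (which -mssse3, or any option implying it, does) and falls back to scalar code otherwise. abs4 is a hypothetical helper, not part of DPDK.

```cpp
#include <cassert>
#include <cstdint>
#ifdef __SSSE3__
#include <tmmintrin.h>   // SSSE3 intrinsics; also pulls in the SSE2 ones used below
#endif

// Hypothetical helper: absolute value of four int32 values.
// (INT32_MIN is not handled specially; its absolute value is not representable.)
void abs4(const std::int32_t* in, std::int32_t* out) {
#ifdef __SSSE3__
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_abs_epi32(v));
#else
    for (int i = 0; i < 4; ++i) out[i] = in[i] < 0 ? -in[i] : in[i];
#endif
}
```

Guarding on the feature macro avoids the always_inline error when the option is missing, at the cost of silently taking the scalar path.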