C++ error: ‘_mm_sin_ps’ was not declared in this scope
_mm_sin_ps is part of Intel's SVML library, which ships only with Intel compilers. GCC developers focused on wrapping machine instructions and simple tasks, so GCC's immintrin.h has no SVML functions (so far).
You have to use a library or write it by yourself.
Ways to implement sine:
- Taylor series
- CORDIC
- Quadratic curve
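As a rough illustration of the Taylor-series option, here is a minimal sketch of a four-lane sine built only from SSE intrinsics that GCC does provide. The name sin_taylor_ps is hypothetical (it is not part of SVML), and the sketch assumes every input is already reduced to [-pi/2, pi/2]; a real replacement would also need range reduction and a careful look at accuracy.

```cpp
#include <immintrin.h>  // SSE intrinsics: _mm_mul_ps, _mm_add_ps, _mm_set1_ps
#include <cmath>        // std::sin, std::fabs for comparing results
#include <cassert>

// Hypothetical helper, NOT an SVML or GCC function: approximates sin(x) for
// four packed floats with the Taylor series truncated after the x^9 term.
// Assumption: every lane is already range-reduced to [-pi/2, pi/2]; outside
// that interval the truncation error grows quickly.
static inline __m128 sin_taylor_ps(__m128 x) {
    const __m128 x2 = _mm_mul_ps(x, x);
    // Horner evaluation of 1 - x^2/3! + x^4/5! - x^6/7! + x^8/9!
    __m128 p = _mm_set1_ps(1.0f / 362880.0f);                          // +1/9!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(-1.0f / 5040.0f));  // -1/7!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(1.0f / 120.0f));    // +1/5!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(-1.0f / 6.0f));     // -1/3!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(1.0f));             // +1
    return _mm_mul_ps(p, x);  // times x gives x - x^3/3! + ... + x^9/9!
}
```

For inputs outside [-pi/2, pi/2], one common approach is to reduce x modulo 2*pi first and use identities such as sin(pi - x) = sin(x) before evaluating the polynomial.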
What is a lambda expression in C++11?
The problem
C++ includes useful generic functions like std::for_each and std::transform, which can be very handy. Unfortunately they can also be quite cumbersome to use, particularly if the functor you would like to apply is unique to the particular function.
#include <algorithm>
#include <vector>

namespace {
    struct f {
        void operator()(int) {
            // do something
        }
    };
}

void func(std::vector<int>& v) {
    f f;
    std::for_each(v.begin(), v.end(), f);
}
If you only use f once and in that specific place, it seems overkill to write a whole class just to do something trivial and one-off.
In C++03 you might be tempted to write something like the following, to keep the functor local:
void func2(std::vector<int>& v) {
    struct {
        void operator()(int) {
            // do something
        }
    } f;
    std::for_each(v.begin(), v.end(), f);
}
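However, this is not allowed: f cannot be passed to a template function in C++03, because local and unnamed types cannot be used as template arguments. (C++11 lifts this restriction.)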
The new solution
C++11 introduces lambdas, which allow you to write an inline, anonymous functor to replace the struct f. For small simple examples this can be cleaner to read (it keeps everything in one place) and potentially simpler to maintain, for example in the simplest form:
void func3(std::vector<int>& v) {
    std::for_each(v.begin(), v.end(), [](int) { /* do something here */ });
}
Lambda functions are just syntactic sugar for anonymous functors.
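To make the "syntactic sugar" claim concrete, here is a sketch of a capturing lambda next to a hand-written functor that behaves the same way. AddOffset is a hypothetical name: the real closure type is unnamed, and compilers are free to differ in the details of how they generate it.

```cpp
#include <algorithm>
#include <vector>
#include <cassert>

// A lambda that adds a captured offset to every element.
void add_with_lambda(std::vector<int>& v, int offset) {
    std::for_each(v.begin(), v.end(), [offset](int& x) { x += offset; });
}

// Roughly what the compiler generates for that lambda: a class whose data
// member is the by-value capture and whose operator() is const.
// (AddOffset is a hypothetical name; the real closure type has no name.)
struct AddOffset {
    int offset;  // the captured copy
    void operator()(int& x) const { x += offset; }
};

void add_with_functor(std::vector<int>& v, int offset) {
    std::for_each(v.begin(), v.end(), AddOffset{offset});
}
```

Both functions leave the vector in the same state; the lambda just saves you from naming and defining the class yourself.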
Return types
In simple cases the return type of the lambda is deduced for you, e.g.:
void func4(std::vector<double>& v) {
    std::transform(v.begin(), v.end(), v.begin(),
        [](double d) { return d < 0.00001 ? 0 : d; }
    );
}
However, when you start to write more complex lambdas you will quickly encounter cases where the return type cannot be deduced by the compiler, e.g. here the two return statements yield different types (int and double):
void func4(std::vector<double>& v) {
    std::transform(v.begin(), v.end(), v.begin(),
        [](double d) {  // does not compile: 0 is an int, d is a double
            if (d < 0.0001) {
                return 0;
            } else {
                return d;
            }
        });
}
To resolve this, you are allowed to explicitly specify a return type for a lambda function using -> T:
void func4(std::vector<double>& v) {
    std::transform(v.begin(), v.end(), v.begin(),
        [](double d) -> double {
            if (d < 0.0001) {
                return 0;
            } else {
                return d;
            }
        });
}
"Capturing" variables
So far we've not used anything inside the lambda other than what was passed to it, but we can also use other variables. To access them, use the capture clause (the [] of the expression), which has so far been unused in these examples, e.g.:
void func5(std::vector<double>& v, const double& epsilon) {
    std::transform(v.begin(), v.end(), v.begin(),
        [epsilon](double d) -> double {
            if (d < epsilon) {
                return 0;
            } else {
                return d;
            }
        });
}
You can capture by both reference and value, which you specify using & and = respectively:
- [&epsilon, zeta] captures epsilon by reference and zeta by value
- [&] captures all variables used in the lambda by reference
- [=] captures all variables used in the lambda by value
- [&, epsilon] captures all variables used in the lambda by reference but captures epsilon by value
- [=, &epsilon] captures all variables used in the lambda by value but captures epsilon by reference
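The practical difference between the two capture modes is when the value is read: a by-value capture copies the variable when the lambda is created, while a by-reference capture reads the variable each time the lambda is called. A minimal sketch (capture_demo is a hypothetical name):

```cpp
#include <cassert>
#include <utility>

// Captures the same variable once by value and once by reference, then
// changes it after both lambdas have been created.
std::pair<int, int> capture_demo() {
    int epsilon = 1;
    auto by_value = [epsilon] { return epsilon; };  // copy taken here
    auto by_ref = [&epsilon] { return epsilon; };   // refers to epsilon itself
    epsilon = 42;                                    // modify after capturing
    return {by_value(), by_ref()};                   // {1, 42}
}
```

Because a by-reference capture refers to the original variable, such a lambda must not be called after that variable has gone out of scope.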
The generated operator() is const by default, which means that variables captured by value are const when you access them inside the lambda, so the lambda cannot modify its own copies. (Variables captured by reference can still be modified through the reference.) You can mark the lambda as mutable to request that the generated operator() is not const.
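A small sketch of what mutable changes, using only standard C++11 (mutable_demo and reference_demo are hypothetical names). Without mutable, ++count on a by-value capture would not compile; with mutable, each call changes the closure's own copy while the original variable stays untouched. A reference capture, by contrast, can modify the referenced variable even without mutable.

```cpp
#include <cassert>
#include <utility>

// mutable makes operator() non-const, so the closure's copy of count can change.
std::pair<int, int> mutable_demo() {
    int count = 0;
    auto counter = [count]() mutable { return ++count; };
    counter();               // the closure's copy becomes 1
    int second = counter();  // the closure's copy becomes 2
    return {second, count};  // {2, 0}: the original count is unchanged
}

// No mutable needed: a const operator() only makes the closure's own members
// const, and a reference capture modifies the referenced variable.
int reference_demo() {
    int hits = 0;
    auto bump = [&hits] { ++hits; };
    bump();
    bump();
    return hits;  // 2
}
```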
Where is Clang's '_mm256_pow_ps' intrinsic?
That's not an intrinsic; it's an Intel SVML library function name that confusingly uses the same naming scheme as actual intrinsics. There's no vpowps instruction. (AVX512ER on Xeon Phi does have the semi-related vexp2ps instruction...)
IDK if this naming scheme is to trick people into depending on Intel tools when writing SIMD code with their compiler (which comes with SVML), or because their compiler does treat it like an intrinsic/builtin for doing constant propagation if inputs are known, or some other reason.
For functions like that and _mm_sin_ps to be usable, you need Intel's Short Vector Math Library (SVML). Most people just avoid using them. If it has an implementation of something you want, though, it's worth looking into. IDK what other vector pow implementations exist.
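If you just need code that compiles without SVML, one hedged option is a lane-by-lane fallback: store the vectors to memory, call std::pow on each element, and reload. This is a minimal sketch, not a real vectorized pow, and it will be much slower than one; pow_ps_fallback is a hypothetical name. It uses 128-bit SSE vectors so it builds without -mavx; the same store/loop/load pattern applies to __m256 with AVX (eight lanes, _mm256_store_ps/_mm256_load_ps, and 32-byte alignment).

```cpp
#include <immintrin.h>
#include <cmath>
#include <cassert>

// Hypothetical portable stand-in for SVML's pow functions (NOT a real
// intrinsic): spill both vectors, compute std::pow per lane, reload.
static inline __m128 pow_ps_fallback(__m128 base, __m128 exponent) {
    alignas(16) float b[4], e[4], r[4];
    _mm_store_ps(b, base);
    _mm_store_ps(e, exponent);
    for (int i = 0; i < 4; ++i)
        r[i] = std::pow(b[i], e[i]);  // scalar pow on each lane
    return _mm_load_ps(r);
}
```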
In the intrinsics finder, you can avoid seeing these non-portable functions in your search results if you leave the SVML box unchecked.
There are some "composite" intrinsics like _mm_set_epi8() that typically compile to multiple instructions (such as loads and shuffles). These are portable across compilers, and they inline instead of being calls to library functions.
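One detail worth knowing about _mm_set_epi8: its arguments run from the highest byte (element 15) down to element 0, whereas _mm_setr_epi8 takes them in memory order. The sketch below (set_epi8_demo is a hypothetical name) builds the same vector both ways; with constant arguments like these, compilers typically emit a single load from a constant in memory rather than a sequence of inserts.

```cpp
#include <immintrin.h>
#include <cassert>

// Builds bytes 0, 1, ..., 15 (in memory order) with both set variants.
static inline void set_epi8_demo(unsigned char out_set[16], unsigned char out_setr[16]) {
    // _mm_set_epi8: first argument is the HIGHEST element (byte 15).
    __m128i hi_first = _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                    7, 6, 5, 4, 3, 2, 1, 0);
    // _mm_setr_epi8: first argument is the LOWEST element (byte 0).
    __m128i lo_first = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out_set), hi_first);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out_setr), lo_first);
}
```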
Also note that sqrtps is a native machine instruction, so _mm_sqrt_ps() is a real intrinsic. IEEE 754 specifies mul, div, add, sub, and sqrt as "basic" operations that are required to produce correctly-rounded results (error <= 0.5ulp), so sqrt() is special and does have direct hardware support, unlike most other "math library" functions.
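As a quick illustration, _mm_sqrt_ps needs no library at all; GCC and Clang provide it in immintrin.h like any other real intrinsic. Because sqrt must be correctly rounded, exact squares come back exact. A minimal sketch (sqrt4 is a hypothetical name):

```cpp
#include <immintrin.h>
#include <cassert>

// Four square roots with one sqrtps instruction (unaligned load/store).
static inline void sqrt4(const float in[4], float out[4]) {
    _mm_storeu_ps(out, _mm_sqrt_ps(_mm_loadu_ps(in)));
}
```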
There are various libraries of SIMD math functions. Some of them come with C++ wrapper libraries that allow a+b instead of _mm_add_ps(a,b).
- glibc libmvec: since glibc 2.22, to support OpenMP 4.0 vector math functions. GCC knows how to auto-vectorize some functions like cos(), sin(), and probably pow() using it. There is also an inconvenient way of using it explicitly for manual vectorization. (Hopefully better ways are possible that don't put mangled names in the source code.)
- Agner Fog's VCL has some math functions like exp and log. (Formerly GPL licensed, now Apache.)
- https://github.com/microsoft/DirectXMath (MIT license): I think it's portable to non-Windows, and it doesn't require DirectX.
- https://sleef.org/: apparently great performance, with variable accuracy you can choose. Formerly only supported on MSVC on Windows, the support matrix on its web site now includes GCC and Clang for x86-64 GNU/Linux and AArch64.
- Intel's own SVML (comes with ICC; ICC auto-vectorizes with SVML by default). Confusingly, its prototypes are in immintrin.h along with actual intrinsics. Maybe they want to trick people into writing code that's dependent on Intel tools/libraries, or maybe they think fewer includes are better and that everyone should use their compiler... Also related: Intel MKL (Math Kernel Library), with matrix BLAS functions.
- AMD ACML: end-of-life closed-source freeware. I think it just has functions that loop over arrays/matrices (like Intel MKL), not functions for single SIMD vectors.
- sse_mathfun (zlib license): SSE2 and ARM NEON. It seems it hasn't been updated since about 2011, but it does have implementations of single-vector math/trig functions.
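For the libmvec route, the source can stay plain scalar C++: the sketch below (sin_array is a hypothetical name) is an ordinary loop over std::sin. Whether GCC actually turns it into calls to libmvec's vector sinf variants depends on the compiler version, target, and flags; on x86-64 GNU/Linux with glibc 2.22 or later, something like -O3 -ffast-math is usually needed, but treat that as an assumption and check the generated assembly. The results are the same either way, up to libmvec's slightly looser accuracy.

```cpp
#include <cmath>
#include <cstddef>
#include <cassert>

// Plain loop that GCC *may* auto-vectorize using libmvec (assumption: glibc
// >= 2.22, x86-64, fast-math flags). Otherwise it just calls sinf per element.
void sin_array(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sin(in[i]);  // float overload of std::sin
}
```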
Passing 2-D array as argument
void myFunction(int arr[][4])
You can put any number in the first [], but the compiler will ignore it (the parameter is adjusted to a pointer to rows, int (*arr)[4]). When passing a multidimensional array as a parameter, you must specify all dimensions except the first one.
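Because the first dimension is dropped from the type, the function cannot know how many rows it received, so the count has to be passed separately. The sketch below (sum2d and sum2d_ref are hypothetical names) shows that, plus a template alternative that takes the array by reference so the compiler deduces both dimensions.

```cpp
#include <cstddef>
#include <cassert>

// arr is really int (*)[4]: the row count is not part of the type.
int sum2d(int arr[][4], int rows) {
    int total = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < 4; ++c)
            total += arr[r][c];
    return total;
}

// Alternative: pass by reference so both dimensions are deduced.
template <std::size_t Rows, std::size_t Cols>
int sum2d_ref(const int (&arr)[Rows][Cols]) {
    int total = 0;
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c)
            total += arr[r][c];
    return total;
}
```

An int grid[3][4] can be passed as sum2d(grid, 3), while an int[3][5] would be rejected by sum2d because its second dimension does not match.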