C++ Error: '_mm_sin_ps' Was Not Declared in This Scope

_mm_sin_ps is part of Intel's SVML (Short Vector Math Library), which ships with Intel's compilers. GCC's intrinsic headers focus on wrapping machine instructions and simple operations, so GCC's immintrin.h does not contain any SVML functions so far.

You have to use a library or write it yourself.
Possible sine implementations:

  • Taylor series
  • Quadratic curve
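A minimal sketch of both approaches, assuming x86 with SSE2 and using only real intrinsics (no SVML). The function names sin_ps_taylor, sin_ps_parabola, reduce_to_pi and select_ps are hypothetical. The range reduction is naive (subtract the nearest multiple of 2*pi in float), so accuracy degrades for large |x|. The Taylor version evaluates terms up to x^11 after folding into [-pi/2, pi/2], which should be close to float precision; the parabola uses the widely circulated 4/pi and -4/pi^2 coefficients plus a P = 0.225 refinement step, with an error on the order of 1e-3.

```cpp
#include <cassert>
#include <cmath>
#include <emmintrin.h>  // SSE2: _mm_cvtps_epi32, _mm_cvtepi32_ps (+ SSE via xmmintrin.h)

// Hypothetical helper: reduce each lane to roughly [-pi, pi] by subtracting
// the nearest multiple of 2*pi. Naive: loses accuracy for large |x|.
__m128 reduce_to_pi(__m128 x) {
    const __m128 two_pi     = _mm_set1_ps(6.28318530718f);
    const __m128 inv_two_pi = _mm_set1_ps(0.159154943092f);
    // _mm_cvtps_epi32 rounds to nearest under the default MXCSR rounding mode.
    __m128 k = _mm_cvtepi32_ps(_mm_cvtps_epi32(_mm_mul_ps(x, inv_two_pi)));
    return _mm_sub_ps(x, _mm_mul_ps(k, two_pi));
}

// Lane-wise select: mask ? a : b (each mask lane is all-ones or all-zeros).
__m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

// Taylor series sine: x - x^3/3! + x^5/5! - ... - x^11/11!
__m128 sin_ps_taylor(__m128 x) {
    const __m128 pi          = _mm_set1_ps(3.14159265359f);
    const __m128 neg_pi      = _mm_set1_ps(-3.14159265359f);
    const __m128 half_pi     = _mm_set1_ps(1.57079632679f);
    const __m128 neg_half_pi = _mm_set1_ps(-1.57079632679f);
    x = reduce_to_pi(x);
    // Fold into [-pi/2, pi/2] using sin(x) = sin(pi - x) = sin(-pi - x).
    x = select_ps(_mm_cmpgt_ps(x, half_pi), _mm_sub_ps(pi, x), x);
    x = select_ps(_mm_cmplt_ps(x, neg_half_pi), _mm_sub_ps(neg_pi, x), x);
    // Horner evaluation in x^2, then multiply by x.
    const __m128 x2 = _mm_mul_ps(x, x);
    __m128 p = _mm_set1_ps(-1.0f / 39916800.0f);                       // -1/11!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(1.0f / 362880.0f));  //  1/9!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(-1.0f / 5040.0f));   // -1/7!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(1.0f / 120.0f));     //  1/5!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(-1.0f / 6.0f));      // -1/3!
    p = _mm_add_ps(_mm_mul_ps(p, x2), _mm_set1_ps(1.0f));
    return _mm_mul_ps(p, x);
}

// Parabola ("quadratic curve") sine on [-pi, pi]:
// y = (4/pi) x - (4/pi^2) x |x|, refined by y = P (y |y| - y) + y.
__m128 sin_ps_parabola(__m128 x) {
    const __m128 sign_mask = _mm_set1_ps(-0.0f);          // only the sign bit set
    const __m128 B = _mm_set1_ps(1.27323954474f);         // 4/pi
    const __m128 C = _mm_set1_ps(-0.405284734569f);       // -4/pi^2
    const __m128 P = _mm_set1_ps(0.225f);                 // refinement weight
    x = reduce_to_pi(x);
    __m128 abs_x = _mm_andnot_ps(sign_mask, x);            // clear sign bit
    __m128 y = _mm_add_ps(_mm_mul_ps(B, x), _mm_mul_ps(C, _mm_mul_ps(x, abs_x)));
    __m128 abs_y = _mm_andnot_ps(sign_mask, y);
    return _mm_add_ps(_mm_mul_ps(P, _mm_sub_ps(_mm_mul_ps(y, abs_y), y)), y);
}
```

The Taylor version trades a few more multiplies for much better accuracy; the parabola is cheaper but only good to about three decimal places. For real code, one of the libraries listed in the next answer is likely a better choice.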

What is a lambda expression in C++11?

The problem

C++ includes useful generic functions like std::for_each and std::transform, which can be very handy. Unfortunately they can also be quite cumbersome to use, particularly if the functor you would like to apply is unique to the particular function.

#include <algorithm>
#include <vector>

namespace {
struct f {
  void operator()(int) {
    // do something
  }
};
}

void func(std::vector<int>& v) {
  f f;
  std::for_each(v.begin(), v.end(), f);
}

If you only use f once and in that specific place, it seems overkill to write a whole class just to do something trivial and one-off.

In C++03 you might be tempted to write something like the following, to keep the functor local:

void func2(std::vector<int>& v) {
  struct {
    void operator()(int) {
      // do something
    }
  } f;
  std::for_each(v.begin(), v.end(), f);
}

However, this is not allowed: in C++03 a local (or unnamed) type cannot be used as a template argument, so f cannot be passed to a function template such as std::for_each.
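For comparison, C++11 lifted this restriction, so local and unnamed types can be used as template arguments. A small sketch that should compile as C++11 or later but not as C++03; the names sum_with_local_functor and Adder are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Accepted since C++11: a local struct passed to the std::for_each template.
int sum_with_local_functor(const std::vector<int>& v) {
    struct Adder {
        int* total;  // points at the accumulator in the enclosing function
        void operator()(int x) const { *total += x; }
    };
    int total = 0;
    std::for_each(v.begin(), v.end(), Adder{&total});
    return total;
}
```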

The new solution

C++11 introduces lambdas, which allow you to write an inline, anonymous functor to replace the struct f. For small, simple examples this can be cleaner to read (it keeps everything in one place) and potentially simpler to maintain. In its simplest form:

void func3(std::vector<int>& v) {
  std::for_each(v.begin(), v.end(), [](int) { /* do something here */ });
}

Lambda functions are just syntactic sugar for anonymous functors.
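To make the "syntactic sugar" point concrete, here is a sketch of a lambda next to a hand-written functor that behaves the same way. The name IsEvenFunctor is hypothetical; the closure type the compiler actually generates is unnamed, but like this struct it has a const operator().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hand-written functor roughly equivalent to the lambda used below.
struct IsEvenFunctor {
    bool operator()(int x) const { return x % 2 == 0; }
};

long count_even_lambda(const std::vector<int>& v) {
    return std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; });
}

long count_even_functor(const std::vector<int>& v) {
    return std::count_if(v.begin(), v.end(), IsEvenFunctor());
}
```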

Return types

In simple cases the return type of the lambda is deduced for you, e.g.:

void func4(std::vector<double>& v) {
  std::transform(v.begin(), v.end(), v.begin(),
                 [](double d) { return d < 0.00001 ? 0 : d; }
  );
}

However, when you start to write more complex lambdas you will quickly encounter cases where the compiler cannot deduce the return type. Here the two return statements yield different types (int for 0, double for d), so the following is rejected:

void func4(std::vector<double>& v) {
  std::transform(v.begin(), v.end(), v.begin(),
                 [](double d) {
                   if (d < 0.0001) {
                     return 0;   // int
                   } else {
                     return d;   // double: conflicts with int, so this does not compile
                   }
                 });
}

To resolve this you are allowed to explicitly specify a return type for a lambda function, using -> T:

void func4(std::vector<double>& v) {
  std::transform(v.begin(), v.end(), v.begin(),
                 [](double d) -> double {
                   if (d < 0.0001) {
                     return 0;   // converted to double
                   } else {
                     return d;
                   }
                 });
}

"Capturing" variables

So far the lambdas have used only their own parameters, but a lambda can also use other variables from the enclosing scope. To access them, list them in the capture clause (the [] of the expression), which has been empty in the examples so far, e.g.:

void func5(std::vector<double>& v, const double& epsilon) {
  std::transform(v.begin(), v.end(), v.begin(),
                 [epsilon](double d) -> double {
                   if (d < epsilon) {
                     return 0;
                   } else {
                     return d;
                   }
                 });
}

You can capture by both reference and value, which you can specify using & and = respectively:

  • [&epsilon, zeta] captures epsilon by reference and zeta by value
  • [&] captures all variables used in the lambda by reference
  • [=] captures all variables used in the lambda by value
  • [&, epsilon] captures all variables used in the lambda by reference but captures epsilon by value
  • [=, &epsilon] captures all variables used in the lambda by value but captures epsilon by reference
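A sketch of the practical difference between these capture modes: the lambdas are created first and the captured variables are changed afterwards, so a by-value capture still sees the old value while a by-reference capture sees the new one. The names capture_demo and CaptureResult are hypothetical.

```cpp
#include <cassert>

struct CaptureResult {
    int by_value;      // result of a [=] lambda
    int by_reference;  // result of a [&] lambda
    int mixed;         // result of a [=, &zeta] lambda
};

// Creates three lambdas, then modifies the captured variables.
CaptureResult capture_demo() {
    int epsilon = 1;
    int zeta = 10;
    auto by_value = [=] { return epsilon; };              // copies epsilon now
    auto by_reference = [&] { return epsilon; };          // refers to epsilon
    auto mixed = [=, &zeta] { return epsilon + zeta; };   // epsilon copied, zeta by reference
    epsilon = 2;
    zeta = 20;
    return {by_value(), by_reference(), mixed()};
}
```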

The generated operator() is const by default, so variables captured by value cannot be modified inside the lambda body (variables captured by reference can still be modified through the reference). As far as its by-value captures are concerned, each call with the same input therefore produces the same result. You can mark the lambda as mutable to request that the generated operator() is not const, which lets the body modify its own copies of the by-value captures.
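A sketch of mutable in action, with hypothetical names: the lambda increments its own copy of count on every call, while the enclosing variable stays unchanged.

```cpp
#include <cassert>

struct MutableResult {
    int third_call;   // value returned by the third call of the lambda
    int outer_count;  // the enclosing variable after the calls
};

MutableResult mutable_demo() {
    int count = 0;
    // Without 'mutable', ++count would not compile: operator() is const,
    // so the lambda's by-value copy of count is read-only.
    auto next = [count]() mutable { return ++count; };
    next();
    next();
    int third = next();
    return {third, count};  // the lambda only modified its own copy
}
```

With mutable, repeated calls with the same (here, empty) input return different values, which is exactly what the const default rules out.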

Where is Clang's '_mm256_pow_ps' intrinsic?

That's not an intrinsic; it's an Intel SVML library function name that confusingly uses the same naming scheme as actual intrinsics. There's no vpowps instruction. (AVX512ER on Xeon Phi does have the semi-related vexp2ps instruction...)

IDK if this naming scheme is to trick people into depending on Intel tools when writing SIMD code with their compiler (which comes with SVML), or because their compiler does treat it like an intrinsic/builtin for doing constant propagation if inputs are known, or some other reason.

For functions like that and _mm_sin_ps to be usable, you need Intel's Short Vector Math Library (SVML). Most people just avoid using them. If it has an implementation of something you want, though, it's worth looking into. IDK what other vector pow implementations exist.

In Intel's online intrinsics finder (the Intrinsics Guide), you can keep these non-portable functions out of your search results by leaving the SVML box unchecked.

There are also some "composite" intrinsics like _mm_set_epi8() that typically compile to multiple instructions (e.g. loads and shuffles). These are portable across compilers and are inlined rather than compiled into calls to library functions.

Also note that sqrtps is a native machine instruction, so _mm_sqrt_ps() is a real intrinsic. IEEE 754 specifies add, sub, mul, div, and sqrt as "basic" operations that are required to produce correctly-rounded results (error <= 0.5 ulp), so sqrt() is special and has direct hardware support, unlike most other "math library" functions.
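A small sketch of _mm_sqrt_ps used directly, assuming x86 with SSE; the helper name sqrt4 is hypothetical. Because both sqrtps and scalar sqrt on float are correctly rounded, the results should match std::sqrt bit for bit, assuming the default rounding mode and no fast-math style compiler flags.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE: _mm_sqrt_ps maps to the sqrtps instruction

// Square roots of four floats with a single sqrtps.
void sqrt4(const float* in, float* out) {
    _mm_storeu_ps(out, _mm_sqrt_ps(_mm_loadu_ps(in)));
}
```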

There are various libraries of SIMD math functions. Some of them come with C++ wrapper libraries that allow a+b instead of _mm_add_ps(a,b).

  • glibc libmvec - since glibc 2.22, to support OpenMP 4.0 vector math functions. GCC knows how to auto-vectorize some functions like cos(), sin(), and probably pow() using it. This answer shows one inconvenient way of using it explicitly for manual vectorization. (Hopefully better ways are possible that don't have mangled names in the source code).

  • Agner Fog's VCL has some math functions like exp and log. (Formerly GPL licensed, now Apache).

  • https://github.com/microsoft/DirectXMath (MIT license) - I think portable to non-Windows, and doesn't require DirectX.
  • https://sleef.org/ - apparently great performance, with variable accuracy you can choose. Formerly only supported on MSVC on Windows, the support matrix on its web site now includes GCC and Clang for x86-64 GNU/Linux and AArch64.

  • Intel's own SVML (comes with ICC; ICC auto-vectorizes with SVML by default). Confusingly has its prototypes in immintrin.h along with actual intrinsics. Maybe they want to trick people into writing code that's dependent on Intel tools/libraries. Or maybe they think fewer includes are better and that everyone should use their compiler...

    Also related: Intel MKL (Math Kernel Library), with matrix BLAS functions.

  • AMD ACML - end-of-life closed-source freeware. I think it just has functions that loop over arrays/matrices (like Intel MKL), not functions for single SIMD vectors.

  • sse_mathfun (zlib license) SSE2 and ARM NEON. Hasn't been updated since about 2011 it seems. But does have implementations of single-vector math / trig functions.
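To illustrate what the C++ wrapper libraries mentioned above do, here is a minimal, hypothetical wrapper (not the API of VCL or any other library listed) that uses operator overloading so you can write a + b instead of _mm_add_ps(a, b). It assumes x86 with SSE.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Hypothetical minimal wrapper around one __m128 vector of four floats.
struct Vec4f {
    __m128 v;
    Vec4f(float a, float b, float c, float d) : v(_mm_setr_ps(a, b, c, d)) {}
    explicit Vec4f(__m128 x) : v(x) {}
    // Extract lane i (slow path, for inspection only).
    float operator[](int i) const {
        alignas(16) float tmp[4];
        _mm_store_ps(tmp, v);
        return tmp[i];
    }
};

inline Vec4f operator+(Vec4f a, Vec4f b) { return Vec4f(_mm_add_ps(a.v, b.v)); }
inline Vec4f operator*(Vec4f a, Vec4f b) { return Vec4f(_mm_mul_ps(a.v, b.v)); }
```

Since the operators are inline one-liners, compilers typically reduce a + b * a to the same addps/mulps instructions you would get from writing the intrinsics by hand.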

Passing 2-D array as argument

  void myFunction(int arr[][4])

You can put any number in the first [], but the compiler will ignore it: the parameter is adjusted to a pointer to the first row, int (*arr)[4]. When passing a multidimensional array as a parameter, you must specify every dimension except the first.
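A short sketch (the function name sum2d is hypothetical): since the row count is not part of the parameter type, it has to be passed separately, and arrays with any number of rows of 4 ints are accepted.

```cpp
#include <cassert>

// int arr[][4] is adjusted to int (*arr)[4]; the row count is passed explicitly.
int sum2d(int arr[][4], int rows) {
    int total = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < 4; ++c)
            total += arr[r][c];
    return total;
}
```

An int[3][5] argument would be rejected, because its rows have a different type (int[5]) than the parameter expects.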

