Fast Implementation of Trigonometric Functions for C++

Here are some good slides on how to do power series approximations (NOT Taylor series though) of trig functions: Faster Math Functions.

It's geared towards game programmers, which means accuracy gets sacrificed for performance, but you should be able to add another term or two to the approximations to get some of the accuracy back.

The nice thing about this is that you should also be able to extend it to SIMD easily, so that you could compute the sin or cos of 4 values at once (2 if you're using double precision).
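
For illustration, a minimal SSE sketch (my own, not taken from the slides) that evaluates a truncated sine series for 4 floats at once; it assumes the inputs are already range-reduced to roughly [-pi/2, pi/2]:

#include <xmmintrin.h>

__m128 sin4(__m128 x)
{
    const __m128 c3 = _mm_set1_ps(-1.0f / 6.0f);     // -1/3!
    const __m128 c5 = _mm_set1_ps( 1.0f / 120.0f);   //  1/5!
    const __m128 c7 = _mm_set1_ps(-1.0f / 5040.0f);  // -1/7!
    __m128 x2 = _mm_mul_ps(x, x);
    // Horner evaluation of x * (1 + x2*(c3 + x2*(c5 + x2*c7)))
    __m128 p = _mm_add_ps(c5, _mm_mul_ps(x2, c7));
    p = _mm_add_ps(c3, _mm_mul_ps(x2, p));
    p = _mm_add_ps(_mm_set1_ps(1.0f), _mm_mul_ps(x2, p));
    return _mm_mul_ps(x, p);
}

Swapping the Taylor coefficients for minimax ones, as the slides describe, improves the worst-case error at no runtime cost.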

Hope that helps...

Fastest implementation of sine, cosine and square root in C++ (doesn't need to be very accurate)

The fastest way is to pre-compute values and use a table, as in this example:

Create sine lookup table in C++
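
For illustration, a minimal sketch of such a table (the size and all names are my own choice, not from the linked question):

#include <cmath>

static const double PI = 3.14159265358979323846;
static const int TABLE_SIZE = 1024;   // one period of sine
static float sin_table[TABLE_SIZE];

void init_sin_table()
{
    for (int i = 0; i < TABLE_SIZE; ++i)
        sin_table[i] = (float)std::sin(i * 2.0 * PI / TABLE_SIZE);
}

// Truncating lookup; accuracy is limited by the table resolution.
float fast_sin(float x)
{
    int i = (int)(x * (TABLE_SIZE / (2.0 * PI))) % TABLE_SIZE;
    if (i < 0) i += TABLE_SIZE;  // wrap negative angles into range
    return sin_table[i];
}

Interpolating linearly between adjacent entries buys considerably more accuracy for one extra multiply-add.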

BUT if you insist upon computing at runtime you can use the Taylor series expansion of sine or cosine...

Taylor Series of sine

For more on the Taylor series... http://en.wikipedia.org/wiki/Taylor_series

One of the keys to getting this to work well is pre-computing the factorials and truncating at a sensible number of terms. The factorials grow in the denominator very quickly, so you don't need to carry more than a few terms.

Also, don't compute each power x^n from scratch every time. E.g., multiply x^3 by x^2 to get x^5, then multiply that by x^2 again to get x^7, and so on.
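
Putting both ideas together, a sketch (the number of terms here is an arbitrary choice):

// Taylor series for sine with 1/n! precomputed and each odd power of x
// obtained from the previous one by multiplying by x*x.
double taylor_sin(double x)
{
    static const double inv_fact[] = {
        1.0 / 6.0, 1.0 / 120.0, 1.0 / 5040.0, 1.0 / 362880.0  // 1/3!, 1/5!, 1/7!, 1/9!
    };
    double x2 = x * x;
    double term = x;     // current odd power of x
    double sum = x;
    double sign = -1.0;
    for (int i = 0; i < 4; ++i) {
        term *= x2;                       // x^3, x^5, x^7, x^9 in turn
        sum += sign * term * inv_fact[i];
        sign = -sign;
    }
    return sum;  // accurate only near 0; range-reduce first for large x
}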

How does C compute sin() and other math functions?

In GNU libm, the implementation of sin is system-dependent. Therefore you can find the implementation, for each platform, somewhere in the appropriate subdirectory of sysdeps.

One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code: sysdeps/ieee754/dbl-64/s_sin.c, look for __sin (double x).

This code is very complex. No one software algorithm is as fast as possible and also accurate over the whole range of x values, so the library implements several different algorithms, and its first job is to look at x and decide which algorithm to use.

  • When x is very very close to 0, sin(x) == x is the right answer.

  • A bit further out, sin(x) uses the familiar Taylor series. However, this is only accurate near 0, so...

  • When the angle is more than about 7°, a different algorithm is used, computing Taylor-series approximations for both sin(x) and cos(x), then using values from a precomputed table to refine the approximation.

  • When |x| > 2, none of the above algorithms would work, so the code starts by computing some value closer to 0 that can be fed to sin or cos instead.

  • There's yet another branch to deal with x being a NaN or infinity.

This code uses some numerical hacks I've never seen before, though for all I know they might be well-known among floating-point experts. Sometimes a few lines of code would take several paragraphs to explain. For example, these two lines

double t = (x * hpinv + toint);
double xn = t - toint;

are used (sometimes) in reducing x to a value close to 0 that differs from x by a multiple of π/2, specifically xn × π/2. The way this is done without division or branching is rather clever. But there's no comment at all!
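
As far as I can tell, toint is 1.5 * 2^52 and hpinv is 2/π: adding toint to a number much smaller than 2^52 pushes its fraction bits out of the mantissa, so the FPU's round-to-nearest mode leaves the nearest integer to x * hpinv in t, and subtracting toint recovers that integer exactly. A standalone sketch of the idea, with constants written out by me rather than copied from glibc (the real code also subtracts a multi-part π/2 for accuracy):

static const double hpinv = 0.63661977236758134308;  // 2/pi
static const double toint = 6755399441055744.0;      // 1.5 * 2^52

// Returns x reduced by the nearest multiple of pi/2; *n receives that multiple.
double reduce(double x, double *n)
{
    double t  = x * hpinv + toint;  // rounds x * 2/pi to an integer, branch-free
    double xn = t - toint;          // that integer, recovered exactly as a double
    *n = xn;
    return x - xn * 1.5707963267948966;  // single-constant pi/2; glibc splits it
}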


Older 32-bit versions of GCC/glibc used the fsin instruction, which is surprisingly inaccurate for some inputs. There's a fascinating blog post illustrating this with just 2 lines of code.

fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: fdlibm/s_sin.c and fdlibm/k_sin.c

C++ (and maths) : fast approximation of a trigonometric function

As Jonas Wielicki mentions in the comments, there aren't many precision trade-offs you can make.

Your best bet is to use the processor intrinsics for the functions (if your compiler doesn't do this already) and to use some math to reduce the number of calculations necessary.

It is also very important to keep everything in a CPU-friendly format: make sure there are few cache misses, and so on.

If you are evaluating functions like acos in large batches, perhaps moving to the GPU is an option for you?

What is the fastest way to compute sin and cos together?

Modern Intel/AMD processors have the FSINCOS instruction for calculating the sine and cosine simultaneously. If you need heavy optimization, perhaps you should use it.

Here is a small example: http://home.broadpark.no/~alein/fsincos.html

Here is another example (for MSVC): http://www.codeguru.com/forum/showthread.php?t=328669

Here is yet another example (with gcc): http://www.allegro.cc/forums/thread/588470

Hope one of them helps.
(I didn't use this instruction myself, sorry.)

As it is supported at the processor level, I expect it to be much faster than table lookups.

Edit:

Wikipedia suggests that FSINCOS was added with the 387 coprocessor, so you can hardly find a processor which doesn't support it.

Edit:

Intel's documentation states that FSINCOS is just about 5 times slower than FDIV (i.e., floating point division).

Edit:

Please note that not all modern compilers optimize the calculation of a sine and cosine pair into a single FSINCOS call. In particular, my VS 2008 didn't do it that way.
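
If you want it explicitly instead of relying on the optimizer, here is a sketch of two options; note that sincos() is a GNU extension in glibc and the inline asm is GCC- and x87-specific, so neither is portable to MSVC:

#define _GNU_SOURCE
#include <math.h>

void both(double x, double *s, double *c)
{
    // Option 1: glibc's sincos() extension; the library may implement
    // it with FSINCOS or with a faster software routine.
    sincos(x, s, c);
}

#if defined(__GNUC__) && defined(__i386__)
void both_fsincos(double x, double *s, double *c)
{
    // Option 2: the x87 instruction directly. FSINCOS leaves the
    // cosine in st(0) and the sine in st(1).
    asm ("fsincos" : "=t" (*c), "=u" (*s) : "0" (x));
}
#endif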

Edit:

The first example link is dead, but there is still a version at the Wayback Machine.

Fast trigonometric functions using only integer in c++ for arm target

Exynos 4412 uses the Cortex-A9 core[1], which has fully pipelined single- and double-precision floating-point. There is no reason to resort to integer operations, as there was with some older ARM cores.

Depending on your specific accuracy requirements (and especially if you can guarantee that the inputs fall into a limited range), you may be able to use approximations that are significantly faster than the implementations available in the standard library. More information about your exact usage would be necessary to give sound advice.

[1] http://en.wikipedia.org/wiki/Exynos_(system_on_chip)

Yet another fast trigonometry

The coefficients are -1/6, 1/120, -1/5040, and so on.

Or rather: -1/3!, 1/5!, -1/7!, 1/9!... etc

Look at the Taylor series for sin x:

sin x = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - ...

The series for cos x sits right below it:

cos x = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! - ...

For cos x, as seen from the series above, the constants are -1/2!, 1/4!, -1/6!, 1/8!...

tan x is slightly different:

tan x = x + x^3/3 + 2x^5/15 + 17x^7/315 + ...

So to adjust this for cosx:

float cosx(float x)
{
    // Constants are -1/2!, 1/4!, -1/6!, 1/8!, -1/10!
    static const float a[] = { -.5f, .0416666667f, -.0013888889f, .0000248016f, -.0000002756f };
    float xsq = x * x;
    // Polynomial in x^2, evaluated with Horner's method
    return 1 + xsq * (a[0] + xsq * (a[1] + xsq * (a[2] + xsq * (a[3] + xsq * a[4]))));
}
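
For comparison, a sine version using the -1/3!, 1/5!, ... constants from the start of this answer might look like this (my own transcription, not part of the original answer):

float sinx(float x)
{
    // Constants are -1/3!, 1/5!, -1/7!, 1/9!
    static const float a[] = { -.1666666667f, .0083333333f, -.0001984127f, .0000027557f };
    float xsq = x * x;
    return x * (1 + xsq * (a[0] + xsq * (a[1] + xsq * (a[2] + xsq * a[3]))));
}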

C: Improving performance of function with heavy sin() usage

Besides all the other advice given in other answers, here is a pure algorithmic optimization.

In most cases, you're computing something of the form sin(k * a + b), where a and b are constants, and k is a loop variable. If you were also to compute cos(k * a + b), then you could use a 2D rotation matrix to form a recurrence relationship (in matrix form):

|cos(k*a + b)|   |cos(a)  -sin(a)|   |cos((k-1)*a + b)|
|sin(k*a + b)| = |sin(a)   cos(a)| * |sin((k-1)*a + b)|

In other words, you can calculate the value for the current iteration in terms of the value from the previous iteration. Thus, you only need to do the full trig calculation for k == 0; the rest can be calculated via this recurrence (once you have calculated cos(a) and sin(a), which are constants). So you eliminate 75% of the trig function calls (it's not clear the same trick can be pulled for the final set of trig calls).
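
A sketch of that recurrence (function and variable names are mine):

#include <cmath>

// Fills out[k] = sin(k*a + b) for k = 0..n-1 with a single sin/cos
// pair up front, then the rotation recurrence for every later k.
void sin_sequence(double a, double b, int n, double *out)
{
    double ca = std::cos(a), sa = std::sin(a);  // rotation constants
    double c  = std::cos(b), s  = std::sin(b);  // values for k == 0
    for (int k = 0; k < n; ++k) {
        out[k] = s;
        double c_next = ca * c - sa * s;  // cos((k+1)*a + b)
        s = sa * c + ca * s;              // sin((k+1)*a + b)
        c = c_next;
    }
    // Note: rounding error accumulates over long runs; re-seed with a
    // real sin/cos call every few thousand iterations if that matters.
}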

Implementation of sine function in C not working

Each time your for loop progresses, n is increased by 2, so with DEPTH = 16 you have to calculate factorials of numbers as big as 30 near the end of the loop. You are using unsigned int, which can only store values up to 2^32 - 1 = 4294967295; the largest factorial that fits is 12!. This causes overflow in your factorial function, which in turn gives you the wrong factorials.

Even if you used long double for it (and, as I already stated in my comments, long double in the MS C runtime is mapped to plain double; see the Reference), you'd still see some anomalies, probably at larger angles. Although double can store values as big as 1.8E+308, it loses granularity at 2^53 = 9007199254740992 ≈ 18! (i.e., 2^53 + 1 stored as a double is equal to 2^53). So as you go up in angles, the effect of this behavior becomes larger and larger, to the point that it is noticeable in the 6 decimal digits of precision you are using with printf().

Although you are on the right track, you should use a bignum library like GMP or libcrypto. They can perform these calculations without loss of precision.

BTW, since you are developing on Windows 7, you are using either x86 or x86-64. On these platforms, the x87 FPU is capable of extended-precision operations with 80 bits (as per the IEEE 754 standard), but I am unaware of compiler intrinsics that can give you that capability without resorting to assembly code.

I also would like to direct your attention to range reduction techniques. Although I still recommend using bignum libs, if your implementation is good between 0 and 90 degrees (0 and 45 if I'm to be more strict), you can compute the sine of all other angles using simple trigonometric identities.
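
For example, a minimal sketch of reducing any angle into [0, pi/2] with those identities (all names are mine, and the core polynomial is only a stand-in):

#include <cmath>

static const double PI = 3.14159265358979323846;

// Stand-in for an implementation that is good on [0, pi/2]
static double core_sin(double x)
{
    double x2 = x * x;
    return x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42)));
}

double my_sin(double x)
{
    x = std::fmod(x, 2 * PI);               // sine is 2*pi-periodic
    if (x < 0) x += 2 * PI;                 // now 0 <= x < 2*pi
    if (x < PI / 2)     return  core_sin(x);
    if (x < PI)         return  core_sin(PI - x);   // sin(pi - x) = sin(x)
    if (x < 3 * PI / 2) return -core_sin(x - PI);   // sin(pi + x) = -sin(x)
    return -core_sin(2 * PI - x);                   // sin(-x) = -sin(x)
}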

UPDATE:

Actually I'm gonna correct myself about using doubles in factorial calculations. After writing a simple program, I verified that when I use double to store factorials, they are correct even if I go higher than 18!. After giving it some thought, I realized that in the case of factorials, the situation with double's granularity is a little more complex. I'll give you an example to make it clear:

19! = 19 * 18 * ... * 2 * 1

In this product, 18, 16, 14, ..., 2 are all multiples of 2, and since a multiplication by 2 is equivalent to a left shift in the binary representation, all the lower bits of 19! are already 0. Hence, when double's rounding kicks in for integers greater than 2^53, these factorials are unaffected. You can compute the number of least significant zero bits in the binary representation of 19! by counting the factors of 2 in it, which is 16 (for 20!, it is 18).

I'm gonna go up to 1.8e+308 and check if all the factorials are unaffected or not. I'll update you with the results.

UPDATE 2:

If we use doubles to hold factorials, they are affected by rounding from 23! onward. This is easily shown: 2^74 < 23! < 2^75, which means 75 bits are required to represent it, but since the 19 least significant bits of 23! are 0, it needs 75 - 19 = 56 significant bits, which is more than the 53 provided by double.

For 22!, it is 51 bits (you can calculate it yourself).
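
A quick sketch to check this claim (it assumes long double carries the x87 64-bit mantissa, as with GCC on x86; under MSVC, where long double is just double, it won't find a mismatch):

#include <cstdio>

int main()
{
    double f = 1.0;        // 53-bit mantissa
    long double g = 1.0L;  // 64-bit mantissa assumed (x86 GCC)
    for (int n = 2; n <= 25; ++n) {
        f *= n;
        g *= n;
        if ((long double)f != g) {
            std::printf("first factorial rounded in double: %d!\n", n);
            return 0;  // should print 23, per the reasoning above
        }
    }
    return 0;
}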
