What Is the Fastest Way to Compute Sin and Cos Together

What is the fastest way to compute sin and cos together?

Modern Intel/AMD processors have instruction FSINCOS for calculating sine and cosine functions simultaneously. If you need strong optimization, perhaps you should use it.

Here is a small example: http://home.broadpark.no/~alein/fsincos.html

Here is another example (for MSVC): http://www.codeguru.com/forum/showthread.php?t=328669

Here is yet another example (with gcc): http://www.allegro.cc/forums/thread/588470

Hope one of them helps.
(I didn't use this instruction myself, sorry.)

As they are supported on processor level, I expect them to be way much faster than table lookups.

Edit:

Wikipedia suggests that FSINCOS was added at 387 processors, so you can hardly find a processor which doesn't support it.

Edit:

Intel's documentation states that FSINCOS is just about 5 times slower than FDIV (i.e., floating point division).

Edit:

Please note that not all modern compilers optimize calculation of sine and cosine into a call to FSINCOS. In particular, my VS 2008 didn't do it that way.

Edit:

The first example link is dead, but there is still a version at the Wayback Machine.

Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)

The fastest way is to pre-compute values an use a table like in this example:

Create sine lookup table in C++

BUT if you insist upon computing at runtime you can use the Taylor series expansion of sine or cosine...

Taylor Series of sine

For more on the Taylor series... http://en.wikipedia.org/wiki/Taylor_series

One of the keys to getting this to work well is pre-computing the factorials and truncating at a sensible number of terms. The factorials grow in the denominator very quickly, so you don't need to carry more than a few terms.

Also...Don't multiply your x^n from the start each time...e.g. multiply x^3 by x another two times, then that by another two to compute the exponents.

c++ libstd compute sin and cos simultaneously

Is there no such function in c++ standard library?

No, unfortunately there isn't.

In C library math.h, there was a sincos function

On Linux, it is available as GNU Extension. It's not standard in C either.

How does C compute sin() and other math functions?

In GNU libm, the implementation of sin is system-dependent. Therefore you can find the implementation, for each platform, somewhere in the appropriate subdirectory of sysdeps.

One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code: sysdeps/ieee754/dbl-64/s_sin.c, look for __sin (double x).

This code is very complex. No one software algorithm is as fast as possible and also accurate over the whole range of x values, so the library implements several different algorithms, and its first job is to look at x and decide which algorithm to use.

When x is very very close to 0, sin(x) == x is the right answer.
A bit further out, sin(x) uses the familiar Taylor series. However, this is only accurate near 0, so...
When the angle is more than about 7°, a different algorithm is used, computing Taylor-series approximations for both sin(x) and cos(x), then using values from a precomputed table to refine the approximation.
When |x| > 2, none of the above algorithms would work, so the code starts by computing some value closer to 0 that can be fed to sin or cos instead.
There's yet another branch to deal with x being a NaN or infinity.

This code uses some numerical hacks I've never seen before, though for all I know they might be well-known among floating-point experts. Sometimes a few lines of code would take several paragraphs to explain. For example, these two lines

double t = (x * hpinv + toint);
double xn = t - toint;

are used (sometimes) in reducing x to a value close to 0 that differs from x by a multiple of π/2, specifically xn × π/2. The way this is done without division or branching is rather clever. But there's no comment at all!

Older 32-bit versions of GCC/glibc used the fsin instruction, which is surprisingly inaccurate for some inputs. There's a fascinating blog post illustrating this with just 2 lines of code.

fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: fdlibm/s_sin.c and fdlibm/k_sin.c

Calculation sine and cosine in one shot

If you seek fast evaluation with good (but not high) accuracy with powerseries you should use an expansion in Chebyshev polynomials: tabulate the coefficients (you'll need VERY few for 0.1% accuracy) and evaluate the expansion with the recursion relations for these polynomials (it's really very easy).

References:

Tabulated coefficients: http://www.ams.org/mcom/1980-34-149/S0025-5718-1980-0551302-5/S0025-5718-1980-0551302-5.pdf
Evaluation of chebyshev expansion: https://en.wikipedia.org/wiki/Chebyshev_polynomials

You'll need to (a) get the "reduced" argument in the range -pi/2..+pi/2 and consequently then (b) handle the sign in your results when the argument actually should have been in the "other" half of the full elementary interval -pi..+pi. These aspects should not pose a major problem:

determine (and "remember" as an integer 1 or -1) the sign in the original angle and proceed with the absolute value.
use a modulo function to reduce to the interval 0..2PI
Determine (and "remember" as an integer 1 or -1) whether it is in the "second" half and, if so, subtract pi*3/2, otherwise subtract pi/2. Note: this effectively interchanges sine and cosine (apart from signs); take this into account in the final evaluation.

This completes the step to get an angle in -pi/2..+pi/2
After evaluating sine and cosine with the Cheb-expansions, apply the "flags" of steps 1 and 3 above to get the right signs in the values.

Fast Sin/Cos using a pre computed translation array

You could try to use unsafe code to eliminate array bounds checking.

But even a unsafe, optimized version does not seem to come anywhere near Math.Sin.

Results based on 1'000'000'000 iterations with random values:

(1) 00:00:57.3382769  // original version
(2) 00:00:31.9445928  // optimized version
(3) 00:00:21.3566399  // Math.Sin

Code:

static double SinOriginal(double Value)
{
    Value %= PI2;
    if (Value < 0) Value += PI2;
    int index = (int)(Value * FACTOR);
    return _SineDoubleTable[index];
}

static unsafe double SinOptimized(double* SineDoubleTable, double Value)
{
    int index = (int)(Value * FACTOR) % TABLE_SIZE;
    return (index < 0) ? SineDoubleTable[index + TABLE_SIZE]
                       : SineDoubleTable[index];
}

Test program:

InitializeTrigonometricTables();
Random random = new Random();

SinOriginal(random.NextDouble());
var sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    SinOriginal(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(1) {0}  // original version", sw.Elapsed);

fixed (double* SineDoubleTable = _SineDoubleTable)
{
    SinOptimized(SineDoubleTable, random.NextDouble());
    sw = System.Diagnostics.Stopwatch.StartNew();
    for (long i = 0; i < 1000000000L; i++)
    {
        SinOptimized(SineDoubleTable, random.NextDouble());
    }
    sw.Stop();
    Console.WriteLine("(2) {0}  // optimized version", sw.Elapsed);
}

Math.Sin(random.NextDouble());
sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    Math.Sin(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(3) {0}  // Math.Sin", sw.Elapsed);

Simple trigonometric algorithm to compute sin() and cos() of fixed-point numbers

Finally I have implemented the sin metafunction through Taylor series, using series of 10 terms by default (Could be configurable).
I have based my implementation in this interesting article.

My library includes an implementation of a tmp for loop using iterators, and expression templates to allow write complex expressions in a "clear" way (Clear compared to the common template-meta-programming syntax add<mul<sub<1,2>>>...). This allows me to literally copy-paste the C implementation provided by the article:

template<typename T , typename TERMS_COUNT = mpl::uinteger<4>>
struct sin_t;

template<typename T , typename TERMS_COUNT = mpl::uinteger<4>>
using sin = typename sin_t<T,TERMS_COUNT>::result;

/*
 * sin() function implementation through Taylor series (Check http://www10.informatik.uni-erlangen.de/~pflaum/pflaum/ProSeminar/meta-art.html)
 * 
 * The C equivalent code is:
 * 
 * // Calculate sin(x) using j terms
 * float sine(float x, int j)
 * {
 *     float val = 1;
 *
 *     for (int k = j - 1; k >= 0; --k)
 *         val = 1 - x*x/(2*k+2)/(2*k+3)*val;
 *
 *     return x * val;
 * }
 */

template<mpl::fpbits BITS , mpl::fbcount PRECISION , unsigned int TERMS_COUNT>
struct sin_t<mpl::fixed_point<BITS,PRECISION>,mpl::uinteger<TERMS_COUNT>>
{
private:
    using x = mpl::fixed_point<BITS,PRECISION>;

    using begin = mpl::make_integer_backward_iterator<TERMS_COUNT-1>;
    using end   = mpl::make_integer_backward_iterator<-1>;

    using one   = mpl::decimal<1,0,PRECISION>;
    using two   = mpl::decimal<2,0,PRECISION>;
    using three = mpl::decimal<3,0,PRECISION>;

    template<typename K , typename VAL>
    struct kernel : public mpl::function<decltype( one() - ( x() * x() )/(two() * K() + two())/(two()*K()+three())*VAL() )> {};

public:
    using result = decltype( x() * mpl::for_loop<begin , end , one , kernel>() );
};

Here is the header of the implementation in the project repo.

What Is the Fastest Way to Compute Sin and Cos Together