How to Write a Power Function Myself

How can I write a power function myself?

Negative powers are not a problem; they're just the reciprocal (1/x^n) of the positive power.

Floating-point powers are just a little more complicated: as you know, a fractional power is equivalent to a root (e.g. x^(1/2) == sqrt(x)), and you also know that multiplying powers with the same base is equivalent to adding their exponents.

With all the above, you can:

  • Decompose the exponent into an integer part and a fractional part.
  • Calculate the integer power with a loop (you can optimize it by decomposing the exponent into factors and reusing partial calculations).
  • Calculate the root with any algorithm you like (any iterative approximation like bisection or Newton's method will work).
  • Multiply the two results together.
  • If the exponent was negative, take the reciprocal.

Example:

2^(-3.5) = (2^3 * 2^(1/2))^(-1) = 1 / (2*2*2 * sqrt(2))
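For illustration, here is a minimal C sketch of this recipe (the function names are my own). It assumes the exponent is a multiple of 0.5, so the only root needed is a square root:

double my_sqrt(double x)
{
    double r = (x > 1.0) ? x : 1.0;      /* Newton's method: r <- (r + x/r)/2 */
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}

double my_pow_half_steps(double base, double expo)
{
    int neg = expo < 0.0;
    if (neg) expo = -expo;               /* work with the magnitude */
    int whole = (int)expo;               /* integer part of the exponent */
    double result = 1.0;
    for (int i = 0; i < whole; i++)      /* integer power with a loop */
        result *= base;
    if (expo != (double)whole)           /* leftover 0.5: multiply by the root */
        result *= my_sqrt(base);
    return neg ? 1.0 / result : result;  /* negative exponent: reciprocal */
}

my_pow_half_steps(2.0, -3.5) reproduces the example above, returning 1 / (2*2*2 * sqrt(2)) ~= 0.0884.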

Writing your own exponential power function with decimals

Here's a nice algorithm for positive integer powers. It starts by dealing with some simple cases and then uses a loop testing the binary bits of the exponent. For example, to find 3^11: 11 in binary is 1011, so the stages in the loop are

  • bitMask = 1011, evenPower = 3, result = 3
  • bitMask = 101, evenPower = 3*3 = 9, result = 3*9 = 27
  • bitMask = 10, evenPower = 9*9 = 81, result = 27
  • bitMask = 1, evenPower = 81*81 = 6561, result = 27*6561 = 177147

That is, evenPower squares on each pass through the loop, and the result gets multiplied by evenPower if the bottom bit is 1. The code has been lifted from Patricia Shanahan’s method http://mindprod.com/jgloss/power.html which in turn has its roots in Knuth and can be traced back to 200 BC in India.

/**
 * A fast routine for computing integer powers.
 * Code adapted from {@link <a href="http://mindprod.com/jgloss/power.html">efficient power</a>} by Patricia Shanahan pats@acm.org
 * Almost identical to the method Knuth gives on page 462 of The Art of Computer Programming Volume 2 Seminumerical Algorithms.
 * @param l number to be taken to a power.
 * @param n power to take l to. 0 <= n <= Integer.MAX_VALUE
 *          Negative numbers will be treated as unsigned positives.
 * @return l to the power n
 */
public static final double power(double l, int n)
{
    assert n >= 0;

    double x = l;
    switch (n) {
        case 0: x = 1.0; break;
        case 1: break;
        case 2: x *= x; break;
        case 3: x *= x*x; break;
        case 4: x *= x; x *= x; break;
        case 5: { double y = x*x; x *= y*y; } break;
        case 6: { double y = x*x; x = y*y*y; } break;
        case 7: { double y = x*x; x *= y*y*y; } break;
        case 8: x *= x; x *= x; x *= x; break;
        default: {
            int bitMask = n;
            double evenPower = x;
            double result;
            if ((bitMask & 1) != 0)
                result = x;
            else
                result = 1;
            bitMask >>>= 1;
            while (bitMask != 0) {
                evenPower *= evenPower;   // square on every pass
                if ((bitMask & 1) != 0)
                    result *= evenPower;  // multiply in when the low bit is set
                bitMask >>>= 1;
            } // end while
            x = result;
        }
    }
    return x;
}
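For example, power(3.0, 11) runs through exactly the bitMask stages listed above and returns 177147.0.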

For a real exponent you will basically need ways of finding exp and log. You can use Taylor series, which are the simplest to derive, but there are much better methods. We have

exp(x) = 1 + x + x^2/2 + x^3/6 + x^4/24 + x^5/120 + x^6/720 + ...

ln(1+x) = x - x^2/2 + x^3/3 - x^4/4 + x^5/5 - x^6/6 + ...   for |x| < 1

To find x^y, note that ln(x^y) = y*ln(x). Now we need to get the argument into the right range so we can use our power series. Let x = m * 2^ex, with the mantissa and exponent chosen so that 1/sqrt(2) <= m < sqrt(2); then ln(m * 2^ex) = ln(m) + ex*ln(2). Let h = m - 1 and find ln(1+h).

Using Java and floats, as there is an easy way to get at the internals of the IEEE representation (I've used float as there are fewer bits to cope with):

int b = Float.floatToIntBits(x);
int sign = (b & 0x80000000) == 0 ? 1 : -1;
int mantissa = b & 0x007fffff;
int ex = ((b & 0x7f800000) >> 23) - 127;

In JavaScript it might be easiest to use Number.toExponential and parse the result.

Next construct a number z in the desired range 1/sqrt(2) <= z < sqrt(2), along with the two constants the later snippets assume:

float root2 = (float) Math.sqrt(2); // constants assumed by the snippets below
float ln2 = (float) Math.log(2);

int bits = mantissa | 0x3f800000;   // force the exponent field to 0, so 1 <= z < 2
float z = Float.intBitsToFloat(bits);
if (z > root2) {
    z = z / 2;
    ++ex;
}

Use this function to find the log of 1+x using a Taylor series:

static float ln1px(float x) {
    float x_2 = x*x;   // powers of x
    float x_3 = x_2 * x;
    float x_4 = x_3 * x;
    float x_5 = x_4 * x;
    float x_6 = x_5 * x;
    float res = x - x_2/2 + x_3/3 - x_4/4 + x_5/5 - x_6/6;
    return res;
}

This seems to be good to three significant figures, often much better when x is close to 0.

The log of our number x can then be found:

float w = z - 1;
float ln_z = ln1px(w);
float ln_x = ln_z + ln2 * ex;
System.out.println("ln "+ln_x+"\t"+Math.log(x));

Now to the actual power. Write y = n + a where n is an integer and a is fractional, so x^y = x^(n+a) = x^n * x^a. Use the first algorithm in this answer to find x^n. Writing x = m * 2^ex, then ln((m * 2^ex)^a) = a*ln(m) + a*ex*ln(2) and

x^a = exp(ln((m * 2^ex)^a)) = exp(a * ln(m)) * exp(a * ln(2))^ex

The two exponential terms have fairly small arguments, so the Taylor series should be good.

We need one function for the Taylor series of the exponential function:

static float exp(float x) {
    float x_2 = x*x;   // powers of x
    float x_3 = x_2 * x;
    float x_4 = x_3 * x;
    float x_5 = x_4 * x;
    float x_6 = x_5 * x;
    float res = 1 + x + x_2/2 + x_3/6 + x_4/24 + x_5/120 + x_6/720;
    return res;
}

Finally we can put the pieces together:

// Get integer and fractional parts of y
int n = (int) Math.floor(y);
float a = y - n;

float x_n = (float) power(x, n);         // x^n
float a_ln_m = a * ln_z;                 // ln(m^a) = a ln(m)
float a_ln_2 = a * ln2;                  // ln(2^a) = a ln(2)
float m_a = exp(a_ln_m);                 // m^a = exp(a ln(m))
float _2_a = exp(a_ln_2);                // 2^a = exp(a ln(2))
float _2_a_ex = (float) power(_2_a, ex); // (2^ex)^a = 2^(a*ex) = (2^a)^ex
float x_a = m_a * _2_a_ex;               // x^a = m^a * 2^(a*ex)

float x_y = x_n * x_a;                   // x^y = x^n * x^a

System.out.println("x^y " + x_y + "\t" + Math.pow(x, y));

That should be the complete program; you need some smarts to cope with negative arguments etc.

Note this is not particularly accurate, as I've only used a few terms of the Taylor series. Other SO questions, such as "How can I write a power function myself?", have more detailed answers.

How can I create my own pow function using PHP?

Oopsika, couldn't have asked a more obvious question. Use the built-in function named pow (as in a lot of languages)

echo pow(3, 3);

Edit

Let's create our own function.

function raiseToPower($base, $exponent)
{
    // multiply the base by itself $exponent times
    $result = 1;
    for ($i = 1; $i <= $exponent; $i++) {
        $result = $result * $base;
    }
    return $result;
}

Write Pow Function Without math.h in C

Everything seems perfect except for one condition: when b < 0.

For b < 0, simply return

return (1.0/a) * (1.0/MyPow(a, abs(b)-1));   // where abs(b) is the absolute value of b

OR

return (1.0/a) * MyPow(a, b+1);

Also, your definition of the function is not valid for performing negative exponentiation; you should change it to

float MyPow(int a, int b)
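Putting both suggestions together, a minimal sketch of the fixed function might look like this (assuming, as in the question, that the recursion bottoms out at MyPow(a, 0) == 1):

float MyPow(int a, int b)
{
    if (b == 0)
        return 1.0f;
    if (b < 0)                      /* a^b = (1/a) * a^(b+1) for b < 0 */
        return (1.0f / a) * MyPow(a, b + 1);
    return a * MyPow(a, b - 1);     /* a^b = a * a^(b-1) for b > 0 */
}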

How to write a power function for big numbers in Java

This will work, but be aware that BigInteger is not very efficient, and I don't consider it suitable for production work on math problems.

public static void main(String[] args) {
    BigInteger two = new BigInteger("2");
    BigInteger twoToTenThousand = two.pow(10000);
    System.out.println(twoToTenThousand);
}
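(This needs import java.math.BigInteger; note also that BigInteger.pow accepts only a non-negative int exponent and throws ArithmeticException for a negative one.)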

Efficient implementation of fixed-point power (pow) function with argument constraints

The problem as stated by OP is underspecified. What accuracy is needed, what performance is required? As a is known to be non-negative, a suitable starting point is to compute a^b as exp2 (b * log2 (a)), with fixed-point implementations for those functions. Based on my experience with fixed-point arithmetic in embedded systems, using an s15.16 fixed-point format is often a good compromise between representable range and accuracy when generic floating-point computation is replaced with fixed-point computation. So I will adopt that here.

The principal choices for implementing exp2() and log2() in fixed-point are bitwise computation, quadratic interpolation in a table, and polynomial approximation. The latter two benefit from a hardware multiplier that can produce a double-width product, and table-based approaches work best with a cache of several kilobytes. The OP's specification of a microcontroller suggests that resources for both code and data may be limited, and that this is fairly low-end hardware. So the bitwise approach may be a good starting point, as it requires only a tiny table shared between the two functions. The code size is small as well, in particular when the compiler is directed to optimize for code size (-Os with most compilers).

Bitwise computations for these functions are also known as pseudo-division and pseudo-multiplication and are closely related to the way in which Henry Briggs (1561 – 1630) computed tables of logarithms.

For the computation of the logarithm we choose, by trial and error, from among a sequence of particular factors those that, when multiplied with the function argument, normalize it to unity. For every factor chosen, we add up the corresponding logarithm value stored in a table. The sum then corresponds to the logarithm of the function argument. For the integer portion, we would want to choose 2^i, i = 1, 2, 4, 8, ..., and for the fractional part we choose 1+2^(-i), i = 1, 2, 3, 4, 5, ...

The algorithm for the exponential is closely related and quite literally the inverse of the logarithm algorithm. From among the sequence of tabulated logarithms for our chosen factors, we pick those that when subtracted from the function argument ultimately reduce it to zero. Starting with unity, we multiply with all the factors whose logarithms we have subtracted in the process. The resulting product corresponds to the result of the exponentiation.

Logarithms of values greater than unity can be computed by applying the algorithm to the reciprocal of the function argument (with negation of the result as appropriate); likewise, exponentiation with negative function arguments requires us to negate the function argument and compute the reciprocal at the end. So in this aspect the two algorithms are, unsurprisingly, symmetric as well.

The following ISO-C99 code is a straightforward implementation of these two algorithms. Note that this code uses a fixed-point multiplication with rounding. The incremental cost is minimal and it does enhance the accuracy of the overall pow() computation.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

/* s15.16 division without rounding */
int32_t fxdiv_s15p16 (int32_t x, int32_t y)
{
    return ((int64_t)x * 65536) / y;
}

/* s15.16 multiplication with rounding */
int32_t fxmul_s15p16 (int32_t x, int32_t y)
{
    int32_t r;
    int64_t t = (int64_t)x * (int64_t)y;
    r = (int32_t)(uint32_t)(((uint64_t)t + (1 << 15)) >> 16);
    return r;
}

/* log2(2**8), ..., log2(2**1), log2(1+2**(-1)), ..., log2(1+2**(-16)) */
const uint32_t tab [20] = {0x80000, 0x40000, 0x20000, 0x10000,
0x095c1, 0x0526a, 0x02b80, 0x01663,
0x00b5d, 0x005b9, 0x002e0, 0x00170,
0x000b8, 0x0005c, 0x0002e, 0x00017,
0x0000b, 0x00006, 0x00003, 0x00001};
const int32_t one_s15p16 = 1 * (1 << 16);
const int32_t neg_fifteen_s15p16 = (-15) * (1 << 16);

int32_t fxlog2_s15p16 (int32_t a)
{
    uint32_t x, y;
    int32_t t, r;

    x = (a > one_s15p16) ? fxdiv_s15p16 (one_s15p16, a) : a;
    y = 0;
    /* process integer bits */
    if ((t = x << 8) < one_s15p16) { x = t; y += tab [0]; }
    if ((t = x << 4) < one_s15p16) { x = t; y += tab [1]; }
    if ((t = x << 2) < one_s15p16) { x = t; y += tab [2]; }
    if ((t = x << 1) < one_s15p16) { x = t; y += tab [3]; }
    /* process fraction bits */
    for (int shift = 1; shift <= 16; shift++) {
        if ((t = x + (x >> shift)) < one_s15p16) { x = t; y += tab [3 + shift]; }
    }
    r = (a > one_s15p16) ? y : (0 - y);
    return r;
}

int32_t fxexp2_s15p16 (int32_t a)
{
    uint32_t x, y;
    int32_t t, r;

    if (a <= neg_fifteen_s15p16) return 0; // underflow

    x = (a < 0) ? (-a) : (a);
    y = one_s15p16;
    /* process integer bits */
    if ((t = x - tab [0]) >= 0) { x = t; y = y << 8; }
    if ((t = x - tab [1]) >= 0) { x = t; y = y << 4; }
    if ((t = x - tab [2]) >= 0) { x = t; y = y << 2; }
    if ((t = x - tab [3]) >= 0) { x = t; y = y << 1; }
    /* process fractional bits */
    for (int shift = 1; shift <= 16; shift++) {
        if ((t = x - tab [3 + shift]) >= 0) { x = t; y = y + (y >> shift); }
    }
    r = (a < 0) ? fxdiv_s15p16 (one_s15p16, y) : y;
    return r;
}

/* compute a**b for a >= 0 */
int32_t fxpow_s15p16 (int32_t a, int32_t b)
{
    return fxexp2_s15p16 (fxmul_s15p16 (b, fxlog2_s15p16 (a)));
}

double s15p16_to_double (int32_t a)
{
    return a / 65536.0;
}

int32_t double_to_s15p16 (double a)
{
    return ((int32_t)(a * 65536.0 + ((a < 0) ? (-0.5) : 0.5)));
}

int main (void)
{
    int32_t a = double_to_s15p16 (0.125);
    do {
        int32_t b = double_to_s15p16 (-5);
        do {
            double fa = s15p16_to_double (a);
            double fb = s15p16_to_double (b);
            double reff = pow (fa, fb);
            int32_t res = fxpow_s15p16 (a, b);
            int32_t ref = double_to_s15p16 (reff);
            double fres = s15p16_to_double (res);
            double fref = s15p16_to_double (ref);
            printf ("a=%08x (%15.8e) b=%08x (% 15.8e) res=%08x (% 15.8e) ref=%08x (%15.8e)\n",
                    a, fa, b, fb, res, fres, ref, fref);
            b += double_to_s15p16 (0.5);
        } while (b <= double_to_s15p16 (6));
        printf ("\n");
        a += double_to_s15p16 (0.125);
    } while (a <= double_to_s15p16 (1.0));
    return EXIT_SUCCESS;
}

For processors with a reasonable amount of cache and a fast integer multiplier, we can use quadratic interpolation in tables for very accurate and fast computation of log2() and exp2(). To do so, we make use of a pseudo-floating-point representation where the argument x = 2^i * (1+f), with f in [0,1). The fraction f is normalized using the full word size of 32 bits. For the logarithm, we tabulate log2(1+f), which falls into [0,1). For the exponential, we tabulate exp2(f)-1, which also falls into [0,1). Obviously we need to add 1 to the result of the table interpolation.

For quadratic interpolation we need a total of three table entries, through which we fit a parabola whose coefficients a and b can be computed on the fly. With dx denoting the difference between the function argument and the closest node, we can then add a*dx^2 + b*dx to the corresponding table entry to interpolate. The need for three consecutive table entries to fit the parabola also means we need to tabulate two additional values beyond our primary interpolation interval. The inaccuracy due to the use of fixed-point arithmetic for intermediate computation can be partially compensated by adding a rounding constant at the end, which needs to be determined experimentally.
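To see where the coefficient formulas in the code below come from (a short derivation using the code's own names): with f1 = T[idx+1] - T[idx] and f2 = T[idx+2] - T[idx], fitting y(t) = A*t^2 + B*t through the offsets (0, 0), (1, f1) and (2, f2) gives 2A = f2 - 2*f1 and 2B = 4*f1 - f2. That is exactly a = f2 - (f1 << 1) and b = (f1 << 1) - a, with the factor of two absorbed by the extra +1 in the final right shift.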

/* for i = 0 to 65: log2 (1 + i/64) * 2**31 */
uint32_t logTab [66] =
{
0x00000000, 0x02dcf2d1, 0x05aeb4dd, 0x08759c50,
0x0b31fb7d, 0x0de42120, 0x108c588d, 0x132ae9e2,
0x15c01a3a, 0x184c2bd0, 0x1acf5e2e, 0x1d49ee4c,
0x1fbc16b9, 0x22260fb6, 0x24880f56, 0x26e2499d,
0x2934f098, 0x2b803474, 0x2dc4439b, 0x30014ac6,
0x32377512, 0x3466ec15, 0x368fd7ee, 0x38b25f5a,
0x3acea7c0, 0x3ce4d544, 0x3ef50ad2, 0x40ff6a2e,
0x43041403, 0x450327eb, 0x46fcc47a, 0x48f10751,
0x4ae00d1d, 0x4cc9f1ab, 0x4eaecfeb, 0x508ec1fa,
0x5269e12f, 0x5440461c, 0x5612089a, 0x57df3fd0,
0x59a80239, 0x5b6c65aa, 0x5d2c7f59, 0x5ee863e5,
0x60a02757, 0x6253dd2c, 0x64039858, 0x65af6b4b,
0x675767f5, 0x68fb9fce, 0x6a9c23d6, 0x6c39049b,
0x6dd2523d, 0x6f681c73, 0x70fa728c, 0x72896373,
0x7414fdb5, 0x759d4f81, 0x772266ad, 0x78a450b8,
0x7a231ace, 0x7b9ed1c7, 0x7d17822f, 0x7e8d3846,
0x80000000, 0x816fe50b
};

/* for i = 0 to 129: exp2 ((i/128) - 1) * 2**31 */
uint32_t expTab [130] =
{
0x00000000, 0x00b1ed50, 0x0164d1f4, 0x0218af43,
0x02cd8699, 0x0383594f, 0x043a28c4, 0x04f1f656,
0x05aac368, 0x0664915c, 0x071f6197, 0x07db3580,
0x08980e81, 0x0955ee03, 0x0a14d575, 0x0ad4c645,
0x0b95c1e4, 0x0c57c9c4, 0x0d1adf5b, 0x0ddf0420,
0x0ea4398b, 0x0f6a8118, 0x1031dc43, 0x10fa4c8c,
0x11c3d374, 0x128e727e, 0x135a2b2f, 0x1426ff10,
0x14f4efa9, 0x15c3fe87, 0x16942d37, 0x17657d4a,
0x1837f052, 0x190b87e2, 0x19e04593, 0x1ab62afd,
0x1b8d39ba, 0x1c657368, 0x1d3ed9a7, 0x1e196e19,
0x1ef53261, 0x1fd22825, 0x20b05110, 0x218faecb,
0x22704303, 0x23520f69, 0x243515ae, 0x25195787,
0x25fed6aa, 0x26e594d0, 0x27cd93b5, 0x28b6d516,
0x29a15ab5, 0x2a8d2653, 0x2b7a39b6, 0x2c6896a5,
0x2d583eea, 0x2e493453, 0x2f3b78ad, 0x302f0dcc,
0x3123f582, 0x321a31a6, 0x3311c413, 0x340aaea2,
0x3504f334, 0x360093a8, 0x36fd91e3, 0x37fbefcb,
0x38fbaf47, 0x39fcd245, 0x3aff5ab2, 0x3c034a7f,
0x3d08a39f, 0x3e0f680a, 0x3f1799b6, 0x40213aa2,
0x412c4cca, 0x4238d231, 0x4346ccda, 0x44563ecc,
0x45672a11, 0x467990b6, 0x478d74c9, 0x48a2d85d,
0x49b9bd86, 0x4ad2265e, 0x4bec14ff, 0x4d078b86,
0x4e248c15, 0x4f4318cf, 0x506333db, 0x5184df62,
0x52a81d92, 0x53ccf09a, 0x54f35aac, 0x561b5dff,
0x5744fccb, 0x5870394c, 0x599d15c2, 0x5acb946f,
0x5bfbb798, 0x5d2d8185, 0x5e60f482, 0x5f9612df,
0x60ccdeec, 0x62055b00, 0x633f8973, 0x647b6ca0,
0x65b906e7, 0x66f85aab, 0x68396a50, 0x697c3840,
0x6ac0c6e8, 0x6c0718b6, 0x6d4f301f, 0x6e990f98,
0x6fe4b99c, 0x713230a8, 0x7281773c, 0x73d28fde,
0x75257d15, 0x767a416c, 0x77d0df73, 0x792959bb,
0x7a83b2db, 0x7bdfed6d, 0x7d3e0c0d, 0x7e9e115c,
0x80000000, 0x8163daa0,
};

uint8_t clz_tab[32] = {
    31, 22, 30, 21, 18, 10, 29, 2, 20, 17, 15, 13, 9, 6, 28, 1,
    23, 19, 11, 3, 16, 14, 7, 24, 12, 4, 8, 25, 5, 26, 27, 0
};

/* count leading zeros; this is a machine instruction on many architectures */
int32_t clz (uint32_t a)
{
    a |= a >> 16;
    a |= a >> 8;
    a |= a >> 4;
    a |= a >> 2;
    a |= a >> 1;
    return clz_tab [0x07c4acdd * a >> 27];
}

int32_t fxlog2_s15p16 (int32_t arg)
{
    int32_t lz, f1, f2, dx, a, b, approx;
    uint32_t t, idx, x = arg;

    lz = clz (x);
    /* shift off integer bits and normalize fraction 0 <= f <= 1 */
    t = x << (lz + 1);
    /* index table of values log2 (1 + f) using 6 msbs of fraction */
    idx = (unsigned)t >> (32 - 6);
    /* difference between argument and smallest sampling point */
    dx = t - (idx << (32 - 6));
    /* fit parabola through closest three sampling points; find coeffs a, b */
    f1 = (logTab[idx+1] - logTab[idx]);
    f2 = (logTab[idx+2] - logTab[idx]);
    a = f2 - (f1 << 1);
    b = (f1 << 1) - a;
    /* find function value for argument by computing ((a*dx+b)*dx) */
    approx = (int32_t)((((int64_t)a)*dx) >> (32 - 6)) + b;
    approx = (int32_t)((((int64_t)approx)*dx) >> (32 - 6 + 1));
    /* compute fraction result; add experimentally determined rounding constant */
    approx = logTab[idx] + approx + 0x410d;
    /* combine integer and fractional parts of result */
    approx = ((15 - lz) << 16) + (((uint32_t)approx) >> 15);
    return approx;
}

int32_t fxexp2_s15p16 (int32_t x)
{
    int32_t f1, f2, dx, a, b, approx, idx, i, f;

    /* extract integer portion; 2**i is realized as a shift at the end */
    i = (x >> 16);
    /* extract fraction f so we can compute 2**(f)-1, 0 <= f < 1 */
    f = x & 0xffff;
    /* index table of values exp2 (f) - 1 using 7 msbs of fraction */
    idx = (uint32_t)f >> (16 - 7);
    /* difference between argument and next smaller sampling point */
    dx = f - (idx << (16 - 7));
    /* fit parabola through closest three sampling points; find coeffs a, b */
    f1 = (expTab[idx+1] - expTab[idx]);
    f2 = (expTab[idx+2] - expTab[idx]);
    a = f2 - (f1 << 1);
    b = (f1 << 1) - a;
    /* find function value for argument by computing ((a*dx+b)*dx) */
    approx = (int32_t)((((int64_t)a)*dx) >> (16 - 7)) + b;
    approx = (int32_t)((((int64_t)approx)*dx) >> (16 - 7 + 1));
    /* combine integer and fractional parts of result; add 1.0 and experimentally determined rounding constant */
    approx = (((expTab[idx] + (unsigned)approx + 0x80000012U) >> (14 - i)) + 1) >> 1;
    /* handle underflow to 0 */
    approx = ((i < -16) ? 0 : approx);
    return approx;
}
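As a quick sanity check (not part of the original routines), the table-based functions can be exercised with the conversion helpers defined for the bitwise version above:

/* hypothetical test driver; reuses fxmul_s15p16, double_to_s15p16 and
   s15p16_to_double from the bitwise implementation earlier */
int main (void)
{
    int32_t a = double_to_s15p16 (2.5);
    int32_t b = double_to_s15p16 (1.5);
    int32_t r = fxexp2_s15p16 (fxmul_s15p16 (b, fxlog2_s15p16 (a)));
    printf ("2.5**1.5 = %.6f (reference %.6f)\n",
            s15p16_to_double (r), pow (2.5, 1.5));
    return EXIT_SUCCESS;
}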

Finally, on platforms where data storage is limited but which provide a very fast integer multiplier, one can use a polynomial approximation of the minimax kind, i.e. one that minimizes the maximum error. The code for this will be similar to the table-based method, except that the two core approximations, log2(1+f) and exp2(f)-1 for f in [0,1), are computed via polynomials instead of tables. For maximum accuracy and code efficiency, we would want to represent the coefficients as unsigned pure fractions, that is, using a u0.32 fixed-point format. In the case of log2(), the leading coefficient is ~1.44, so it will not fit into this format. However, this is easily solved by splitting that coefficient into two parts, 1 and 0.44. For fixed-point computation, evaluation via the Horner scheme is typically not necessary, and the best use of a pipelined multiplier plus an increase in instruction-level parallelism can be achieved with an Estrin-like scheme, as pointed out in

Claude-Pierre Jeannerod, Jingyan Jourdan-Lu, "Simultaneous floating-point sine and cosine for VLIW integer processors". In 23rd IEEE International Conference on Application-Specific Systems, Architectures and Processors, July 2012, pp. 69-76 (preprint online)

/* a single instruction in many 32-bit architectures */
uint32_t umul32hi (uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

uint8_t clz_tab[32] = {
    31, 22, 30, 21, 18, 10, 29, 2, 20, 17, 15, 13, 9, 6, 28, 1,
    23, 19, 11, 3, 16, 14, 7, 24, 12, 4, 8, 25, 5, 26, 27, 0
};

/* count leading zeros; this is a machine instruction on many architectures */
int32_t clz (uint32_t a)
{
    a |= a >> 16;
    a |= a >> 8;
    a |= a >> 4;
    a |= a >> 2;
    a |= a >> 1;
    return clz_tab [0x07c4acdd * a >> 27];
}

/* compute log2() with s15.16 fixed-point argument and result */
int32_t fxlog2_s15p16 (int32_t arg)
{
    const uint32_t a0 = (uint32_t)((1.44269476063 - 1) * (1LL << 32) + 0.5);
    const uint32_t a1 = (uint32_t)(7.2131008654833e-1 * (1LL << 32) + 0.5);
    const uint32_t a2 = (uint32_t)(4.8006370104849e-1 * (1LL << 32) + 0.5);
    const uint32_t a3 = (uint32_t)(3.5339481476694e-1 * (1LL << 32) + 0.5);
    const uint32_t a4 = (uint32_t)(2.5600972794928e-1 * (1LL << 32) + 0.5);
    const uint32_t a5 = (uint32_t)(1.5535182948224e-1 * (1LL << 32) + 0.5);
    const uint32_t a6 = (uint32_t)(6.3607925549150e-2 * (1LL << 32) + 0.5);
    const uint32_t a7 = (uint32_t)(1.2319647939876e-2 * (1LL << 32) + 0.5);
    int32_t lz;
    uint32_t approx, h, m, l, z, y, x = arg;

    lz = clz (x);
    /* shift off integer bits and normalize fraction 0 <= f <= 1 */
    x = x << (lz + 1);
    y = umul32hi (x, x); // f**2
    z = umul32hi (y, y); // f**4
    /* evaluate minimax polynomial to compute log2(1+f) */
    h = a0 - umul32hi (a1, x);
    m = umul32hi (a2 - umul32hi (a3, x), y);
    l = umul32hi (a4 - umul32hi (a5, x) + umul32hi (a6 - umul32hi (a7, x), y), z);
    approx = x + umul32hi (x, h + m + l);
    /* combine integer and fractional parts of result; round result */
    approx = ((15 - lz) << 16) + ((((approx) >> 15) + 1) >> 1);
    return approx;
}

/* compute exp2() with s15.16 fixed-point argument and result */
int32_t fxexp2_s15p16 (int32_t arg)
{
    const uint32_t a0 = (uint32_t)(6.9314718107e-1 * (1LL << 32) + 0.5);
    const uint32_t a1 = (uint32_t)(2.4022648809e-1 * (1LL << 32) + 0.5);
    const uint32_t a2 = (uint32_t)(5.5504413787e-2 * (1LL << 32) + 0.5);
    const uint32_t a3 = (uint32_t)(9.6162736882e-3 * (1LL << 32) + 0.5);
    const uint32_t a4 = (uint32_t)(1.3386828359e-3 * (1LL << 32) + 0.5);
    const uint32_t a5 = (uint32_t)(1.4629773796e-4 * (1LL << 32) + 0.5);
    const uint32_t a6 = (uint32_t)(2.0663021132e-5 * (1LL << 32) + 0.5);
    int32_t i;
    uint32_t approx, h, m, l, s, q, f, x = arg;

    /* extract integer portion; 2**i is realized as a shift at the end */
    i = ((x >> 16) ^ 0x8000) - 0x8000;
    /* extract and normalize fraction f to compute 2**(f)-1, 0 <= f < 1 */
    f = x << 16;
    /* evaluate minimax polynomial to compute exp2(f)-1 */
    s = umul32hi (f, f); // f**2
    q = umul32hi (s, s); // f**4
    h = a0 + umul32hi (a1, f);
    m = umul32hi (a2 + umul32hi (a3, f), s);
    l = umul32hi (a4 + umul32hi (a5, f) + umul32hi (a6, s), q);
    approx = umul32hi (f, h + m + l);
    /* combine integer and fractional parts of result; round result */
    approx = ((approx >> (15 - i)) + (0x80000000 >> (14 - i)) + 1) >> 1;
    /* handle underflow to 0 */
    approx = ((i < -16) ? 0 : approx);
    return approx;
}

Exponents without using the pow() function

I don’t really know how to describe what’s wrong here: your code doesn’t make sense for calculating an exponent. Start with 1, then multiply by the base $exponent times.

$result = 1;

for ($i = 0; $i < $exponent; $i++) {
$result *= $base;
}

echo $result;

