Rounding Issue in Log and Exp Functions

Given your data, you can round to a big multiple:

-- 720000000000000 must be a multiple of 600

SELECT
ROUND( 719999999999998 / 600, 0 ) * 600

-- result: 720000000000000

Test it at SQLFiddle:

CREATE TABLE T
(
    PAR_COLUMN INT,
    PERIOD INT,
    VALUE NUMERIC(22, 6)
);

INSERT INTO T VALUES
(1, 601, 10.1), -- <-- I put decimals just to test!
(1, 602, 20),
(1, 603, 30),
(1, 604, 40),
(1, 605, 50),
(1, 606, 60),
(2, 601, 100),
(2, 602, 200),
(2, 603, 300),
(2, 604, 400),
(2, 605, 500),
(2, 606, 600);

Query 1:

WITH T1 AS (
    SELECT *,
           EXP(SUM(LOG(ABS(NULLIF(VALUE, 0))))
               OVER (PARTITION BY PAR_COLUMN
                     ORDER BY PERIOD)) AS CUM_MUL,
           VALUE AS CUM_MAX1,
           LAG(VALUE, 1, 1.)
               OVER (PARTITION BY PAR_COLUMN
                     ORDER BY PERIOD) AS CUM_MAX2,
           LAG(VALUE, 2, 1.)
               OVER (PARTITION BY PAR_COLUMN
                     ORDER BY PERIOD) AS CUM_MAX3
    FROM T
)
SELECT PAR_COLUMN, PERIOD, VALUE,
       (ROUND(CUM_MUL / (CUM_MAX1 * CUM_MAX2 * CUM_MAX3), 6)
        * CAST((1000000 * CUM_MAX1 * CUM_MAX2 * CUM_MAX3) AS BIGINT)
       ) / 1000000.
       AS CUM_MUL
FROM T1

Results:

| PAR_COLUMN | PERIOD | VALUE |         CUM_MUL |
|------------|--------|-------|-----------------|
|          1 |    601 |  10.1 |            10.1 |
|          1 |    602 |    20 |             202 |
|          1 |    603 |    30 |            6060 |
|          1 |    604 |    40 |          242400 |
|          1 |    605 |    50 |        12120000 |
|          1 |    606 |    60 |       727200000 |
|          2 |    601 |   100 |             100 |
|          2 |    602 |   200 |           20000 |
|          2 |    603 |   300 |         6000000 |
|          2 |    604 |   400 |      2400000000 |
|          2 |    605 |   500 |   1200000000000 |
|          2 |    606 |   600 | 720000000000000 |

(The 10.1 in the first row is OK: the decimal comes from my test data.)

Notice I multiply by 1000000 to work without decimals.
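Outside SQL, the same two tricks (cumulative product via exp-sum-log, then snapping back to a known multiple) can be sketched in a few lines of Python; the `values` list below just mirrors the PAR_COLUMN = 2 sample rows above:

```python
import math

values = [100, 200, 300, 400, 500, 600]

log_sum = 0.0
cum_mul = []
for v in values:
    log_sum += math.log(abs(v))        # running SUM(LOG(ABS(...)))
    cum_mul.append(math.exp(log_sum))  # EXP(...) = cumulative product, with float drift

print(cum_mul[-1])                     # close to, but usually not exactly, 720000000000000
print(round(cum_mul[-1] / 600) * 600)  # 720000000000000: snapped to the known multiple
```

The float result drifts by a few units in the last places, and the final rounding absorbs that drift because the true product is known to be a multiple of 600.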

I do *not* want correct rounding for function exp

To answer the general question of why library functions are required to give correctly rounded results:

Floating point is hard, and often counterintuitive. Not every programmer has read what they should have. When libraries were allowed to return slightly inaccurately rounded results, people complained about the precision of the library functions when their inaccurate computations inevitably went wrong and produced nonsense. In response, the library writers made their libraries exactly rounded, so now people cannot shift the blame to them.

In many cases, specific knowledge about floating point algorithms can produce considerable improvements to accuracy and/or performance, as in the test case below:

Taking the exp() of numbers very close to 0 in floating point is problematic, since the result is a number close to 1 while all of the precision lies in the difference from 1, so most of the significant digits are lost. It is more precise (and significantly faster in this test case) to compute exp(x) - 1 with the C math library function expm1(x). If exp() itself is really needed, it is still much faster to do expm1(x) + 1.

A similar concern exists for computing log(1 + x), for which there is the function log1p(x).
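As a small illustration of the log1p concern, here is a Python sketch (Python's math module typically wraps the same C library functions):

```python
import math

x = 1e-15
print(math.log(1 + x))  # ~1.1102e-15: 1 + x rounds first, giving ~11% relative error
print(math.log1p(x))    # ~1.0000e-15: accurate to nearly full precision
```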

A quick fix that speeds up the provided testcase:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

int main (void)
{
    int i;
    double a, c;
    c = 0;
    clock_t start = clock ();
    for (i = 0; i < 1000000; ++i) // Do the same type of computation many times with different values, to smooth out random fluctuations.
    {
        a = (double) (1 + 2 * (rand () % 0x400)) / 0x20000000000000; // "a" has only a few significant digits, and its last non-zero digit is at (fixed-point) position 53.
        c += expm1 (a) + 1; // replace exp() with expm1() + 1
    }
    clock_t stop = clock ();
    printf ("%e\n", c); // Make sure the compiler actually performs the computation.
    printf ("Clock time spent: %ld\n", (long) (stop - start)); // clock_t is not necessarily int, so cast before printing
    return 0;
}

For this case, the timings on my machine are as follows:

Original code

1.000000e+06

Clock time spent: 21543338

Modified code

1.000000e+06

Clock time spent: 55076

Programmers with advanced knowledge of the accompanying trade-offs may sometimes consider using approximate results where the precision is not critical.

An experienced programmer may be able to write an approximate implementation of a slow function using methods like Newton-Raphson iteration or Taylor/Maclaurin polynomials, use deliberately less exactly rounded specialty functions from libraries like Intel's MKL or AMD's ACML, relax the compiler's floating-point standard compliance, reduce precision to IEEE 754 binary32 (float), or combine these approaches; a toy sketch of the polynomial route follows.
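As a toy illustration only (the function name `exp_maclaurin` is made up here, and this is a sketch, not production code): a truncated Maclaurin series already gives several correct digits of exp(x) for small |x|.

```python
import math

def exp_maclaurin(x, terms=8):
    # Truncated Maclaurin series: 1 + x + x^2/2! + ... + x^(terms-1)/(terms-1)!
    result = 1.0
    term = 1.0
    for n in range(1, terms):
        term *= x / n  # term is now x^n / n!
        result += term
    return result

print(exp_maclaurin(0.1))  # agrees with math.exp(0.1) to roughly 12 digits
print(math.exp(0.1))       # 1.1051709180756477
```

At x = 0.1 with 8 terms, the truncation error is on the order of x^8 / 8! (about 2e-13), which is why roughly 12 digits agree.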

Note that a better description of the problem would enable a better answer.

TSQL number rounding issue

In the expression FLOOR(5.7456 * 1000 + 0.4), the part between parentheses is evaluated first. For constants, the data types are inferred from the notation: for 5.7456 that is decimal(5,4); 1000 is an int; and 0.4 is decimal(1,1). The inferred data type for 5.7456 * 1000 is then decimal(16,4), and for the full expression decimal(17,4). These are all exact numeric data types, so you will not experience any rounding; the end result is exactly 5746.0000. The FLOOR function then trims the fraction, returning 5746.
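You can mimic the exact-numeric arithmetic with Python's decimal module (a rough analogue for illustration, not T-SQL itself):

```python
from decimal import Decimal
import math

exact = Decimal('5.7456') * 1000 + Decimal('0.4')
print(exact)              # 5746.0000 -- exact decimal arithmetic, no binary rounding
print(math.floor(exact))  # 5746
```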

In the user-defined function, you store input parameters and intermediate results in the float data type (floating point data). This data type is intended for approximate data, such as measurements, where the value you read from the instrument is already an approximation. I learned in high school to read as many digits as I could, but to treat the last one as insignificant: I had to keep it in all computations, but round the final result to the number of significant digits warranted by the accuracy of my measurements. The rounding ensures that inaccuracies in the last digits do not affect the end result. Floating point data types should be treated in the same way.

Internally, floating point numbers are represented in a base-2 number system. This means that there are numbers that have an exact representation in our commonly used base-10 system (such as 5.7456), but a never-ending fractional part in base-2. (Similar to how, for instance, one third, which can be represented exactly in base-3, has a never-ending fractional part in base-10: 0.33333333333(etc).) The number of base-2 digits used to store a float value is finite, so it has to be cut off at the end, which results in it being rounded either up or down by a tiny fraction. You can see this if you run:

DECLARE @a float = 5.7456;
SELECT CAST(@a AS decimal(19,16));

In this case, the effect of cutting off after a lot of base-2 digits is that the value stored is 0.0000000000000004 less than the decimal value you put in. That small difference turns into a huge effect because of the FLOOR function, which does exactly what it should do: round down to the nearest integer.

(I've seen a lot of people call this an error. It is not; it is intended and documented behavior. And the precision loss here is neither worse nor better than the precision loss you get when you store one third in a DECIMAL(7,6); it is just a bit less obvious, because we have all grown up being used to working in base-10.)
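The same experiment is easy to run in Python, whose float is the same IEEE 754 binary64 type as SQL Server's float; Decimal(a) below prints the exact value the double actually stores:

```python
from decimal import Decimal
import math

a = 5.7456
print(Decimal(a))                  # 5.745599999999999...: the stored double is slightly below 5.7456
print(math.floor(a * 1000 + 0.4))  # 5745, not 5746: FLOOR sees the slightly-low value
```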

Rounding Error: Harmonic mean with exponent of small numbers

The best approach is probably to keep things in log scale. Many scientific languages have a log-add-exp function (e.g. numpy.logaddexp in Python) that does what you want to high precision, with both the input and the result in log form.

The idea is that you want to compute e^-1000 + e^-1001 + e^-1002, so you factor it as e^-1000 (1 + e^-1 + e^-2) and take the log. The result is -1000 + log(1 + e^-1 + e^-2), which can be computed without loss of precision.
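Here is a minimal pure-Python sketch of the same factoring (numpy.logaddexp does this pairwise; here the common factor is pulled out by hand):

```python
import math

def log_sum_exp(log_terms):
    # log(sum(exp(t) for t in log_terms)), computed stably by factoring out the largest term
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

print(log_sum_exp([-1000, -1001, -1002]))  # about -999.59, with no underflow
# The naive form fails: math.exp(-1000) underflows to 0.0, and math.log(0.0) raises an error.
```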

Different Result with ROUND Function in MySQL

As per the comment of @Alvaro, I changed the data type of the column from DOUBLE to DECIMAL, and the issue was resolved. Strange but true. Now the ROUND function brings in the correct results. (It is not actually strange: DOUBLE is an approximate binary type, so the value ROUND sees is not exactly the value displayed, while DECIMAL stores the decimal digits exactly.)

Inaccurate Logarithm in Python

This is to be expected with computer arithmetic: it follows particular rules, such as IEEE 754, that probably don't match the math you learned in school.

If this actually matters, use Python's decimal type.

Example:

from decimal import Decimal, Context

ctx = Context(prec=20)  # 20 significant decimal digits
two = Decimal(2)
# log2(2**31) computed as ln(2**31) / ln(2) in decimal arithmetic
ctx.divide(ctx.power(two, Decimal(31)).ln(ctx), two.ln(ctx))

