Why Does Changing the Sum Order Return a Different Result

Why does changing the sum order return a different result?

Maybe this question is stupid, but why does simply changing the order of the elements affect the result?

It will change the points at which the values are rounded, based on their magnitude. As an example of the kind of thing that we're seeing, let's pretend that instead of binary floating point, we were using a decimal floating point type with 4 significant digits, where each addition is performed at "infinite" precision and then rounded to the nearest representable number. Here are two sums:

1/3 + 2/3 + 2/3 = (0.3333 + 0.6667) + 0.6667
= 1.000 + 0.6667 (no rounding needed!)
= 1.667 (where 1.6667 is rounded to 1.667)

2/3 + 2/3 + 1/3 = (0.6667 + 0.6667) + 0.3333
= 1.333 + 0.3333 (where 1.3334 is rounded to 1.333)
= 1.666 (where 1.6663 is rounded to 1.666)
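
To make the toy arithmetic above executable, here is a small Java sketch that mimics a 4-significant-digit decimal type with java.math.BigDecimal and a MathContext of precision 4 (the class name and the use of BigDecimal for the demo are illustrative choices, not part of the answer):

import java.math.BigDecimal;
import java.math.MathContext;

public class FourDigitSum {
    public static void main(String[] args) {
        MathContext mc = new MathContext(4);           // 4 significant digits, as in the example
        BigDecimal third = new BigDecimal("0.3333");
        BigDecimal twoThirds = new BigDecimal("0.6667");

        // (1/3 + 2/3) + 2/3, rounding after every addition
        BigDecimal a = third.add(twoThirds, mc).add(twoThirds, mc);
        // (2/3 + 2/3) + 1/3, rounding after every addition
        BigDecimal b = twoThirds.add(twoThirds, mc).add(third, mc);

        System.out.println(a);   // prints 1.667
        System.out.println(b);   // prints 1.666
    }
}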

We don't even need non-integers for this to be a problem:

10000 + 1 - 10000 = (10000 + 1) - 10000
= 10000 - 10000 (where 10001 is rounded to 10000)
= 0

10000 - 10000 + 1 = (10000 - 10000) + 1
= 0 + 1
= 1

This demonstrates, possibly more clearly, that the important part is that we have a limited number of significant digits - not a limited number of decimal places. If we could always keep the same number of decimal places, then with addition and subtraction at least, we'd be fine (so long as the values didn't overflow). The problem is that when you get to bigger numbers, information at the small end is lost - the 10001 being rounded to 10000 in this case. (This is an example of the problem that Eric Lippert noted in his answer.)
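
The same loss happens with real IEEE-754 floats once the gap between representable values around the larger number exceeds the smaller number; a minimal Java sketch, with constants chosen only to trigger the effect:

public class LostUpdate {
    public static void main(String[] args) {
        // For float, 1.0e8f is exactly representable, but the neighbouring
        // representable values are 8 apart, so adding 1 rounds straight back.
        float bigFirst = (1.0e8f + 1.0f) - 1.0e8f;   // 1.0e8f + 1.0f rounds to 1.0e8f
        float bigLast  = (1.0e8f - 1.0e8f) + 1.0f;   // cancellation happens first
        System.out.println(bigFirst);   // 0.0
        System.out.println(bigLast);    // 1.0
    }
}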

It's important to note that the values on the first line of the right hand side are the same in all cases - so although it's important to understand that your decimal numbers (23.53, 5.88, 17.64) won't be represented exactly as double values, that's only a problem because of the problems shown above.
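
A quick way to see that such literals are not stored exactly is to ask BigDecimal for the exact value held by the double (a sketch using the values mentioned above):

import java.math.BigDecimal;

public class ExactValue {
    public static void main(String[] args) {
        // new BigDecimal(double) preserves the exact binary value the double holds,
        // which is close to, but not exactly, the decimal literal.
        System.out.println(new BigDecimal(23.53));
        System.out.println(new BigDecimal(5.88));
        System.out.println(new BigDecimal(17.64));
    }
}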

Different results when adding same doubles in different order

Floating point numbers lose precision as you do more operations. Generally, you get the highest precision by adding the smallest numbers first. (So the result does depend on the order of operations)

In addition to maintaining the same order of operations, you'll also have to use strictfp to get the same result on different platforms.

Or better yet, don't use floating points: use a BigDecimal instead.
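
For completeness, a sketch of the BigDecimal route, reusing the values from the first question: with exact decimal arithmetic the order of the additions no longer matters.

import java.math.BigDecimal;

public class DecimalSum {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("23.53");
        BigDecimal b = new BigDecimal("5.88");
        BigDecimal c = new BigDecimal("17.64");

        // Exact decimal addition, so both orders print 47.05.
        System.out.println(a.add(b).add(c));
        System.out.println(a.add(c).add(b));
    }
}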

Why does the sum of float numbers with exact values still depend on order?

Sums Are Not Exact

Your assumption that the sum of exact values is exact is wrong.

Floating-point arithmetic uses some number of digits that is fixed for the format (such as 24 binary digits for float). The mathematical sum of two 24-digit numbers may have 25 digits and therefore requires rounding to represent within 24 digits (and an exponent).

Additionally, when two numbers with different exponents are added, the digits of one are offset relative to the other. The sum may have additional digits due to the offset, and again must be rounded.

When you add the numbers in different orders, the resulting roundings may be different.
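
A two-line Java illustration of that point: both operands below are exactly representable in float, but their mathematical sum needs 25 significant bits and is therefore rounded (2^24 is simply the smallest convenient magnitude where this shows up):

public class RoundedSum {
    public static void main(String[] args) {
        float big = 16777216.0f;          // 2^24, exactly representable
        float sum = big + 1.0f;           // exact result 16777217 needs 25 bits
        System.out.println(sum == big);   // true: the sum was rounded back down
    }
}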

Examples of Inexact Sums

These examples use three-digit binary significands.

In this example, the addition carries into a new column:


1.10 • 2³
1.01 • 2³
――――――――――
10.11 • 2³     Exact sum, too many digits, must be rounded.
11.0 • 2³      Sum rounded to three digits.
1.10 • 2⁴      Rounded sum, exponent adjusted to normalize significand.

In this example, the numbers have different exponents, and adjusting for this shifts the digits into new columns:


1.11 • 2³
1.01 • 2⁵      Different exponent requires adjustment.

0.0111 • 2⁵    Adjusted to match exponent.
1.01 • 2⁵
――――――――――――
1.1011 • 2⁵    Exact sum, too many digits, must be rounded.
1.11 • 2⁵      Rounded sum.

Examples of Non-Associative Sums

Now we can look at adding three numbers in different ways and see that different sums are produced.

We will compare (1.10•2⁰ + 1.10•2⁰) + 1.00•2⁴ to 1.10•2⁰ + (1.10•2⁰ + 1.00•2⁴).

For the first expression, we add the first and second operands, then the third:


Add first and second operands:
1.10 • 2⁰      First operand.
1.10 • 2⁰      Second operand.
――――――――――
11.00 • 2⁰     Exact sum, too many digits, must be rounded.
11.0 • 2⁰      Rounded sum, must be normalized.
1.10 • 2¹      Normalized, rounded sum.

Add previous result and third operand:
1.10 • 2¹      Previous result.
1.00 • 2⁴      Third operand.

Exponents do not match, so adjust and then add:
0.00110 • 2⁴   Previous result adjusted to match exponent.
1.00 • 2⁴      Third operand.
――――――――――――
1.00110 • 2⁴   Exact sum, too many digits, must be rounded.
1.01 • 2⁴      Rounded sum.

For the second expression, we add the second and third operands, then the first:


Add second and third:
1.10 • 2⁰      Second operand.
1.00 • 2⁴      Third operand.

Exponents do not match, so adjust, then add:
0.000110 • 2⁴  Second operand adjusted to match exponent.
1.00 • 2⁴      Third operand.
――――――――――――――
1.000110 • 2⁴  Exact sum, too many digits, must be rounded.
1.00 • 2⁴      Rounded sum.

Add first operand and previous result:
1.10 • 2⁰      First operand.
1.00 • 2⁴      Previous result.

Exponents do not match, so adjust and then add:
0.000110 • 2⁴  First operand adjusted to match exponent.
1.00 • 2⁴      Previous result.
―――――――――――――
1.000110 • 2⁴  Exact sum, too many digits, must be rounded.
1.00 • 2⁴      Rounded sum.

The first expression yields 1.01•2⁴ (decimal 20), while the second expression yields 1.00•2⁴ (decimal 16), although the exact sum is 19. So the order in which operands are added affects the result.
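
The same non-associativity is easy to observe with ordinary doubles; the snippet below is not from the answer above, but the mechanism is exactly the one just walked through:

public class NotAssociative {
    public static void main(String[] args) {
        double left  = (0.1 + 0.2) + 0.3;
        double right = 0.1 + (0.2 + 0.3);
        System.out.println(left);            // 0.6000000000000001 with IEEE-754 doubles
        System.out.println(right);           // 0.6
        System.out.println(left == right);   // false
    }
}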

In which order should floats be added to get the most precise result?

Your instinct is basically right: sorting in ascending order (of magnitude) usually improves things somewhat. Consider the case where we're adding single-precision (32 bit) floats, and there are 1 billion values equal to 1 / (1 billion), and one value equal to 1. If the 1 comes first, then the sum will come to 1, since 1 + (1 / 1 billion) is 1 due to loss of precision. Each addition has no effect at all on the total.

If the small values come first, they will at least sum to something, although even then I have 2^30 of them, whereas after 2^25 or so I'm back in the situation where each one individually isn't affecting the total any more. So I'm still going to need more tricks.

That's an extreme case, but in general adding two values of similar magnitude is more accurate than adding two values of very different magnitudes, since you "discard" fewer bits of precision in the smaller value that way. By sorting the numbers, you group values of similar magnitude together, and by adding them in ascending order you give the small values a "chance" of cumulatively reaching the magnitude of the bigger numbers.
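
A small Java sketch of that effect, with one large value and many tiny ones summed big-first and then small-first (the count and magnitudes are arbitrary choices, picked only so the difference is visible in float):

public class OrderMatters {
    public static void main(String[] args) {
        final int n = 10_000_000;
        final float tiny = 1e-8f;

        // Large value first: every tiny addition rounds away to nothing.
        float bigFirst = 1.0f;
        for (int i = 0; i < n; i++) {
            bigFirst += tiny;
        }

        // Tiny values first: they accumulate before the large value is added.
        float smallFirst = 0.0f;
        for (int i = 0; i < n; i++) {
            smallFirst += tiny;
        }
        smallFirst += 1.0f;

        System.out.println(bigFirst);    // stays at exactly 1.0
        System.out.println(smallFirst);  // roughly 1.1: the tiny values actually accumulated
    }
}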

Still, if negative numbers are involved it's easy to "outwit" this approach. Consider three values to sum, {1, -1, 1 billionth}. The arithmetically correct sum is 1 billionth, but if my first addition involves the tiny value then my final sum will be 0. Of the 6 possible orders, only 2 are "correct" - {1, -1, 1 billionth} and {-1, 1, 1 billionth}. All 6 orders give results that are accurate at the scale of the largest-magnitude value in the input (0.0000001% out), but for 4 of them the result is inaccurate at the scale of the true solution (100% out). The particular problem you're solving will tell you whether the former is good enough or not.

In fact, you can play a lot more tricks than just adding them in sorted order. If you have lots of very small values, a middle number of middling values, and a small number of large values, then it might be most accurate to first add up all the small ones, then separately total the middling ones, add those two totals together then add the large ones. It's not at all trivial to find the most accurate combination of floating-point additions, but to cope with really bad cases you can keep a whole array of running totals at different magnitudes, add each new value to the total that best matches its magnitude, and when a running total starts to get too big for its magnitude, add it into the next total up and start a new one. Taken to its logical extreme, this process is equivalent to performing the sum in an arbitrary-precision type (so you'd do that). But given the simplistic choice of adding in ascending or descending order of magnitude, ascending is the better bet.
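
As a rough illustration of the "array of running totals at different magnitudes" idea, here is a much-simplified Java sketch that buckets values by binary exponent; it deliberately omits the promotion step described above (moving a bucket into the next one up when it grows too large):

public class BucketSum {
    // Much-simplified sketch: one running total per binary exponent,
    // summed from the smallest magnitude upwards at the end.
    static double bucketSum(double[] values) {
        double[] buckets = new double[2048];              // one slot per possible exponent
        for (double v : values) {
            buckets[Math.getExponent(v) + 1023] += v;     // add to the matching magnitude
        }
        double total = 0.0;
        for (double b : buckets) {                        // small magnitudes first
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] xs = new double[1_000_001];
        xs[0] = 1.0;
        java.util.Arrays.fill(xs, 1, xs.length, 1e-16);   // a large value plus a long, tiny tail

        double naive = 0.0;
        for (double v : xs) naive += v;                   // large value is hit first

        System.out.println(naive);           // 1.0: the tail is rounded away entirely
        System.out.println(bucketSum(xs));   // keeps the tail, close to 1.0000000001
    }
}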

It does have some relation to real-world programming, since there are some cases where your calculation can go very badly wrong if you accidentally chop off a "heavy" tail consisting of a large number of values each of which is too small to individually affect the sum, or if you throw away too much precision from a lot of small values that individually only affect the last few bits of the sum. In cases where the tail is negligible anyway you probably don't care. For example if you're only adding together a small number of values in the first place and you're only using a few significant figures of the sum.

Why does the join() method return a different result than expected

It puts the "a" between each element of your array, not after each one, so 6 elements have 5 joiners.

on this fiddle you can see a bit more exactly what the join is doing:
http://jsfiddle.net/YKhmp/
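
The same rule applies outside JavaScript; for example, Java's String.join also places the separator only between elements (an analogous illustration, not part of the original answer):

public class JoinDemo {
    public static void main(String[] args) {
        // 6 elements joined by "a" -> only 5 separators, none trailing.
        String joined = String.join("a", "1", "2", "3", "4", "5", "6");
        System.out.println(joined);            // 1a2a3a4a5a6
        System.out.println(joined.length());   // 11 characters: 6 elements + 5 joiners
    }
}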

Python - Java Math operations give different results

When you do

int(math.pow(14764352724, 6))

you get a big number raised to a power, but using a floating-point method, even if the arguments are integers. The precision is already lost in that floating-point computation; converting to int afterwards only preserves the inexact result (the intermediate value is a float: 1.0358251994780843e+61).

When you do

14764352724**6

you get a big number raised to a power using a binary power method that uses only integer multiplication.

So the second result is exact, whereas the first isn't:

>>> int(math.pow(14764352724,6))
10358251994780842724998096890217137953445700726699419360034816 # wrong
>>> 14764352724**6
10358251994780842575401275783021915748383652186833068257611776 # correct
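
Since the question compares Python with Java, note that the same split exists on the Java side: Math.pow works in double and is rounded, while BigInteger.pow is exact (a hedged sketch, not taken from the answer above):

import java.math.BigInteger;

public class PowComparison {
    public static void main(String[] args) {
        BigInteger exact = BigInteger.valueOf(14764352724L).pow(6);   // integer arithmetic, exact
        double approx = Math.pow(14764352724.0, 6.0);                 // double arithmetic, rounded

        System.out.println(exact);
        System.out.printf("%.0f%n", approx);   // differs from the exact value in the lower digits
    }
}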

Let's try a disassembly of both ** and math.pow functions:

import dis, math

def test(n):
    return n ** 3

def test2(n):
    return math.pow(n, 3)

dis.dis(test)
dis.dis(test2)

output

  4           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (3)
              6 BINARY_POWER
              7 RETURN_VALUE

  7           0 LOAD_GLOBAL              0 (math)
              3 LOAD_ATTR                1 (pow)
              6 LOAD_FAST                0 (n)
              9 LOAD_CONST               1 (3)
             12 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             15 RETURN_VALUE

As you can see, the functions aren't equivalent: BINARY_POWER is used in the first case, and that opcode can perform the multiplication with exact integer arithmetic when its operands are integers:

BINARY_POWER()

Implements TOS = TOS1 ** TOS

Binary power yields the same value as math.pow when the parameters aren't all integers:

>>> 14764352724**6.0
1.0358251994780843e+61
>>> int(14764352724**6.0)
10358251994780842724998096890217137953445700726699419360034816

Note: what probably adds to the confusion is the built-in pow function, which is different from math.pow (and shadowed by the latter if you use from math import pow), but is equivalent to the ** operator when used without the modulo argument:

pow(x, y[, z])

Return x to the power y; if z is present, return x to the power y, modulo z (computed more efficiently than pow(x, y) % z). The two-argument form pow(x, y) is equivalent to using the power operator: x**y.


