Ruby Floating Point Errors - ITCodar

# Ruby Floating Point Errors

## ruby floating point errors

If you do the calculations by hand in double-precision binary, which is limited to 53 significant bits, you'll see what's going on:

129.95 = 1.0000001111100110011001100110011001100110011001100110 x 2^7

129.95*100 = 1.1001011000010111111111111111111111111111111111111111011 x 2^13

This is 56 significant bits long, so rounded to 53 bits it's

1.1001011000010111111111111111111111111111111111111111 x 2^13, which equals

12994.999999999998181010596454143524169921875

Now 129.95*10 = 1.01000100110111111111111111111111111111111111111111111 x 2^10

This is 54 significant bits long, so rounded to 53 bits it's 1.01000100111 x 2^10 = 1299.5

Now 1299.5 * 10 = 1.1001011000011 x 2^13 = 12995.
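
The two evaluation orders worked out above can be checked directly in Ruby. This is a small sketch: the variable names are illustrative, and `format` with `%.30f` is used only to show more digits than Ruby prints by default.

```ruby
direct   = 129.95 * 100      # one multiplication, one rounding step
stepwise = 129.95 * 10 * 10  # two multiplications, two rounding steps

puts direct                  # 12994.999999999998
puts stepwise                # 12995.0
puts format("%.30f", direct) # more of the digits actually stored
puts direct == stepwise      # false
```

Both expressions are mathematically equal; they differ only in where the intermediate results happen to be rounded.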

## How to prevent Ruby money floating point errors

You should use a decimal type for money amounts, for instance BigDecimal (see http://ruby-doc.org/stdlib-2.1.1/libdoc/bigdecimal/rdoc/BigDecimal.html), which provides arbitrary-precision decimal arithmetic.

EDIT: In your case you should probably change your RSpec example to something like:

    it "correctly manipulates simple money calculations" do
      # Money.infinite_precision = false or true; I've tried both
      start_val = Money.new("1000", "EUR")
      thirty = BigDecimal("30") # BigDecimal.new is gone from newer bigdecimal releases
      expect(start_val / thirty * thirty).to eq start_val
    end

EDIT2: In this particular case, 1000/30 cannot be represented as a finite decimal number, so BigDecimal cannot hold it exactly either. You have to either use the Rational class or round the result. Example code:

    it "correctly manipulates simple money calculations" do
      # Money.infinite_precision = false or true; I've tried both
      start_val = Money.new("1000", "EUR")
      expect(start_val.amount.to_r / 30.to_r * 30.to_r).to eq start_val.amount.to_r
    end
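
The same effect can be seen without the Money gem. The following is a minimal sketch using only BigDecimal and Rational; the variable names are illustrative, and rounding to two decimal places is an assumption about what "do rounding" would mean for a currency with cents.

```ruby
require 'bigdecimal'

amount = BigDecimal("1000")
thirty = BigDecimal("30")

# 1000/30 = 33.333... has no finite decimal expansion, so the quotient is cut
# off after some number of digits and the product falls just short of 1000.
decimal_round_trip = amount / thirty * thirty
puts decimal_round_trip == amount          # false

# Rational keeps the quotient as the exact fraction 100/3.
rational_round_trip = amount.to_r / 30 * 30
puts rational_round_trip == amount.to_r    # true

# Alternatively, round the decimal result back to whole cents.
puts decimal_round_trip.round(2) == amount # true
```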

## Ruby issue with floating point iteration

Think of it this way:

Your computer uses a fixed number of bits, typically 32 or 64, to represent a floating point number. That means it can represent only finitely many distinct values.

Now consider all the real numbers between 0 and 1. There are infinitely many of them. How can you possibly represent all real numbers if your machine can't even represent all the numbers between 0 and 1?

The answer is that your machine has to approximate most decimal numbers with the nearest value it can represent. This is what you are seeing.

Of course there are libraries that try to overcome these limitations and make it so that you can still accurately represent decimal numbers. One such library is BigDecimal:

    require 'bigdecimal'

    count = BigDecimal("0")
    step = BigDecimal("0.1") # built from a string so that no Float enters the sum
    while count < 1
      count += step
      puts count.to_s('F')
    end

The downside is that these libraries are generally slower at arithmetic, because they are a software layer above the CPU doing these calculations.
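
To make the difference concrete, the sketch below runs the same loop once with Float and once with BigDecimal and counts the iterations. The helper name `iterations` is made up for this illustration.

```ruby
require 'bigdecimal'

# Step a counter from `start` in increments of `step` until it reaches 1.
# Returns the number of iterations and the final counter value.
def iterations(start, step)
  count = start
  n = 0
  while count < 1
    count += step
    n += 1
  end
  [n, count]
end

float_n, float_last = iterations(0.0, 0.1)
decimal_n, decimal_last = iterations(BigDecimal("0"), BigDecimal("0.1"))

# Ten Float additions of 0.1 give 0.9999999999999999, which is still < 1,
# so the Float loop runs an eleventh time.
puts "Float:      #{float_n} iterations, last value #{float_last}"
puts "BigDecimal: #{decimal_n} iterations, last value #{decimal_last.to_s('F')}"
```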

## Floating point that makes the calculation inaccurate

To preserve the accuracy of calculations use BigDecimal instead of Float:

    require 'bigdecimal'
    scores = params[:scores].split("\r\n").map { |n| BigDecimal(n) }

> BigDecimal provides support for very large or very accurate floating point numbers.
>
> Decimal arithmetic is also useful for general calculation, because it provides the correct answers people expect–whereas normal binary floating point arithmetic often introduces subtle errors because of the conversion between base 10 and base 2.

## Why does this expression cause a floating point error?

It is not true that the significand of the floating-point format has enough bits to represent 26/65. (“Significand” is the preferred term. Significands are linear. Mantissas are logarithmic.)

The significand of a binary floating-point number is a binary integer. This integer is scaled according to the exponent. To represent 26/65, which is .4, in binary floating-point, we must represent it as an integer multiplied by a power of two. For example, an approximation to .4 is 1•2^-1 = .5. A better approximation is 3•2^-3 = .375. Better still is 26•2^-6 = .40625.

However, no matter what integer you use for the significand or what exponent you use, this format can never be exactly .4. Suppose you had .4 = f•2^e, where f and e are integers. Then 2/5 = f•2^e, so 2/(5f) = 2^e, and then 1/(5f) = 2^(e-1) and 5f = 2^(1-e). For that to be true, 5 would have to divide a power of two. It does not, because the only prime factor of a power of two is 2, so you cannot have .4 = f•2^e.

In IEEE-754 64-bit binary floating-point, the significand has 53 bits. With this, the closest representable value to .4 is 0.40000000000000002220446049250313080847263336181640625, which equals 3602879701896397•2^-53.
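
Ruby can show this stored value directly, because `Float#to_r` returns the exact rational value of a double. A short sketch (the variable name is illustrative):

```ruby
stored = 0.4.to_r                  # exact value of the double nearest to .4

puts stored                        # 3602879701896397/9007199254740992
puts stored.denominator == 2**53   # true: the denominator is a power of two
puts stored == Rational(2, 5)      # false: it is not exactly 2/5
puts stored > Rational(2, 5)       # true: the stored value is slightly too large
```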

Now let us look at your calculations. In a=0.05, 0.05 is converted to floating-point, which produces 0.05000000000000000277555756156289135105907917022705078125.

In a*26.0/65, a*26.0 is evaluated first. The exact mathematical result is rounded to the nearest representable value, producing 1.3000000000000000444089209850062616169452667236328125. Then this is divided by 65. Again, the answer is rounded, producing 0.0200000000000000004163336342344337026588618755340576171875. When Ruby prints this value, it apparently decides it is close enough to .02 that it can just display “.02” and not the complete value. This is reasonable in the sense that, if you convert the printed value .02 back to floating-point, you get the actual value again, 0.0200000000000000004163336342344337026588618755340576171875. So “.02” is in some sense a good representative for 0.0200000000000000004163336342344337026588618755340576171875.

In your alternative expression, you have a*=26.0/65. In this, 26.0/65 is evaluated first. This produces 0.40000000000000002220446049250313080847263336181640625. This is different from the first expression because you have performed the operations in a different order, so a different number was rounded. It may have happened that a value in the first expression was rounded down whereas this different value, because of where it happened to land relative to values representable in floating-point, rounded up.

Then the value is multiplied by a. This produces 0.02000000000000000388578058618804789148271083831787109375. Note that this value is further from .02 than the result of the first expression. Your implementation of Ruby knows this, so it determines that printing “.02” is not enough to represent it accurately. Instead, it displays more digits, showing 0.020000000000000004.
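
The two orders of evaluation can be compared side by side. In this sketch, `first` and `second` are illustrative names, and `second` spells out what `a *= 26.0/65` computes.

```ruby
a = 0.05

first  = a * 26.0 / 65    # evaluated as (a * 26.0) / 65
second = a * (26.0 / 65)  # what `a *= 26.0 / 65` computes

puts first                # 0.02
puts second               # 0.020000000000000004
puts first == second      # false

# Print more digits than the default to see the values actually stored.
puts format("%.30f", first)
puts format("%.30f", second)
```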

## How does Enumerable#sum avoid floating point rounding errors?

For floating point values Enumerable#sum uses an algorithm that compensates for the accumulation of error as the summation progresses.

As mentioned in the comment, the source code links to this paper and Wikipedia has an article on a variation of the described algorithm known as the Kahan Summation Algorithm.
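
As an illustration of the idea, here is a minimal sketch of plain Kahan summation in Ruby. The method name `kahan_sum` is made up for this example, and Ruby's own `Enumerable#sum` is implemented in C and may differ in detail from this textbook form.

```ruby
# Plain Kahan summation: carry a running compensation for the low-order
# bits that are lost each time a small value is added to a larger sum.
def kahan_sum(values)
  sum = 0.0
  compensation = 0.0
  values.each do |x|
    y = x - compensation          # apply the correction from the previous step
    t = sum + y                   # low-order bits of y may be lost here
    compensation = (t - sum) - y  # recover what was lost (zero in exact arithmetic)
    sum = t
  end
  sum
end

values = [0.1] * 10

puts values.inject(:+)   # naive left-to-right addition: 0.9999999999999999
puts kahan_sum(values)   # 1.0
puts values.sum          # Ruby's built-in compensated summation
```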

## floating point error in Ruby matrix calculation

No, this is not troubling. That matrix likely just doesn't work well with that particular eigenvector algorithm implementation. Efficient and stable general eigenvector computation is nontrivial, after all.

The Matrix library is adapted from JAMA, a Java matrix package, which says it does a numerical computation and not a symbolic computation:

> Not Covered. JAMA is by no means a complete linear algebra environment ... it focuses on the principle mathematical functionality required to do numerical linear algebra

### The QR Algorithm: Numerical Computation

Looking at the source code for Matrix::EigenvalueDecomposition, I've found that it mentions using the QR algorithm. I don't fully understand the intricacies of the mathematics, but I think I might understand why this computation fails. The mechanism of computation works as stated:

> At the k-th step (starting with k = 0), we compute the QR decomposition A_k = Q_k R_k ... Under certain conditions, the matrices A_k converge to a triangular matrix, the Schur form of A. The eigenvalues of a triangular matrix are listed on the diagonal, and the eigenvalue problem is solved.

In "pseudo" Ruby, this conceptually means:

    working_matrix = orig_matrix.dup
    all_q_matrices = []
    loop do
      q, r = working_matrix.qr_decomposition
      all_q_matrices << q
      next_matrix = r * q
      converged = difference_between(working_matrix, next_matrix) < accuracy_threshold
      working_matrix = next_matrix # carry the new iterate into the next step
      break if converged
    end
    eigenvalues = working_matrix.diagonal_values

For eigenvectors, it continues:

> upon convergence, AQ = QΛ, where Λ is the diagonal matrix of eigenvalues to which A converged, and where Q is a composite of all the orthogonal similarity transforms required to get there. Thus the columns of Q are the eigenvectors.

In "pseudo" Ruby, continued:

    eigenvectors = all_q_matrices.inject(:*).columns
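
For readers who want something runnable, the pseudocode can be turned into a small sketch on plain arrays. Everything here is an assumption made for illustration: the helper names `mat_mul` and `qr_decompose`, the use of classical Gram-Schmidt for the QR step, a fixed count of 50 iterations instead of a convergence test, and the well-behaved symmetric 2x2 input. It is not the code that Matrix::EigenvalueDecomposition actually runs.

```ruby
# Multiply two square matrices stored as arrays of rows.
def mat_mul(a, b)
  n = a.size
  Array.new(n) { |i| Array.new(n) { |j| (0...n).sum { |k| a[i][k] * b[k][j] } } }
end

# Classical Gram-Schmidt QR decomposition: returns [q, r] with a = q * r,
# where q has orthonormal columns and r is upper triangular.
def qr_decompose(a)
  n = a.size
  r = Array.new(n) { Array.new(n, 0.0) }
  q_cols = []
  a.transpose.each_with_index do |col, j|
    v = col.dup
    q_cols.each_with_index do |q, i|
      r[i][j] = q.zip(col).sum { |x, y| x * y }    # projection onto column i of q
      v = v.zip(q).map { |x, y| x - r[i][j] * y }  # remove that component
    end
    r[j][j] = Math.sqrt(v.sum { |x| x * x })
    q_cols << v.map { |x| x / r[j][j] }
  end
  [q_cols.transpose, r]
end

orig_matrix = [[2.0, 1.0], [1.0, 2.0]] # symmetric; exact eigenvalues are 3 and 1
working = orig_matrix
q_total = [[1.0, 0.0], [0.0, 1.0]]     # running product of all Q matrices

50.times do
  q, r = qr_decompose(working)
  q_total = mat_mul(q_total, q)
  working = mat_mul(r, q)
end

eigenvalues = working.each_index.map { |i| working[i][i] }
eigenvectors = q_total.transpose       # each entry is one column of the product

p eigenvalues
p eigenvectors
```

Every pass through the loop performs many rounded floating point operations, and both the eigenvalues and the accumulated Q inherit the rounding error of all previous passes.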

### Floating Point Error in Numerical Computations

We can see that a sequence of numerical computations is iterated to compute the approximate eigenvalues and, as a side effect, a number of approximate Q matrices are collected. These approximate Q matrices are then multiplied together to form the eigenvectors.

The compounding of approximations is what probably caused the wildly inaccurate results. An example of catastrophic cancellation on Math StackExchange shows a simple quadratic computation with 400% relative error. You might be able to imagine how an iterative matrix algorithm with repeated arithmetic operations could do much worse.
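
Catastrophic cancellation is easy to reproduce. The sketch below is not the Math StackExchange example; the coefficients are chosen here purely for illustration. It solves x^2 - 1e8•x + 1 = 0, whose small root is very close to 1e-8, once with the textbook formula and once with a rearranged formula that avoids subtracting nearly equal numbers.

```ruby
a, b, c = 1.0, -1e8, 1.0
root_disc = Math.sqrt(b * b - 4 * a * c)

# Textbook formula: -b and root_disc agree in almost all of their digits,
# so the subtraction leaves mostly rounding error.
naive_small = (-b - root_disc) / (2 * a)

# Rearranged: compute the large root (an addition, no cancellation), then
# use the identity x1 * x2 = c / a to obtain the small root.
large = (-b + root_disc) / (2 * a)
stable_small = c / (a * large)

reference = 1e-8 # the true small root agrees with this to about 16 digits
puts "naive:  #{naive_small}  relative error #{(naive_small - reference).abs / reference}"
puts "stable: #{stable_small}  relative error #{(stable_small - reference).abs / reference}"
```

A single subtraction is enough to lose a large share of the significant digits here, which gives some intuition for what can happen across many iterations of a matrix algorithm.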

### A Grain of Salt

Again, I don't have a deep understanding of the mathematics of the algorithm nor the implementation, so I don't know precisely what parts of the computation caused your specific 85110032990182200% error, but I hope you now can understand how it may have happened.

## floating point error in Ruby's BigDecimal class?

    BigDecimal(1) / BigDecimal(3) * BigDecimal(3)
    # => #<BigDecimal:19289d8,'0.9999999999 99999999E0',18(36)>

How did it get there?

    BigDecimal(1) / BigDecimal(3)
    # => #<BigDecimal:1921a70,'0.3333333333 33333333E0',18(36)>

BigDecimal does not provide rational numbers, so when you divide 1 by 3, you get 0. followed by a lot of 3s. A lot, but not infinitely many. When you then multiply that by 3, you get 0. followed by equally many 9s.

I believe you misread BigDecimal's advertisement (although I am not sure it is advertised anywhere as the solution to floating point errors). It just provides arbitrary precision. It is still a floating point number, only a decimal one. If you really want exact results when dividing numbers, take a look at the Rational class:

    # Build 0.6 from a string: Rational(0.6) would capture the Float's binary approximation.
    (Rational(50) * Rational("0.6") / Rational(360) * Rational(33)).to_f
    # => 2.75
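
One caveat when building Rationals: a Float literal has already been rounded before Rational ever sees it. A short sketch of the difference, with illustrative variable names:

```ruby
from_float  = Rational(0.6)    # exact value of the double nearest to 0.6
from_string = Rational("0.6")  # exactly 3/5

puts from_float == from_string # false
puts from_string               # 3/5

exact = Rational(50) * from_string / 360 * 33
puts exact                     # 11/4
puts exact.to_f                # 2.75

# Unlike BigDecimal, dividing by 3 and multiplying back is exact.
puts Rational(1, 3) * 3 == 1   # true
```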

## Ruby Floating Point Math - Issue with Precision in Sum Calc

If accuracy is important to you, you should not be using binary floating point values, which cannot represent most decimal fractions exactly. Ruby has other numeric types for arithmetic where exactness matters. Off the top of my head, they are BigDecimal and Rational (and Complex, which is only as exact as its components), depending on what you actually need to calculate.

It seems that in your case, what you're looking for is BigDecimal, which stores a number as decimal digits with arbitrary precision, so a value such as 0.9987 is held exactly (in contrast to a binary Float, which can only store the nearest binary approximation).

When you read from Excel and deliberately cast those strings like "0.9987" to floating points, you're immediately losing the accurate value that is contained in the string.

    require "bigdecimal"
    BigDecimal("0.9987")

That value is precise. It is 0.9987. Not 0.998732109, or anything close to it, but 0.9987. You may use all the usual arithmetic operations on it. Provided you don't mix floating points into the arithmetic operations, the return values will remain precise.

If your array contains the raw strings you got from Excel (i.e. you haven't #to_f'd them), then this will give you a BigDecimal that is the difference between the sum of them and 1.

    1 - array.map { |v| BigDecimal(v) }.reduce(:+)
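
As a usage sketch, here is the same expression run on made-up values. The contents of `array` are hypothetical stand-ins for cells read from the spreadsheet as strings.

```ruby
require "bigdecimal"

# Hypothetical cell values, still strings, that should add up to 1.
array = ["0.9987", "0.0010", "0.0003"]

remainder = 1 - array.map { |v| BigDecimal(v) }.reduce(:+)

puts remainder.to_s("F") # plain decimal notation instead of scientific
puts remainder.zero?     # true: the decimal sum is exactly 1
```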