Issue with Precision of Ruby Math Operations


BigDecimal


As the man said:

Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation.


I have, however, had great success using the BigDecimal class. To quote its intro:

Ruby provides built-in support for arbitrary precision integer arithmetic. For example:

42**13 -> 1265437718438866624512

BigDecimal provides similar support for very large or very accurate floating point numbers.


Taking one of your examples;

>> x = BigDecimal('900.1')
=> #<BigDecimal:101113be8,'0.9001E3',8(8)>
>> y = x % 1
=> #<BigDecimal:10110b498,'0.1E0',4(16)>
>> y.to_s
=> "0.1E0"
>> y.to_f
=> 0.1
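
For contrast, plain Float arithmetic on the same input picks up binary representation error. A minimal sketch (the exact Float digits may vary slightly by platform and version):

require 'bigdecimal'

puts 900.1 % 1                            # => something like 0.10000000000002274, not 0.1
puts (BigDecimal('900.1') % 1).to_s('F')  # => "0.1", exactly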


As you can see, ensuring decent precision is possible but it requires a little bit of effort.

Ruby Floating Point Math - Issue with Precision in Sum Calc

If accuracy is important to you, you should not be using binary floating point values, which by design only approximate most decimal fractions. Ruby has some data types for doing arithmetic where exactness matters. They are, off the top of my head, BigDecimal, Rational and Complex, depending on what you actually need to calculate.

It seems that in your case, what you're looking for is BigDecimal, which stores numbers in decimal with arbitrary precision, so a decimal string like "0.9987" is represented exactly (in contrast to a binary floating point value, which can only store the nearest representable binary fraction).

When you read from Excel and cast those strings like "0.9987" to floating point, you immediately lose the exact value that is contained in the string.

require "bigdecimal"
BigDecimal("0.9987")

That value is precise. It is 0.9987. Not 0.998732109, or anything close to it, but 0.9987. You may use all the usual arithmetic operations on it. Provided you don't mix floating points into the arithmetic operations, the return values will remain precise.

If your array contains the raw strings you got from Excel (i.e. you haven't #to_f'd them), then this will give you a BigDecimal that is the difference between the sum of them and 1.

1 - array.map{|v| BigDecimal(v)}.reduce(:+)
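
For instance, with some hypothetical spreadsheet strings (the values below are made up for illustration):

require 'bigdecimal'

# Hypothetical raw strings as read from the spreadsheet (not #to_f'd):
array = ["0.9987", "0.0003", "0.0010"]

diff = 1 - array.map { |v| BigDecimal(v) }.reduce(:+)
puts diff.to_s('F')   # => "0.0", because the strings sum to exactly 1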

Is floating point math broken?

Binary floating point math is like this. In most programming languages, it is based on the IEEE 754 standard. The crux of the problem is that numbers are represented in this format as a whole number times a power of two; rational numbers (such as 0.1, which is 1/10) whose denominator is not a power of two cannot be exactly represented.

For 0.1 in the standard binary64 format, the representation can be written exactly as

  • 0.1000000000000000055511151231257827021181583404541015625 in decimal, or
  • 0x1.999999999999ap-4 in C99 hexfloat notation.

In contrast, the rational number 0.1, which is 1/10, can be written exactly as

  • 0.1 in decimal, or
  • 0x1.99999999999999...p-4 in an analogue of C99 hexfloat notation, where the ... represents an unending sequence of 9's.

The constants 0.2 and 0.3 in your program will also be approximations to their true values. It happens that the closest double to 0.2 is larger than the rational number 0.2 but that the closest double to 0.3 is smaller than the rational number 0.3. The sum of 0.1 and 0.2 winds up being larger than the rational number 0.3 and hence disagreeing with the constant in your code.
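
You can see this directly in Ruby; the digits below are the standard IEEE 754 double results:

puts 0.1 + 0.2           # => 0.30000000000000004
puts(0.1 + 0.2 == 0.3)   # => false

# Printing extra digits exposes the stored approximations:
puts "%.20f" % 0.1       # => 0.10000000000000000555
puts "%.20f" % 0.3       # => 0.29999999999999998890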

A fairly comprehensive treatment of floating-point arithmetic issues is What Every Computer Scientist Should Know About Floating-Point Arithmetic. For an easier-to-digest explanation, see floating-point-gui.de.

Side Note: All positional (base-N) number systems share this problem with precision

Plain old decimal (base 10) numbers have the same issues, which is why numbers like 1/3 end up as 0.333333333...

You've just stumbled on a number (3/10) that happens to be easy to represent in the decimal system but doesn't fit the binary system. It goes both ways (to some small degree) as well: 1/16 is an ugly number in decimal (0.0625), but in binary it looks as neat as 1/10,000 does in decimal (0.0001); if we were in the habit of using a base-2 number system in our daily lives, you'd even look at that number and instinctively understand you could arrive there by halving something, halving it again, and again and again.
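
In Ruby you can peek at the exact fraction a Float actually stores, since Float#to_r is exact:

(1.0 / 16).to_r   # => (1/16): a power of two, stored exactly
(0.1).to_r        # => (3602879701896397/36028797018963968), the nearest binary fraction to 1/10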

Of course, that's not exactly how floating-point numbers are stored in memory (they use a form of scientific notation). However, it does illustrate the point that binary floating-point precision errors tend to crop up because the "real world" numbers we are usually interested in working with are so often powers of ten - but only because we use a decimal number system day-to-day. This is also why we'll say things like 71% instead of "5 out of every 7" (71% is an approximation, since 5/7 can't be represented exactly with any decimal number).

So no: binary floating point numbers are not broken, they just happen to be as imperfect as every other base-N number system :)

Side Side Note: Working with Floats in Programming

In practice, this problem of precision means you need to use rounding functions to round your floating point numbers off to however many decimal places you're interested in before you display them.

You also need to replace equality tests with comparisons that allow some amount of tolerance, which means:

Do not do if (x == y) { ... }

Instead do if (abs(x - y) < myToleranceValue) { ... }.

where abs is the absolute value. myToleranceValue needs to be chosen for your particular application - and it will have a lot to do with how much "wiggle room" you are prepared to allow, and what the largest number you are going to be comparing may be (due to loss of precision issues). Beware of "epsilon" style constants in your language of choice. These are not to be used as tolerance values.
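
In Ruby, that advice might look like the following minimal sketch; the tolerance value here is an arbitrary illustration, not a recommendation:

# Hypothetical application-specific tolerance; pick one that suits
# the magnitudes you actually compare:
TOLERANCE = 1e-9

def roughly_equal?(x, y, tol = TOLERANCE)
  (x - y).abs < tol
end

0.1 + 0.2 == 0.3                 # => false
roughly_equal?(0.1 + 0.2, 0.3)   # => true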

Ruby issue with floating point iteration

Think of it this way:

Your computer only has 32 or 64 bits to represent a number, so it can only represent a finite set of numbers.

Now consider all the decimal values between 0 and 1. There are infinitely many of them. How can your machine possibly represent every real number if it can't even represent all the numbers between 0 and 1?

The answer is that your machine has to approximate decimal numbers, and that approximation is what you are seeing.

Of course there are libraries that try to overcome these limitations and make it so that you can still accurately represent decimal numbers. One such library is BigDecimal:

require 'bigdecimal'

count = BigDecimal("0")
while count < 1
  count += BigDecimal("0.1")   # keep the increment exact too; adding a Float would reintroduce error
  puts count.to_s('F')         # 'F' formats as plain decimal notation
end
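
Compare that with the same loop using plain Floats, where the representation error accumulates:

count = 0.0
while count < 1
  count += 0.1
  puts count
end
# Prints 0.1, 0.2, 0.30000000000000004, ..., 0.9999999999999999, 1.0999999999999999.
# Note the loop even runs an extra iteration, because the accumulated
# 0.9999999999999999 is still (just barely) less than 1.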

The downside is that these libraries are generally slower at arithmetic, because the calculations happen in a software layer on top of the CPU rather than in its native floating point hardware.

Unable to understand the subtraction result

Floating-point numbers cannot precisely represent all real numbers, and floating-point operations cannot precisely represent true arithmetic operations; this leads to many surprising situations.

I advise reading https://en.wikipedia.org/wiki/Floating_point#Accuracy_problems

The right way to handle this in Ruby is to use the BigDecimal class:

> require 'bigdecimal'
=> true

> a = BigDecimal('3.3')
=> 0.33e1

> b = BigDecimal('2.7')
=> 0.27e1

> c = BigDecimal('0.6')
=> 0.6e0

> a - b == c
=> true
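
For comparison, the same subtraction with plain Floats misses:

3.3 - 2.7          # => 0.5999999999999996 (or thereabouts)
3.3 - 2.7 == 0.6   # => false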

I need help understanding Ruby's floating point precision

This is because Ruby by default uses the double-precision floating-point format (IEEE 754 binary64); the source linked below covers the related issues in depth. However, here's a short and crisp answer:

Because internally, computers use a format (binary floating-point) that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.

When the code is compiled or interpreted, your "0.1" is already rounded to the nearest number in that format, which results in a small rounding error even before the calculation happens.

Source: http://floating-point-gui.de/

Arbitrary precision arithmetic with Ruby

Simple: it does it the same way you do, ever since first grade. Except it doesn't compute in base 10; it computes in base 4 billion (and change).

Think about it: with our number system, each digit can only represent numbers from 0 to 9. So, how can we compute 6+7 without overflowing? Easy: we actually do overflow! We cannot represent the result of 6+7 as a single digit between 0 and 9, but we can carry over to the next place and represent it as two digits: 3×10^0 + 1×10^1. If you want to add two numbers, you add them digit-wise from the right and overflow ("carry") to the left. If you want to multiply two numbers, you have to multiply every digit of one number individually with the other number, then add up the intermediate results.
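
The same schoolbook scheme is easy to sketch in Ruby. A toy illustration (assuming digits are stored least-significant first, in base 10 for readability; real BigNum libraries use a machine-word base):

def add_digits(a, b, base = 10)
  result = []
  carry = 0
  [a.length, b.length].max.times do |i|
    sum = (a[i] || 0) + (b[i] || 0) + carry   # add one column, plus the carry from the right
    result << sum % base                      # this column's digit
    carry = sum / base                        # overflow into the next column
  end
  result << carry if carry > 0
  result
end

add_digits([6], [7])   # => [3, 1], i.e. 3×10^0 + 1×10^1 = 13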

BigNum arithmetic (the usual name for this kind of arithmetic, where the numbers are bigger than the native machine numbers) works basically the same way. Except that the base is not 10, and it's not 2 either; it's the size of a native machine integer. So, on a 32-bit machine, it would be base 2^32, or 4 294 967 296.

Specifically, in Ruby, Integer was historically an abstract class that was never instantiated. Instead, it had two subclasses, Fixnum and Bignum, and numbers automagically migrated between them depending on their size. In MRI and YARV, a Fixnum could hold a 31- or 63-bit signed integer (one bit is used for tagging), depending on the native word size of the machine. In JRuby, a Fixnum could hold a full 64-bit signed integer, even on a 32-bit machine. (Since Ruby 2.4, Fixnum and Bignum have been unified into a single Integer class, but the same promotion still happens under the hood.)
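
A quick way to watch the transparent promotion at work (on Ruby 2.4+ both flavours simply report Integer):

small = 2**30    # fits in a native machine word (old "Fixnum" territory)
big   = 2**100   # far beyond 64 bits (old "Bignum" territory)

small.class      # => Integer
big.class        # => Integer
big + 1          # => 1267650600228229401496703205377, computed exactly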

The simplest operation is adding two numbers. And if you look at the implementation of +, or rather bigadd_core, in YARV's bignum.c, it's not too bad to follow. Even if you can't read C, you can clearly see how it loops over the individual digits.

Why is division in Ruby returning an integer instead of decimal value?

It’s doing integer division. You can make one of the numbers a Float by adding .0:

9.0 / 5   #=> 1.8
9 / 5.0   #=> 1.8
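
A few other idiomatic options, if you'd rather not change the literals:

9.fdiv(5)        #=> 1.8, Integer#fdiv always returns a Float
9.to_f / 5       #=> 1.8
Rational(9, 5)   #=> (9/5), an exact rational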

