Arbitrary Precision Arithmetic with Ruby

Arbitrary precision arithmetic with Ruby

Simple: it does it the same way you do, ever since first grade. Except it doesn't compute in base 10, it computes in base 4 billion (and change).

Think about it: with our number system, we can only represent numbers from 0 to 9. So, how can we compute 6+7 without overflowing? Easy: we do actually overflow! We cannot represent the result of 6+7 as a number between 0 and 9, but we can overflow to the next place and represent it as two numbers between 0 and 9: 3×100 + 1×101. If you want to add two numbers, you add them digit-wise from the right and overflow ("carry") to the left. If you want to multiply two numbers, you have to multiply every digit of one number individually with the other number, then add up the intermediate results.

BigNum arithmetic (this is what this kind of arithmetic where the numbers are bigger than the native machine numbers is usually called) works basically the same way. Except that the base is not 10, and its not 2, either – it's the size of a native machine integer. So, on a 32 bit machine, it would be base 232 or 4 294 967 296.

Specifically, in Ruby Integer is actually an abstract class that is never instianted. Instead, it has two subclasses, Fixnum and Bignum, and numbers automagically migrate between them, depending on their size. In MRI and YARV, Fixnum can hold a 31 or 63 bit signed integer (one bit is used for tagging) depending on the native word size of the machine. In JRuby, a Fixnum can hold a full 64 bit signed integer, even on an 32 bit machine.

The simplest operation is adding two numbers. And if you look at the implementation of + or rather bigadd_core in YARV's bignum.c, it's not too bad to follow. I can't read C either, but you can cleary see how it loops over the individual digits.

Arbitrary-precision arithmetic Explanation

It's all a matter of adequate storage and algorithms to treat numbers as smaller parts. Let's assume you have a compiler in which an int can only be 0 through 99 and you want to handle numbers up to 999999 (we'll only worry about positive numbers here to keep it simple).

You do that by giving each number three ints and using the same rules you (should have) learned back in primary school for addition, subtraction and the other basic operations.

In an arbitrary precision library, there's no fixed limit on the number of base types used to represent our numbers, just whatever memory can hold.

Addition for example: 123456 + 78:

12 34 56
78
-- -- --
12 35 34

Working from the least significant end:

  • initial carry = 0.
  • 56 + 78 + 0 carry = 134 = 34 with 1 carry
  • 34 + 00 + 1 carry = 35 = 35 with 0 carry
  • 12 + 00 + 0 carry = 12 = 12 with 0 carry

This is, in fact, how addition generally works at the bit level inside your CPU.

Subtraction is similar (using subtraction of the base type and borrow instead of carry), multiplication can be done with repeated additions (very slow) or cross-products (faster) and division is trickier but can be done by shifting and subtraction of the numbers involved (the long division you would have learned as a kid).

I've actually written libraries to do this sort of stuff using the maximum powers of ten that can be fit into an integer when squared (to prevent overflow when multiplying two ints together, such as a 16-bit int being limited to 0 through 99 to generate 9,801 (<32,768) when squared, or 32-bit int using 0 through 9,999 to generate 99,980,001 (<2,147,483,648)) which greatly eased the algorithms.

Some tricks to watch out for.

1/ When adding or multiplying numbers, pre-allocate the maximum space needed then reduce later if you find it's too much. For example, adding two 100-"digit" (where digit is an int) numbers will never give you more than 101 digits. Multiply a 12-digit number by a 3 digit number will never generate more than 15 digits (add the digit counts).

2/ For added speed, normalise (reduce the storage required for) the numbers only if absolutely necessary - my library had this as a separate call so the user can decide between speed and storage concerns.

3/ Addition of a positive and negative number is subtraction, and subtracting a negative number is the same as adding the equivalent positive. You can save quite a bit of code by having the add and subtract methods call each other after adjusting signs.

4/ Avoid subtracting big numbers from small ones since you invariably end up with numbers like:

         10
11-
-- -- -- --
99 99 99 99 (and you still have a borrow).

Instead, subtract 10 from 11, then negate it:

11
10-
--
1 (then negate to get -1).

Here are the comments (turned into text) from one of the libraries I had to do this for. The code itself is, unfortunately, copyrighted, but you may be able to pick out enough information to handle the four basic operations. Assume in the following that -a and -b represent negative numbers and a and b are zero or positive numbers.

For addition, if signs are different, use subtraction of the negation:

-a +  b becomes b - a
a + -b becomes a - b

For subtraction, if signs are different, use addition of the negation:

 a - -b becomes   a + b
-a - b becomes -(a + b)

Also special handling to ensure we're subtracting small numbers from large:

small - big becomes -(big - small)

Multiplication uses entry-level math as follows:

475(a) x 32(b) = 475 x (30 + 2)
= 475 x 30 + 475 x 2
= 4750 x 3 + 475 x 2
= 4750 + 4750 + 4750 + 475 + 475

The way in which this is achieved involves extracting each of the digits of 32 one at a time (backwards) then using add to calculate a value to be added to the result (initially zero).

ShiftLeft and ShiftRight operations are used to quickly multiply or divide a LongInt by the wrap value (10 for "real" math). In the example above, we add 475 to zero 2 times (the last digit of 32) to get 950 (result = 0 + 950 = 950).

Then we left shift 475 to get 4750 and right shift 32 to get 3. Add 4750 to zero 3 times to get 14250 then add to result of 950 to get 15200.

Left shift 4750 to get 47500, right shift 3 to get 0. Since the right shifted 32 is now zero, we're finished and, in fact 475 x 32 does equal 15200.

Division is also tricky but based on early arithmetic (the "gazinta" method for "goes into"). Consider the following long division for 12345 / 27:

       457
+-------
27 | 12345 27 is larger than 1 or 12 so we first use 123.
108 27 goes into 123 4 times, 4 x 27 = 108, 123 - 108 = 15.
---
154 Bring down 4.
135 27 goes into 154 5 times, 5 x 27 = 135, 154 - 135 = 19.
---
195 Bring down 5.
189 27 goes into 195 7 times, 7 x 27 = 189, 195 - 189 = 6.
---
6 Nothing more to bring down, so stop.

Therefore 12345 / 27 is 457 with remainder 6. Verify:

  457 x 27 + 6
= 12339 + 6
= 12345

This is implemented by using a draw-down variable (initially zero) to bring down the segments of 12345 one at a time until it's greater or equal to 27.

Then we simply subtract 27 from that until we get below 27 - the number of subtractions is the segment added to the top line.

When there are no more segments to bring down, we have our result.


Keep in mind these are pretty basic algorithms. There are far better ways to do complex arithmetic if your numbers are going to be particularly large. You can look into something like GNU Multiple Precision Arithmetic Library - it's substantially better and faster than my own libraries.

It does have the rather unfortunate misfeature in that it will simply exit if it runs out of memory (a rather fatal flaw for a general purpose library in my opinion) but, if you can look past that, it's pretty good at what it does.

If you cannot use it for licensing reasons (or because you don't want your application just exiting for no apparent reason), you could at least get the algorithms from there for integrating into your own code.

I've also found that the bods over at MPIR (a fork of GMP) are more amenable to discussions on potential changes - they seem a more developer-friendly bunch.

Issue with precision of Ruby math operations

Big Decimal


As the man said;

Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation.


I have however had great success using the BigDecimal class. To quote its intro

Ruby provides built-in support for arbitrary precision integer arithmetic. For example:

42**13 -> 1265437718438866624512

BigDecimal provides similar support for very large or very accurate floating point numbers.


Taking one of your examples;

>> x = BigDecimal.new('900.1')
=> #<BigDecimal:101113be8,'0.9001E3',8(8)>
>> x % 1
=> #<BigDecimal:10110b498,'0.1E0',4(16)>
>> y = x % 1
=> #<BigDecimal:101104760,'0.1E0',4(16)>
>> y.to_s
=> "0.1E0"
>> y.to_f
=> 0.1


As you can see, ensuring decent precision is possible but it requires a little bit of effort.

How does an environment (e.g. Ruby) handle massive integers?

Python also does this.

Basically, instead of treating a number as a string of bits that naturally fits the hardware architecture, (32 bits for instance) it treats a number as a string of 32-bit digits, then implements all the arithmetic operations to handle carries from one 32-bit digit to another. That also involves allocating additional 32 bit digits as the number grows longer. This is easier than it seems.

For instance, 99 * 99 is less that 100 * 100 which is 10,000, therefore one might assume that multiplying two 2-digit numbers will produce a result no longer than 4 digits. Same thing applies when each digit is a 32-bit word.

You might want to try implementing this in Ruby, just for fun, using some type that allows fixed binary quantities. I believe the FixNum class would work.

Display BigDecimal To Arbitrary Precision

I think the problem is that the BigDecimal objects don't have their precision set to a value high enough. I can get a 1000 fractional digits printed if I explicitly specify the required precision of the operation by using div instead of /:

require 'bigdecimal'

puts (BigDecimal.new(1).div(BigDecimal.new(3), 1000)).to_s

#=> 0.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333E0

After that, you can limit the number of fractional digits with round.

Arbitrary precision arithmetic with very big factorials

As other answers have shown, the recursion is easily removed. Now the question is: can you store the result in a BigInteger, or are you going to have to go to some sort of external storage?

The number of bits you need to store n! is roughly proportional to n log n. (This is a weak form of Stirling's Approximation.) So let's look at some sizes: (Note that I made some arithmetic errors in an earlier version of this post, which I am correcting here.)

(10^6)! takes order of 2 x 10^6 bytes = a few megabytes
(10^12)! takes order of 3 x 10^12 bytes = a few terabytes
(10^21)! takes order of 10^22 bytes = ten billion terabytes

A few megs will fit into memory. A few terabytes is easily within your grasp but you'll need to write a memory manager probably. Ten billion terabytes will take the combined resources of all the technology companies in the world, but it is doable.

Now consider the computation time. Suppose we can perform a million multiplications per second per machine and that we can parallelize the work out to multiple machines somehow.

(10^6)! takes order of one second on one machine
(10^12)! takes order of 10^6 seconds on one machine =
10 days on one machine =
a few minutes on a thousand machines.
(10^21)! takes order of 10^15 seconds on one machine =
30 million years on one machine =
3 years on 10 million machines
1 day on 10 billion machines (each with a TB drive.)

So (10^6)! is within your grasp. (10^12)! you are going to have to write your own memory manager and math library, and it will take you some time to get an answer. (10^21)! you will need to organize all the resources of the world to solve this problem, but it is doable.

Or you could find another approach.

Why does Ruby have separate data types for Fixnums and Bignums, but not Strings and really long strings?

Very crudely, Fixnums represent numbers that the computer can store in a processor register and perform basic arithmetic on. Bignums are necessary because they represent numbers that by definition can not be stored in single registers, and therefore require "special" processing called arbitrary-precision arithmetic in order to perform arithmetic on them.

In computer science, arbitrary-precision arithmetic, also called
bignum arithmetic [...] indicates that calculations are performed on
numbers whose digits of precision are limited only by the available
memory of the host system. This contrasts with the faster
fixed-precision arithmetic found in most arithmetic logic unit (ALU)
hardware, which typically offers between 8 and 64 bits of precision.

With strings, there is no such distinction. An arbitrarily long string never has arithmetic performed on it at the processor instruction level, so there's no reason to distinguish between a "short" string and a "long" string.

In regards to memory:

(Small) numbers can be manipulated directly as actual binary CPU instructions because they fit entirely into a single CPU memory address (called a register.)

Neither (big) numbers nor strings can because they overflow a processor's "number of bits." (This is what's meant by a "64-bit processor": A single numeric value can be represented and manipulated up to 64-bits of precision.)

Strings are allocated in memory as a series of numbers that represent characters (with different numbers representing different characters depending on the chosen encoding, such as UTF-8 or ASCII) and they span across multiple memory locations. The strings are never fed through CPU as atomic units though, whereas Fixnums are.

Bignums likewise can't be handled in the ALU because each value (potentially) takes up more memory than what is available in a single CPU register, so a separate package that defines the necessary mathematical operations to break down very large numbers, perform arithmetic on the smaller parts using the ALU and composite the individual answers together has to be written separately.

How does Ruby store large numbers?

Ruby uses Bignum objects to store number larger than 2^64. You can see here a description of how this works:

class_diagram

On the left, you can see RBignum contains an inner structure called
RBasic, which contains internal, technical values used by all Ruby
objects. Below that I show values specific to Bignum objects: digits
and len. digits is a pointer to an array of 32-bit values that contain
the actual big integer’s bits grouped into sets of 32. len records how
many 32-bit groups are in the digits array. Since there can be any
number of groups in the digits array, Ruby can represent arbitrarily
large integers using RBignum.

Has arbitrary-precision arithmetic affected numerical analysis software?

Arbitrary precision is slow. Very slow. And the moment you use a function that produces an irrational value (such as most trig functions), you lose your arbitrary precision advantage.

So if you don't need, or can't use that precision, why spend all that CPU time on it?



Related Topics



Leave a reply



Submit