Difference between the Float and Integer Data Types When the Size Is the Same

Why do float and int have such different maximum values even though they're the same number of bits?

Your intuition quite rightly tells you that there can be no more information content in one than the other, because they both have 32 bits. But that doesn't mean we can't use those bits to represent different values.

Suppose I invent two new datatypes, uint4 and foo4. uint4 uses 4 bits to represent an integer, in the standard binary representation, so we have

bits   value
0000 0
0001 1
0010 2
...
1111 15

But foo4 uses 4 bits to represent these values:

bits   value
0000 0
0001 42
0010 -97
0011 1
...
1110 pi
1111 e

Now foo4 has a much wider range of values than uint4, despite having the same number of bits! How? Because there are some uint4 values that can't be represented by foo4, so those 'slots' in the bit mapping are available for other values.
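If it helps, here is the same idea as a quick Python sketch (purely hypothetical; foo4 is just a lookup table from bit patterns to whatever values we felt like assigning):

import math

# uint4: the bit pattern *is* the value (standard binary).
def decode_uint4(bits):
    return bits

# foo4: the same 16 bit patterns, mapped to an arbitrary set of values.
FOO4 = {
    0b0000: 0,
    0b0001: 42,
    0b0010: -97,
    0b0011: 1,
    # ... remaining patterns omitted ...
    0b1110: math.pi,
    0b1111: math.e,
}

def decode_foo4(bits):
    return FOO4[bits]

print(decode_uint4(0b1111), decode_foo4(0b1111))   # 15 vs. 2.718281828459045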


It is the same for int and float: they can both store values from a set of 2^32 values, just different sets of 2^32 values.

Float and Int Both 4 Bytes? How Come?

Well, here's a quick explanation:

An int and a float usually take up one word in memory. Today, with the shift to 64-bit systems, this may mean that your word is 64 bits (8 bytes), allowing the representation of a huge span of numbers. Or it could still be a 32-bit system, meaning each word in memory takes up 4 bytes. Memory is typically accessed on a word-by-word basis.

The difference between int and float is not their physical space in memory, but the way the ALU (Arithmetic Logic Unit) treats the number. An int represents its directly corresponding number in binary (well, almost: it uses two's complement notation for negative values). A float, on the other hand, is encoded (typically in the IEEE 754 standard format) to represent a number in exponential form (i.e. 2.99*10^6 is a number in exponential form).
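As a rough illustration of "same bits, different interpretation", here is a small sketch using Python's struct module to reinterpret one 4-byte pattern both ways (the output assumes IEEE 754 single precision, which is what struct's standard 'f' format uses):

import struct

raw = struct.pack('<i', 1078530011)   # 4 bytes holding the int 1078530011 (two's complement)

print(struct.unpack('<i', raw)[0])    # 1078530011   - read back as a 32-bit integer
print(struct.unpack('<f', raw)[0])    # ~3.1415927   - the very same bits read as an IEEE 754 float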

I think your misunderstanding lies in the misconception that a floating point number can represent more information. While floats can represent numbers of much greater magnitude, they cannot represent them with as much accuracy, because some of the bits have to be spent encoding the exponent, and the exponent itself can be quite a large number. So the number of significant digits you get out of a floating point number is smaller (which means less information about the exact value), whereas ints cover a much smaller range of magnitudes but represent every integer in that range exactly.
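A quick way to see this magnitude-versus-precision trade-off in Python (its float is a 64-bit IEEE 754 double, but the effect is the same in kind for 32-bit types):

# Floats reach magnitudes no same-sized integer can touch...
print(1.5e300 * 2)                  # 3e+300

# ...but beyond 2**53 they can no longer tell neighbouring integers apart.
n = 2 ** 53
print(n + 1 == n)                   # False: Python ints are exact
print(float(n) + 1 == float(n))     # True: the +1 is lost to rounding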

What is the difference between NUMERIC and FLOAT in BigQuery?

I like the current answers. I want to add this as a proof of why NUMERIC is necessary:

SELECT 
4.35 * 100 a_float
, CAST(4.35 AS NUMERIC) * 100 a_numeric

(Screenshot of the query result: a_float is not exactly 435, while a_numeric is exactly 435.)

This is not a bug; it is exactly how IEEE 754 says floats should be handled. NUMERIC, meanwhile, exhibits behavior closer to what humans expect.

For another proof of NUMERIC's usefulness, this answer shows how NUMERIC can handle numbers too big for JavaScript to handle natively.

Before you blame BigQuery for this problem, you can check that most other programming languages will do the same. Python, for example:

(Screenshot of a Python session performing the same calculation.)
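For readers without the screenshot, here is a minimal reproduction in plain CPython, with the decimal module shown alongside since it behaves much like NUMERIC in this respect:

>>> 4.35 * 100
434.99999999999994
>>> from decimal import Decimal
>>> Decimal('4.35') * 100
Decimal('435.00')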

Difference between numeric, float and decimal in SQL Server

Use the float or real data types only if the precision provided by decimal (up to 38 digits) is insufficient.

  • Approximate numeric data types (see table 3.3) do not store the exact values specified for many numbers; they store an extremely close approximation of the value. (Technet)

  • Avoid using float or real columns in WHERE clause search conditions, especially the = and <> operators. It is best to limit float and real columns to > or < comparisons. (Technet)

so generally choosing Decimal as your data type is the best bet if

  • your number can fit in it (decimal's range goes up to 10^38, roughly 38 digits of precision)
  • the smaller storage space (and possibly faster calculation speed) of float is not important to you
  • exact numeric behavior is required, such as in financial applications, in operations involving rounding, or in equality checks (Technet); see the sketch just after this list
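
Here is a small sketch of that last point, using Python's decimal module as a stand-in for an exact decimal type (the numbers are only illustrative):

from decimal import Decimal, ROUND_HALF_UP

# Binary float: 2.675 is actually stored as 2.67499999..., so it rounds "down".
print(round(2.675, 2))                                             # 2.67

# Exact decimal: behaves the way financial rounding rules expect.
print(Decimal('2.675').quantize(Decimal('0.01'), ROUND_HALF_UP))   # 2.68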


  1. Exact Numeric Data Types decimal and numeric - MSDN
  • numeric = decimal (5 to 17 bytes)
    • will map to Decimal in .NET
    • both have (18, 0) as the default (precision, scale) in SQL Server
    • scale = maximum number of decimal digits that can be stored to the right of the decimal point.
    • money (8 bytes) and smallmoney (4 bytes) are also exact data types; they map to Decimal in .NET and have 4 decimal places (MSDN)

  2. Approximate Numeric Data Types float and real - MSDN
  • real (4 bytes)
    • will map to Single in .NET
    • The ISO synonym for real is float(24)
  • float (8 bytes)
    • will map to Double in .NET

(MSDN reference tables: Exact Numeric Data Types and Approximate Numeric Data Types)

  • All exact numeric types always produce the same result, regardless of which kind of processor architecture is being used or the magnitude of the numbers.
  • The parameter supplied to the float data type defines the number of bits that are used to store the mantissa of the floating point number.
  • Approximate numeric data types usually use less storage and have better speed (up to 20x), but you should also consider how they get converted in .NET
  • What is the difference between Decimal, Float and Double in C#
  • Decimal vs Double Speed
  • SQL Server - .NET Data Type Mappings (From MSDN)

Main source: MCTS Self-Paced Training Kit (Exam 70-433): Microsoft® SQL Server® 2008 Database Development - Chapter 3 - Tables, Data Types, and Declarative Data Integrity - Lesson 1 - Choosing Data Types (Guidelines) - Page 93

Comparison between numpy array and float number of the same data type?

The answer is pretty obvious if you do this:

import numpy as np

a = np.arange(0, 1, 0.1)

print('\n'.join(map(str, zip(a, a >= np.float64(0.6)))))

Result:

(0.0, False)
(0.1, False)
(0.2, False)
(0.30000000000000004, False)
(0.4, False)
(0.5, False)
(0.6000000000000001, True)
(0.7000000000000001, True)
(0.8, True)
(0.9, True)

It's just a classic case of this: Is floating point math broken?

You asked why this isn't a problem for float32. For example:

import numpy as np

a = np.arange(0, 1, 0.1, dtype=np.float32)

print('\n'.join(map(str, zip(a, a < np.float32(0.6)))))

Result:

(0.0, True)
(0.1, True)
(0.2, True)
(0.3, True)
(0.4, True)
(0.5, True)
(0.6, False)
(0.7, False)
(0.8, False)
(0.90000004, False)

The clue is in the length of that last value. Notice how 0.90000004 is a lot shorter than 0.30000000000000004 and 0.6000000000000001. This is because there is less precision available in 32 bits than there is in 64 bits.

In fact, this is the entire reason to use 64-bit floats over 32-bit ones, when you need the precision. Depending on your system's architecture, 64-bit is likely to be a bit slower and certain to take up twice the space, but the precision is better. How exactly depends on the implementation of the floating point number (there are many choices that are too technical and detailed to go into here) - but there's twice the number of bits available to store information about the number, so you can see how that allows an increase in precision.

It just so happens that in 32 bits, the format has a representation of 0.6 that has enough zeroes for it to just say 0.6 (instead of 0.60000000). In 64-bits, the best values to represent 0.6 have even more zeroes, but a non-zero gets in at the end, showing the inaccuracy of the representation in that format.
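You can see those representations directly by asking Python for more digits than the default repr shows (same NumPy types as above):

import numpy as np

print(f'{float(np.float32(0.6)):.20f}')   # 0.60000002384185791016
print(f'{float(np.float64(0.6)):.20f}')   # 0.59999999999999997780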

It seems counterintuitive that float32 is 'more precise' than float64 in this case, but that's just a matter of cherry-picking. If you looked at a large random selection of numbers, you'd find that float64 gets a lot closer on average. It just so happens that float32 appears more accurate by accident.

The key takeaway here is that floating point numbers are an approximation of the real numbers. They are sufficiently accurate for most everyday operations, and if the format is well designed the errors tend to average out over time for many use cases. However, because there is some error involved in most cases (some numbers do happen to be represented exactly; every floating point value still falls on the real number line), some rounding is generally required when printing floating point numbers.

My favourite example to show that imprecision shows up early in Python (or any language with floats really):

>>> .1 + .1 + .1 == .3
False
>>> print(.1 + .1 + .1, f'{.1 + .1 + .1:.1f}')
0.30000000000000004 0.3

And if you need better precision, you can look at types like decimal. Also, in very specific cases more bits than 64 may be available, but that is likely to lead to surprises around platform and library support, and I would not recommend it.
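For instance, the decimal module handles the sum above the way you would expect (slower, but exact for decimal fractions):

>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') == Decimal('0.3')
True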


