Why do float and int have such different maximum values even though they're the same number of bits?
Your intuition quite rightly tells you that there can be no more information content in one than the other, because they both have 32 bits. But that doesn't mean we can't use those bits to represent different values.
Suppose I invent two new datatypes, uint4 and foo4. uint4 uses 4 bits to represent an integer, in the standard binary representation, so we have:
bits value
0000 0
0001 1
0010 2
...
1111 15
But foo4 uses 4 bits to represent these values:
bits value
0000 0
0001 42
0010 -97
0011 1
...
1110 pi
1111 e
Now foo4 has a much wider range of values than uint4, despite having the same number of bits! How? Because there are some uint4 values that can't be represented by foo4, so those 'slots' in the bit mapping are available for other values.
It is the same for int and float - they can both store values from a set of 2^32 values, just different sets of 2^32 values.
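The two mappings can be sketched as Python dictionaries (foo4's value table is of course made up, exactly as in the example above, and the '...' rows of the table are omitted here):

```python
import math

# uint4: each 4-bit pattern maps to the integer it spells in binary.
uint4 = {bits: bits for bits in range(16)}

# foo4: the same 16 bit patterns, assigned to a different set of values
# (only the slots actually listed in the table above).
foo4 = {0b0000: 0, 0b0001: 42, 0b0010: -97, 0b0011: 1,
        0b1110: math.pi, 0b1111: math.e}

# Same bit pattern, completely different value under each interpretation.
print(uint4[0b1111])  # 15
print(foo4[0b1111])   # 2.718281828459045 (e)
```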
Float and Int Both 4 Bytes? How Come?
Well, here's a quick explanation:
An int and a float usually take up one "word" in memory. Today, with the shift to 64-bit systems, a word may be 64 bits (8 bytes), allowing the representation of a huge span of numbers; on a 32-bit system, each word in memory takes up 4 bytes. Typically, memory is accessed on a word-by-word basis.
The difference between int and float is not their physical space in memory, but the way the ALU (Arithmetic Logic Unit) treats the bits. An int represents its directly corresponding number in binary (well, almost - it uses two's complement notation for negative values). A float, on the other hand, is encoded (typically in the IEEE 754 standard format) to represent a number in exponential form (i.e. 2.99*10^6 is in exponential form).
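You can watch the two interpretations collide in Python by reading the same four bytes both ways with the standard struct module (0x40490FDB happens to be the IEEE 754 single-precision bit pattern for pi):

```python
import struct

raw = struct.pack('<i', 0x40490FDB)      # four bytes, written as an int
as_int = struct.unpack('<i', raw)[0]     # read back as a two's-complement int
as_float = struct.unpack('<f', raw)[0]   # the SAME bytes read as an IEEE 754 float

print(as_int)    # 1078530011
print(as_float)  # 3.1415927410125732
```

Same bits in memory; only the interpretation rule differs.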
Your misunderstanding, I think, lies in the misconception that a floating point number can represent more information. While floats can represent numbers of greater magnitude, they cannot represent them with as much accuracy, because some of the bits have to be spent encoding the exponent, and the exponent itself can be quite large. So the number of significant digits you get out of a floating point number is smaller (meaning less information per number), whereas ints cover a much smaller range of magnitudes but represent every integer in that range exactly.
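The accuracy limit is easy to see in Python: a 64-bit float has a 53-bit mantissa, so 2**53 + 1 is the first integer it cannot store, while Python's arbitrary-precision int has no trouble with it:

```python
n = 2**53 + 1                    # first integer a 64-bit float cannot represent
print(n)                         # 9007199254740993 (the int is exact)
print(int(float(n)))             # 9007199254740992 (rounded to the nearest float)
print(float(n) == float(n - 1))  # True: two different ints, one float value
```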
What is the difference between NUMERIC and FLOAT in BigQuery?
I like the current answers. I want to add this as a proof of why NUMERIC is necessary:
SELECT
4.35 * 100 a_float
, CAST(4.35 AS NUMERIC) * 100 a_numeric
The float result is 434.99999999999994, while the NUMERIC result is the expected 435. This is not a bug - it is exactly how the IEEE 754 standard says floats should be handled. NUMERIC, meanwhile, exhibits behavior closer to what humans expect.
For another proof of NUMERIC's usefulness, this answer shows how NUMERIC can handle numbers too big for JavaScript to handle natively.
Before you blame BigQuery for this problem, you can check that most other programming languages will do the same. Python, for example:
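A minimal Python sketch of the same behavior - plain IEEE 754 doubles match BigQuery's FLOAT64, while the decimal module plays the role of NUMERIC:

```python
from decimal import Decimal

print(4.35 * 100)             # 434.99999999999994  (binary float, like FLOAT64)
print(Decimal('4.35') * 100)  # 435.00              (decimal type, like NUMERIC)
```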
Difference between numeric, float and decimal in SQL Server
use the float or real data types only if the precision provided by decimal (up to 38 digits) is insufficient
Approximate numeric data types (see table 3.3) do not store the exact values specified for many numbers; they store an extremely close approximation of the value. (Technet)
Avoid using float or real columns in WHERE clause search conditions, especially the = and <> operators. It is best to limit float and real columns to > or < comparisons. (Technet)
so generally, choosing decimal as your data type is the best bet if
- your number fits in it (decimal precision goes up to 38 digits, i.e. magnitudes around 10^38)
- the smaller storage space (and possibly faster calculation) of float is not important to you
- exact numeric behavior is required, such as in financial applications, in operations involving rounding, or in equality checks. (Technet)
- Exact Numeric Data Types decimal and numeric - MSDN
- numeric = decimal (5 to 17 bytes)
- will map to Decimal in .NET
- both have (18, 0) as default (precision,scale) parameters in SQL server
- scale = maximum number of decimal digits that can be stored to the right of the decimal point.
- money (8 bytes) and smallmoney (4 bytes) are also exact data types; they map to Decimal in .NET and have 4 decimal digits of scale (MSDN)
- Approximate Numeric Data Types float and real - MSDN
- real (4 bytes)
- will map to Single in .NET
- The ISO synonym for real is float(24)
- float (8 bytes)
- will map to Double in .NET
- All exact numeric types always produce the same result, regardless of processor architecture or the magnitude of the numbers
- The parameter supplied to the float data type defines the number of bits used to store the mantissa of the floating point number.
- Approximate numeric data types usually use less storage and have better speed (up to 20x), but you should also consider how they are converted in .NET:
- What is the difference between Decimal, Float and Double in C#
- Decimal vs Double Speed
- SQL Server - .NET Data Type Mappings (From MSDN)
main source : MCTS Self-Paced Training Kit (Exam 70-433): Microsoft® SQL Server® 2008 Database Development - Chapter 3 - Tables, Data Types, and Declarative Data Integrity Lesson 1 - Choosing Data Types (Guidelines) - Page 93
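The "exact numeric behavior" and equality-check guidelines can be illustrated with Python's decimal module, which behaves analogously to SQL Server's decimal/numeric (a sketch in Python, not T-SQL):

```python
from decimal import Decimal

# float: repeated addition of 0.1 drifts, so an = comparison misfires -
# the same hazard as using = on a float column in a WHERE clause.
print(sum([0.1] * 10) == 1.0)           # False

# decimal: base-10 arithmetic is exact here, so equality behaves as expected.
print(sum([Decimal('0.1')] * 10) == 1)  # True
```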
Comparison between numpy array and float number of the same data type?
The answer is pretty obvious if you do this:
import numpy as np
a = np.arange(0, 1, 0.1)
print('\n'.join(map(str, zip(a, a >= np.float64(0.6)))))
Result:
(0.0, False)
(0.1, False)
(0.2, False)
(0.30000000000000004, False)
(0.4, False)
(0.5, False)
(0.6000000000000001, True)
(0.7000000000000001, True)
(0.8, True)
(0.9, True)
It's just a classic case of this: Is floating point math broken?
You asked why this isn't a problem for float32. For example:
import numpy as np
a = np.arange(0, 1, 0.1, dtype=np.float32)
print('\n'.join(map(str, zip(a, a < np.float32(0.6)))))
Result:
(0.0, True)
(0.1, True)
(0.2, True)
(0.3, True)
(0.4, True)
(0.5, True)
(0.6, False)
(0.7, False)
(0.8, False)
(0.90000004, False)
The clue is in the length of that last value. Notice how 0.90000004 is a lot shorter than 0.30000000000000004 and 0.6000000000000001. This is because there is less precision available in 32 bits than there is in 64 bits.
In fact, this is the entire reason to use 64-bit floats over 32-bit ones when you need the precision. Depending on your system's architecture, 64-bit floats may be a bit slower and are certain to take up twice the space, but the precision is better. Exactly how much better depends on the implementation of the floating point format (there are many details too technical to go into here), but there are twice as many bits available to store information about the number, so you can see how that allows an increase in precision.
It just so happens that the nearest 32-bit float to 0.6 is close enough that its shortest printable form is simply 0.6 (rather than 0.60000002...). The nearest 64-bit value to 0.6 needs more digits to distinguish it from its neighbours, and a non-zero digit shows up at the end, exposing the inaccuracy of the representation in that format.
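Printing more digits than the shortest repr makes both stored values visible (the values below are simply the nearest IEEE 754 neighbours of 0.6 in each format):

```python
import numpy as np

# Neither format stores 0.6 exactly; each rounds to its nearest representable value.
print(f'{float(np.float32(0.6)):.20f}')  # 0.60000002384185791016
print(f'{0.6:.20f}')                     # 0.59999999999999997780
print(np.float32(0.6))                   # 0.6 - the shortest string that round-trips
```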
It seems counterintuitive that float32 is 'more precise' than float64 in this case, but that's just a matter of cherry-picking. If you looked at a large random selection of numbers, you'd find that float64 gets a lot closer on average. It just so happens that float32 appears more accurate by accident.
The key takeaway here is that floating point numbers are an approximation of the real numbers. They are sufficiently accurate for most everyday operations, and in a well-designed format the errors tend to average out over time for many use cases. However, most values carry some error (a few numbers do happen to be represented exactly, since every point in the floating point type still falls on the real number line), so some rounding is generally required when printing floating point numbers.
My favourite example to show that imprecision shows up early in Python (or any language with floats really):
>>> .1 + .1 + .1 == .3
False
>>> print(.1 + .1 + .1, f'{.1 + .1 + .1:.1f}')
0.30000000000000004 0.3
And if you need better precision, you can look at types like decimal. Also, in very specific cases more bits than 64 may be available, but that's likely to lead to surprises around support and I would not recommend it.
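A quick sketch of Python's decimal module delivering more significant digits than any 64-bit float can:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50           # ask for 50 significant digits
print(Decimal(1) / Decimal(7))   # 0.142857142857... out to 50 digits
print(1 / 7)                     # 0.14285714285714285 - about 17 digits is the ceiling
```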