Why do we need std::numeric_limits::max_digits10?
You seem to be confusing two sources of rounding (and precision loss) with floating point numbers.
Floating point representation
The first one is due to the way floating point numbers are represented in memory, which uses binary numbers for the mantissa and exponent, as you pointed out. The classic example is:
const float a = 0.1f;
const float b = 0.2f;
const float c = a+b;
printf("%.8f + %.8f = %.8f\n",a,b,c);
which will print
0.10000000 + 0.20000000 = 0.30000001
There, the mathematically correct result is 0.3, but 0.3 is not representable with the binary representation. Instead you get the closest number which can be represented.
Saving to text
The other one, which is where max_digits10 comes into play, is the text representation of floating point numbers, for example when you use printf or write to a file.
When you do this using the %f format specifier, you get the number printed out in decimal.
When you print the number in decimal you may decide how many digits get printed out. In some cases you might not get an exact printout of the actual number.
For example, consider
const float x = 10.0000095f;
const float y = 10.0000105f;
printf("x = %f ; y = %f\n", x,y);
this will print
x = 10.000010 ; y = 10.000010
On the other hand, increasing the precision of printf to 8 digits with %.8f will give you:
x = 10.00000954 ; y = 10.00001049
So if you wanted to save these two float values as text to a file using fprintf or ofstream with the default number of digits, you would have saved the same value twice where you originally had two different values for x and y.
max_digits10 is the answer to the question "how many decimal digits do I need to write in order to avoid this situation for all possible values?". In other words, if you write your float with max_digits10 digits (which happens to be 9 for floats) and load it back, you're guaranteed to get the same value you started with.
Note that the decimal value written may differ from the floating point number's actual value (due to the different representations), but it is still guaranteed that when you read the text of the decimal number back into a float you will get the same value.
Edit: an example
See the code run there: https://ideone.com/pRTMZM
Say you have your two floats from earlier,
const float x = 10.0000095f;
const float y = 10.0000105f;
and you want to save them to text (a typical use-case would be saving to a human-readable format like XML or JSON, or even using prints to debug). In my example I'll just write to a string using stringstream.
Let's try first with the default precision:
stringstream def_prec;
def_prec << x <<" "<<y;
// What was written ?
cout <<def_prec.str()<<endl;
The default behaviour in this case was to round each of our numbers to 10 when writing the text (the default stream precision is 6 significant digits). So now if we use that string to read back into two other floats, they will not contain the original values:
float x2, y2;
def_prec>>x2 >>y2;
// Check
printf("%.8f vs %.8f\n", x, x2);
printf("%.8f vs %.8f\n", y, y2);
and this will print
10 10
10.00000954 vs 10.00000000
10.00001049 vs 10.00000000
This round trip from float to text and back has erased a lot of digits, which might be significant. Obviously we need to save our values to text with more precision than this. The documentation guarantees that using max_digits10 will not lose data in the round trip. Let's give it a try using setprecision:
const int digits_max = numeric_limits<float>::max_digits10;
stringstream max_prec;
max_prec << setprecision(digits_max) << x <<" "<<y;
cout <<max_prec.str()<<endl;
This will now print
10.0000095 10.0000105
So our values were saved with more digits this time. Let's try reading back :
float x2, y2;
max_prec>>x2 >>y2;
printf("%.8f vs %.8f\n", x, x2);
printf("%.8f vs %.8f\n", y, y2);
Which prints
10.00000954 vs 10.00000954
10.00001049 vs 10.00001049
Aha! We got our values back!
Finally, let's see what happens if we use one digit less than max_digits10.
stringstream some_prec;
some_prec << setprecision(digits_max-1) << x <<" "<<y;
cout <<some_prec.str()<<endl;
Here is what gets saved as text:
10.00001 10.00001
And when we read back:
10.00000954 vs 10.00000954
10.00001049 vs 10.00000954
So here, the precision was enough to keep the value of x but not the value of y, which was rounded down. This means we need to use max_digits10 if we want to make sure different floats can make the round trip to text and stay different.
Why is std::numeric_limits::digits10 one less for int types?
std::numeric_limits<T>::digits10 is the guaranteed number of digits, in the sense that any number with that many digits can be represented in type T without causing overflow or loss of information.
E.g. std::numeric_limits<int64_t>::digits10 cannot be 19 because 9'223'372'036'854'775'808 has 19 digits but is not representable in int64_t.
In the general case, such a guaranteed value digits<N> (digits10 for N = 10) will always suffer from this "one less" discrepancy on platforms where N is not a power of the radix used for the internal representation. In non-exotic cases radix is 2. Since 10 is not a power of 2, digits10 is smaller by 1 than the length of the max value.
If std::numeric_limits<T> included digits16 or digits8, these values would have been "precise" for radix-2 platforms.
std::to_chars() minimal floating point buffer size
Note that the minimal buffer required differs depending on the floating point format desired. Using max_digits10 and max_exponent10 is always enough to determine the minimum number of characters necessary for base-10 output, assuming one doesn't want to output more precision than the floating point type contains.
This problem is not limited to to_chars, either. The C standard library functions in the printf family have the same behavior, so this applies with equal weight in C as it does in C++.
std::chars_format::scientific or %e (printf specifier):
template<typename T>
constexpr int log10ceil(T num) {
    return num < 10 ? 1 : 1 + log10ceil(num / 10);
}

std::array<char, 4 +
    std::numeric_limits<FloatType>::max_digits10 +
    std::max(2, log10ceil(std::numeric_limits<FloatType>::max_exponent10))
> buf;
The function log10ceil allows constexpr evaluation of how many digits are in the largest possible exponent. At least 2 digits must be present in the exponent per the standard, hence the test against a minimum exponent width. The precision used when writing must be no larger than max_digits10 - 1. Using this exact precision will provide lossless conversion to a string representation. The addition of 4 characters accommodates the possible sign, the decimal point, and the "e+" or "e-" in the output.
std::chars_format::fixed or %f (printf specifier):
std::array<char, 2 +
    std::numeric_limits<FloatType>::max_exponent10 +
    std::numeric_limits<FloatType>::max_digits10
> buf;
Again, the precision used must be no larger than max_digits10 - 1. Using this exact precision will provide lossless conversion to a string representation. The addition of 2 characters accommodates the possible sign and the decimal point in the output.
std::chars_format::general or %g (printf specifier):
For the general case, the minimal buffer is always the same as the scientific case. However, the precision used must be no larger than max_digits10 for lossless conversion to a string representation, rather than subtracting one as mentioned above.
Note that in all these examples, the buffer is exactly the size of the largest string representation. If a NUL-terminator or other content is needed, the size must be increased accordingly.
How do I print a double value with full precision using cout?
You can set the precision directly on std::cout
and use the std::fixed
format specifier.
double d = 3.14159265358979;
cout.precision(17);
cout << "Pi: " << fixed << d << endl;
You can #include <limits>
to get the maximum precision of a float or double.
#include <limits>
typedef std::numeric_limits< double > dbl;
double d = 3.14159265358979;
cout.precision(dbl::max_digits10);
cout << "Pi: " << d << endl;
Why does text→float→text guarantee 6 digits but float→text→float require 9?
Decimal→Binary→Decimal
Consider the seven-digit decimal floating-point values 9,999,979•10^3 (9,999,979,000) and 9,999,978•10^3 (9,999,978,000). When you convert these to binary floating-point with 24-bit significands, you get 1001 0101 0000 0010 1110 0100•2^10 (9,999,978,496) in both cases, because that is the closest binary floating-point value to each of the numbers. (The next lower and higher binary floating-point numbers are 1001 0101 0000 0010 1110 0011•2^10 (9,999,977,472) and 1001 0101 0000 0010 1110 0101•2^10 (9,999,979,520).)
Therefore, 24-bit significands cannot distinguish all decimal floating-point numbers with seven-digit significands. We can do at most six digits.
Binary→Decimal→Binary
Consider the two 24-bit-significand binary floating-point values 1111 1111 1111 1111 1111 1101•2^3 (134,217,704) and 1111 1111 1111 1111 1111 1100•2^3 (134,217,696). If you convert these to decimal floating-point with eight-digit significands, you get 13,421,770•10^1 in both cases. Then you cannot tell them apart. So you need at least nine decimal digits.
You can think of this as some “chunking” that is forced by where the digit positions lie. At the top of a decimal number, we need a bit big enough to exceed 5 in the first digit. But the nearest power of two is not necessarily going to start with 5 in that position—it might start with 6, or 7, or 8, or 9, so there is some wastage in it. At the bottom, we need a bit lower than 1 in the last digit. But the nearest power of two does not necessarily start with 9 in the next lower position. It might start with 8 or 7 or 6 or even 5. So again, there is some wastage. To go from binary to decimal to binary, you need enough decimal digits to fit around the wastage, so you need extra decimal digits. To go from decimal to binary to decimal, you have to keep the decimal digits few enough so that they plus the wastage fit inside the binary, so you need fewer decimal digits.