What Is the Purpose of max_digits10 and How Is It Different from digits10

Why do we need std::numeric_limits::max_digits10?

You seem to be confusing two sources of rounding (and precision loss) with floating point numbers.

Floating point representation

The first one is due to the way floating point numbers are represented in memory: they use binary numbers for the mantissa and the exponent, as you pointed out. The classic example is:

const float a = 0.1f;
const float b = 0.2f;
const float c = a+b;

printf("%.8f + %.8f = %.8f\n",a,b,c);

which will print

0.10000000 + 0.20000000 = 0.30000001

Mathematically the result should be 0.3, but 0.3 (like 0.1 and 0.2) cannot be represented exactly in binary. Instead you get the closest number that can be represented.

Saving to text

The other one, which is where max_digits10 comes into play, concerns the text representation of floating point numbers, for example when you call printf or write to a file.

When you do this using the %f format specifier, you get the number printed out in decimal.

When you print the number in decimal, you decide how many digits get printed. If you print too few, what you get is a rounded version of the stored value rather than an exact printout of it.

For example, consider

const float x = 10.0000095f;
const float y = 10.0000105f;
printf("x = %f ; y = %f\n", x,y);

this will print

x = 10.000010 ; y = 10.000010

On the other hand, increasing the precision of printf to 8 digits after the decimal point with %.8f gives:

x = 10.00000954 ; y = 10.00001049

So if you wanted to save these two float values as text to a file using fprintf or ofstream with the default number of digits, you would have saved the same text twice, even though x and y originally held two different values.

max_digits10 is the answer to the question "how many significant decimal digits do I need to write in order to avoid this situation for all possible values?". In other words, if you write your float with max_digits10 significant digits (9 for an IEEE 754 single-precision float) and read it back, you're guaranteed to get the same value you started with.

Note that the decimal value written may differ from the floating point number's actual value (due to the different representation). But it is still guaranteed that when you read the decimal text back into a float, you will get the same value.

Edit: an example

See the code run here: https://ideone.com/pRTMZM

Say you have your two floats from earlier,

const float x = 10.0000095f;
const float y = 10.0000105f;

and you want to save them to text (a typical use-case would be saving to a human-readable format like XML or JSON, or even using prints to debug). In my example I'll just write to a string using stringstream.

Let's first try with the default precision:

stringstream def_prec;
def_prec << x <<" "<<y;

// What was written ?
cout <<def_prec.str()<<endl;

The default precision is 6 significant digits, so each of our numbers was rounded to 10 when writing the text. If we now use that string to read back into two other floats, they will not contain the original values:

float x2, y2;
def_prec>>x2 >>y2;

// Check
printf("%.8f vs %.8f\n", x, x2);
printf("%.8f vs %.8f\n", y, y2);

and this will print

10 10
10.00000954 vs 10.00000000
10.00001049 vs 10.00000000

This round trip from float to text and back has erased a lot of digits, which might be significant. Obviously we need to save our values to text with more precision than this. The documentation guarantees that using max_digits10 will not lose data in the round trip. Let's give it a try using setprecision:

const int digits_max = numeric_limits<float>::max_digits10;
stringstream max_prec;
max_prec << setprecision(digits_max) << x <<" "<<y;
cout <<max_prec.str()<<endl;

This will now print

10.0000095 10.0000105

So our values were saved with more digits this time. Let's try reading back :

max_prec >> x2 >> y2;  // reuse x2 and y2 declared earlier

printf("%.8f vs %.8f\n", x, x2);
printf("%.8f vs %.8f\n", y, y2);

Which prints

10.00000954 vs 10.00000954
10.00001049 vs 10.00001049

Aha! We got our values back!

Finally, let's see what happens if we use one digit less than max_digits10.

stringstream some_prec;
some_prec << setprecision(digits_max-1) << x <<" "<<y;
cout <<some_prec.str()<<endl;

Here is what gets saved as text:

10.00001 10.00001

Reading back with the same two printf checks gives:

10.00000954 vs 10.00000954
10.00001049 vs 10.00000954

So here, the precision was enough to keep the value of x but not the value of y, which was rounded down to x. This means we need max_digits10 digits if we want to make sure that different floats make the round trip to text and stay different.

Why is std::numeric_limits::digits10 one less for int types?

std::numeric_limits<T>::digits10 is the guaranteed number of decimal digits, in the sense that any number with that many digits can be represented in type T without causing overflow or loss of information.

E.g. std::numeric_limits<int64_t>::digits10 cannot be 19 because 9'223'372'036'854'775'808 has 19 digits but is not representable in int64_t (its maximum is 9'223'372'036'854'775'807).

In general, such a guaranteed digitsN value will always suffer from this "one less" discrepancy on platforms where N is not a power of the radix used for the internal representation. In non-exotic cases the radix is 2. Since 10 is not a power of 2, digits10 is one smaller than the length of the maximum value.

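If std::numeric_limits<T> included digits16 or digits8, these values could have been "precise" on radix 2 platforms, at least when the number of value bits is a multiple of 4 or 3 respectively (a hypothetical digits16 for uint32_t would be exactly 8, the length of 0xFFFFFFFF).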

std::to_chars() minimal floating point buffer size

Note that the minimal buffer required is different depending on the floating point format desired. Using max_digits10 and max_exponent10 is always enough to determine the minimum number of characters necessary for base-10 output, assuming one doesn't want to output more precision than the floating point type contains.

This problem is not just limited to to_chars, either. The C standard library functions in the printf family will have the same behavior, so this applies with equal weight in C as it does in C++.

  • std::chars_format::scientific or %e (printf specifier):

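    #include <algorithm>
    #include <array>
    #include <limits>

    using FloatType = double;  // placeholder: float, double or long double

    // Number of decimal digits in num (e.g. 308 -> 3).
    template<typename T>
    constexpr int log10ceil(T num) {
        return num < 10? 1: 1 + log10ceil(num / 10);
    }

    std::array<char, 4 +
        std::numeric_limits<FloatType>::max_digits10 +
        std::max(2, log10ceil(std::numeric_limits<FloatType>::max_exponent10))
        > buf;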

    The function log10ceil allows constexpr evaluation of how many digits are in the largest possible exponent. At least 2 digits must be present in the exponent per the standard, hence the test against a minimum exponent width. The precision used when writing must be no larger than max_digits10 - 1. Because the precision of %e counts only the digits after the decimal point and exactly one digit always precedes it, using exactly this precision writes max_digits10 significant digits and therefore provides a lossless conversion to a string representation.

    The addition of 4 characters accommodates the possible sign, the decimal point, and the "e+" or "e-" in the output.

  • std::chars_format::fixed or %f (printf specifier):

    std::array<char, 2 + 
    std::numeric_limits<FloatType>::max_exponent10 +
    std::numeric_limits<FloatType>::max_digits10
    > buf;

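    Again, the precision used must be no larger than max_digits10 - 1 for the buffer to be large enough. Note, however, that here the precision counts digits after the decimal point, so unlike the scientific case it does not guarantee a lossless conversion for values of small magnitude: 1e-20, for instance, prints as all zeros.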

    The addition of 2 characters accommodates the possible sign and the decimal point in the output.

  • std::chars_format::general or %g (printf specifier):

    For the general case, the minimal buffer is always the same as in the scientific case. However, since the precision of %g counts significant digits rather than digits after the decimal point, the precision may be as large as max_digits10 (rather than max_digits10 - 1 as above), and using exactly max_digits10 gives a lossless conversion to a string representation.

Note that in all these examples, the buffer is exactly the size of the largest string representation. If a NUL-terminator or other content is needed, the size must be increased accordingly.

How do I print a double value with full precision using cout?

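You can set the precision directly on std::cout and use the std::fixed manipulator. Keep in mind that with std::fixed the precision is the number of digits after the decimal point, not the number of significant digits, so it is not "full precision" for very large or very small values.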

double d = 3.14159265358979;
cout.precision(17);
cout << "Pi: " << fixed << d << endl;

You can #include <limits> to get the number of significant digits needed to print a float or double without losing information (max_digits10).

#include <iostream>
#include <limits>

typedef std::numeric_limits< double > dbl;

double d = 3.14159265358979;
std::cout.precision(dbl::max_digits10);
std::cout << "Pi: " << d << std::endl;

Why does text-float-text guarantee 6 digits but float-text-float needs 9?

Decimal→Binary→Decimal

Consider the seven-digit decimal floating-point values 9,999,979•10³ (9,999,979,000) and 9,999,978•10³ (9,999,978,000). When you convert these to binary floating-point with 24-bit significands, you get 1001 0101 0000 0010 1110 0100•2¹⁰ (9,999,978,496) in both cases, because that is the closest binary floating-point value to each of the numbers. (The next lower and higher binary floating-point numbers are 1001 0101 0000 0010 1110 0011•2¹⁰ (9,999,977,472) and 1001 0101 0000 0010 1110 0101•2¹⁰ (9,999,979,520).)

Therefore, 24-bit significands cannot distinguish all decimal floating-point numbers with seven-digit significands. We can do at most six digits.

Binary→Decimal→Binary

Consider the two 24-bit-significand binary floating-point values 1111 1111 1111 1111 1111 1101•2³ (134,217,704) and 1111 1111 1111 1111 1111 1100•2³ (134,217,696). If you convert these to decimal floating-point with eight-digit significands, you get 13,421,770•10¹ in both cases. Then you cannot tell them apart. So you need at least nine decimal digits.

You can think of this as some “chunking” that is forced by where the digit positions lie. At the top of a decimal number, we need a bit big enough to exceed 5 in the first digit. But the nearest power of two does not necessarily start with 5 in that position; it might start with 6, 7, 8 or 9, so there is some wastage. At the bottom, we need a bit lower than 1 in the last digit. But the nearest power of two does not necessarily start with 9 in the next lower position; it might start with 8, 7, 6 or even 5. So again, there is some wastage. To go from binary to decimal to binary, you need enough decimal digits to fit around the wastage, so you need extra decimal digits. To go from decimal to binary to decimal, you have to keep the decimal digits few enough that they, plus the wastage, fit inside the binary, so you need fewer decimal digits.


