What Is the Maximum Length in Chars Needed to Represent Any Double Value

What is the maximum length in chars needed to represent any double value?

The standard header <float.h> in C (<cfloat> in C++) contains several constants describing the range and precision of the floating-point types. One of these is DBL_MAX_10_EXP, the largest power-of-10 exponent a finite double can have. Since 1eN takes N+1 digits to write out, and there may be a negative sign as well, the answer is

int max_digits = DBL_MAX_10_EXP + 2;

This assumes that the exponent is larger than the number of digits needed to represent the largest possible mantissa value; otherwise, there will also be a decimal point followed by more digits.
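
As a quick sanity check of this formula, here is a minimal C sketch, assuming IEEE-754 doubles and a C99 snprintf (the buffer size is just a comfortable margin):

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* -DBL_MAX printed with no fractional part: a sign plus 309 digits. */
    char buf[512];
    int len = snprintf(buf, sizeof buf, "%.0f", -DBL_MAX);

    printf("DBL_MAX_10_EXP + 2 = %d\n", DBL_MAX_10_EXP + 2);  /* 310 */
    printf("actual length      = %d\n", len);                 /* 310 */
    return 0;
}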

CORRECTION

The longest number is actually the negative number of smallest magnitude: a subnormal that needs enough digits to cover both the exponent range and the significand. This value is -pow(2, DBL_MIN_EXP - DBL_MANT_DIG), where DBL_MIN_EXP is negative. It's fairly easy to see (and prove by induction) that -pow(2, -N) needs 3+N characters in a non-scientific decimal representation ("-0." followed by N digits); for example, -pow(2, -2) is -0.25, which is 5 = 3 + 2 characters. So the answer is

int max_digits = 3 + DBL_MANT_DIG - DBL_MIN_EXP;

For a 64-bit IEEE double, we have

DBL_MANT_DIG = 53
DBL_MIN_EXP = -1021
max_digits = 3 + 53 - (-1021) = 1077
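
Again as a sanity check, a minimal C sketch (same IEEE-754 and C99 assumptions) that prints this smallest-magnitude value in full and measures its length:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* The negative double of smallest magnitude: -2^-1074 on IEEE-754 systems.
       ldexp computes the power of two exactly, including subnormals. */
    double d = -ldexp(1.0, DBL_MIN_EXP - DBL_MANT_DIG);

    /* Print every digit after the decimal point: all 1074 of them. */
    char buf[2048];
    int len = snprintf(buf, sizeof buf, "%.*f",
                       DBL_MANT_DIG - DBL_MIN_EXP, d);

    printf("length = %d\n", len);   /* 1077 = 3 + 53 + 1021 */
    return 0;
}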

What is the maximum length of Double.toString(d)?

A Java double is a 64-bit IEEE-754 floating-point value.

The most significant decimal digits the 53-bit significand (52 stored bits plus the implicit leading bit) can require is 17, from the formula ceil(1 + N log10(2)) with N = 53, so that's 19 characters with the decimal point and negative sign.

The exponent bias is 1023, so the smallest normalized value is 2^-1022, which is around 10^-308; subnormals push the printed exponent as low as E-324. Either way the exponent part takes at most 5 characters, counting the 'E' and the negative sign.

19 + 5 == 24

For example, Double.toString(-Double.MIN_NORMAL) produces "-2.2250738585072014E-308", which is exactly 24 characters.

Maximum Width of a Printed Double in C++

Who knows. The Standard doesn't say how many digits of precision a double provides other than saying it (3.9.1.8) "provides at least as much precision as float," so you don't really know how many characters you'll need to sprintf an arbitrary value. Even if you did know how many digits your implementation provided, there's still the question of exponential formatting, etc.

But there's a MUCH bigger question here. Why the heck would you care? I'm guessing it's because you're trying to write something like this:

double d = ...;
int MAGIC_NUMBER = ...;
char buffer[MAGIC_NUMBER];
sprintf(buffer, "%f", d);

This is a bad way to do this, precisely because you don't know how big MAGIC_NUMBER should be. You can pick something you hope is big enough, like 14 or 128K, but then the number you picked is arbitrary, not based on anything but a guess that it will be big enough. Numbers like MAGIC_NUMBER are, not surprisingly, called Magic Numbers. Stay away from them. They will make you cry one day.
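
If you really do need a raw char buffer in C, one way to avoid the guess entirely (assuming a C99-conforming snprintf) is to ask for the required length first with a NULL buffer, then allocate exactly that much:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double d = -2.5e-10;

    /* With a NULL buffer and size 0, snprintf (C99) returns the number of
       characters the output would need, excluding the terminating '\0'. */
    int needed = snprintf(NULL, 0, "%f", d);
    char *buffer = malloc((size_t)needed + 1);
    if (buffer == NULL)
        return 1;

    snprintf(buffer, (size_t)needed + 1, "%f", d);
    printf("%s\n", buffer);
    free(buffer);
    return 0;
}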

Instead, there are a lot of ways to do this string formatting without having to care about buffer sizes, digits of precision, etc., that let you just get on with the business of programming. Streams are one:

#include <iostream>
#include <sstream>
#include <string>

double d = ...;
std::stringstream ss;
ss << d;
std::string s = ss.str();
std::cout << s;
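
One caveat: streams format floating-point values with a default precision of 6 significant digits; if you want enough digits to recover the exact double, raise it with std::setprecision from <iomanip>.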

...Boost.Format is another:

#include <boost/format.hpp>
#include <iostream>
#include <string>

double d = ...;
std::string s = (boost::format("%1%") % d).str();
std::cout << s;

Limiting the Length of User Input in C

Change this:

char chain[10];

to this:

char chain[11]; // +1 for the null terminator

since C strings must be null-terminated, we need one cell reserved for the terminating '\0' in the array that stores our string.


I mean, if the user enters an 11th character, for example, the console just shouldn't grab it

Not possible in C.

or, in case this is impossible in C, make it so that if the user enters more than 12 characters, the program shows an error message saying the input exceeds the limit.

Yes, let's do that! Read the string, and if its length is more than 10 characters, print an error message.

Make the chain array of size 12 (10 for the maximum length of valid input, 1 for an extra character, if any, and 1 for the null terminator), so that input longer than the limit can be detected.

Example:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char chain[12];  /* 10 valid chars + 1 overflow char + 1 for '\0' */

    printf("introduce real numbers:\n");
    fgets(chain, sizeof(chain), stdin);
    chain[strcspn(chain, "\n")] = '\0';  /* strip the trailing newline, if any */

    if (strlen(chain) > 10)
    {
        printf("Error: Maximum length of chain is 10! Exiting..\n");
        return 1;
    }

    return 0;
}
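
A sample session, with hypothetical input that is too long (fgets grabs only the first 11 characters, so strlen reports 11 and the check fires):

introduce real numbers:
123456789012
Error: Maximum length of chain is 10! Exiting..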

Note: You could use EXIT_SUCCESS and EXIT_FAILURE from <stdlib.h> instead of the plain numbers 0 and 1, respectively: Should I return 0 or 1 for successful function?


Irrelevant to OP's question: In the full version of your code though, there is a plethora of problems, such as this top-level line, int valid=validate_numbers(char number[]);, which is meant to declare the function. A declaration should be just int validate_numbers(char number[]);. The same holds true for the definition of the function. Make sure you go through all your code again, and read the messages the compiler gives you. :)
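
For illustration, a minimal sketch of the declaration/definition pairing described above; validate_numbers and its body here are placeholders, not the OP's actual code:

/* Declaration: return type, name, and parameter list, no initializer. */
int validate_numbers(char number[]);

/* Definition: the same signature, followed by a body. */
int validate_numbers(char number[])
{
    return number[0] != '\0';  /* placeholder logic */
}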

What are the limit values of int32 up to which int32-to-float conversion works without rounding to the nearest value?

The maximum integer value of a float significand is FLT_RADIX/FLT_EPSILON - 1. By “integer value” of a significand, I mean the value when it is scaled so that its lowest bit represents a value of 1.

The value FLT_RADIX/FLT_EPSILON is also representable in float, since it is a power of the radix. FLT_RADIX/FLT_EPSILON + 1 is not representable in float, so converting an integer to float might result in rounding if the integer exceeds FLT_RADIX/FLT_EPSILON in magnitude.

If it is known that INT_MAX exceeds FLT_RADIX/FLT_EPSILON, you can test this for a non-negative int x with (int) (FLT_RADIX/FLT_EPSILON) < x. If it is not known that FLT_RADIX/FLT_EPSILON can be converted to int successfully, more complicated tests may be needed.

Very commonly, C implementations use the IEEE-754 binary32 format, also known as “single precision,” for float. In this format, FLT_RADIX/FLT_EPSILON is 2^24 = 16,777,216.
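
A small demonstration of the boundary, assuming float is IEEE-754 binary32:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* For binary32, FLT_RADIX/FLT_EPSILON = 2^24 = 16777216. */
    long limit = (long)(FLT_RADIX / FLT_EPSILON);

    float exact   = (float)limit;        /* a power of the radix: converts exactly */
    float rounded = (float)(limit + 1);  /* 16777217 has no binary32 representation */

    printf("limit     = %ld\n", limit);
    printf("limit     -> %.0f\n", exact);             /* 16777216 */
    printf("limit + 1 -> %.0f (rounded)\n", rounded); /* 16777216 again */
    return 0;
}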

These symbols are defined in <float.h>. For double or long double, replace FLT_EPSILON with DBL_EPSILON or LDBL_EPSILON. FLT_RADIX remains unchanged since it is the same for all formats.

Theoretically, a perverse floating-point format might have an abnormally small exponent range that makes FLT_RADIX/FLT_EPSILON - 1 not representable because the significand cannot be scaled high enough. This can be disregarded in practice.


