Convert Float to Double Without Losing Precision

It's not that you're actually getting extra precision - it's that the float didn't accurately represent the number you were aiming for originally. The double is representing the original float accurately; toString is showing the "extra" data which was already present.

For example (and these numbers aren't right, I'm just making things up) suppose you had:

float f = 0.1F;
double d = f;

Then the value of f might be exactly 0.100000234523. d will have exactly the same value, but when you convert it to a string it will "trust" that it's accurate to a higher precision, so won't round off as early, and you'll see the "extra digits" which were already there, but hidden from you.

When you convert to a string and back, you're ending up with a double value which is closer to the string value than the original float was - but that's only good if you really believe that the string value is what you really wanted.

Are you sure that float/double are the appropriate types to use here instead of BigDecimal? If you're trying to use numbers which have precise decimal values (e.g. money), then BigDecimal is a more appropriate type IMO.
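To make the three routes above concrete, here is a minimal Java sketch. The class and method names (FloatToDoubleRoutes, widen, viaString, viaDecimal) are made up for illustration; the only library calls are the standard Float.toString, Double.parseDouble and BigDecimal constructors. It assumes the float started life as the decimal literal 0.1.

```java
import java.math.BigDecimal;

class FloatToDoubleRoutes {
    // Widening conversion: the double holds exactly the value the float held.
    static double widen(float f) {
        return f;
    }

    // Via the float's decimal string: yields the double nearest to the
    // printed decimal, which is generally NOT the float's own value.
    static double viaString(float f) {
        return Double.parseDouble(Float.toString(f));
    }

    // Via BigDecimal: keeps the printed decimal digits exactly.
    static BigDecimal viaDecimal(float f) {
        return new BigDecimal(Float.toString(f));
    }

    public static void main(String[] args) {
        float f = 0.1F;
        System.out.println(f);                        // 0.1
        System.out.println(widen(f));                 // 0.10000000149011612
        System.out.println(new BigDecimal(widen(f))); // 0.100000001490116119384765625
        System.out.println(viaString(f));             // 0.1
        System.out.println(viaDecimal(f));            // 0.1
    }
}
```

Which route is "right" depends on whether you trust the float's stored value or the decimal string it prints as.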

Safely convert `float` to `double` without loss of precision

Based on my reading this is a bug in the implementation?

No. It's a bug in your expectations. The double value you're seeing is exactly the same value as the float value. The precise value is 13.8999996185302734375.

That's not the same as "the closest double value to 13.9" which is 13.9000000000000003552713678800500929355621337890625.

You're assigning the value 13.8999996185302734375 to a double value, and then printing the string representation - which is 13.899999618530273 as that's enough precision to completely distinguish it from other double values. If it were to print 13.9, that would be a bug, as there's a double value that's closer to 13.9, namely 13.9000000000000003552713678800500929355621337890625.
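A rough way to check these figures yourself is sketched below in Java. It relies on the BigDecimal(double) constructor, which expands a binary value exactly. The class name ExactExpansion and the helper exact are made up for this illustration.

```java
import java.math.BigDecimal;

class ExactExpansion {
    // new BigDecimal(double) performs an exact conversion, with no rounding.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        float f = 13.9f;
        double fromFloat = f;   // same value as the float
        double nearest = 13.9;  // closest double to the decimal 13.9

        System.out.println(exact(fromFloat)); // 13.8999996185302734375
        System.out.println(exact(nearest));

        // Exact distance of each stored value from the decimal 13.9.
        BigDecimal target = new BigDecimal("13.9");
        System.out.println(exact(fromFloat).subtract(target).abs());
        System.out.println(exact(nearest).subtract(target).abs());
    }
}
```

The second distance is far smaller than the first, which is why printing the widened float as 13.9 would misidentify it.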

Conversion without losing precision from float to double and back into float

"0,00749052940542251" is not exactly representable as a float.

A 32-bit float can encode about 2^32 different values exactly. 0,00749052940542251 is not one of them. Instead, when code assigns 0,00749052940542251 to a float, the float stores a nearby value:

0,007490529 213...  The closest float
0,007490529 40542251
0,007490529 678... The next closest float

As double's precision and range both exceed float's, any float --> double --> float conversion round-trips exactly.

0,00749052940542251 is not one of the about 2^64 different values encodable as a double either. Again, a nearby value is used: closer, yet not the same.

0,007490529 4054225 099..
0,007490529 4054225 1
0,007490529 4054225 107..
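The neighbouring values listed above can be inspected with a short Java sketch such as the following. The class name FloatNeighbours and the helper distance are hypothetical. Math.nextDown and Math.nextUp return the adjacent representable floats, and BigDecimal is used only to print and compare values exactly.

```java
import java.math.BigDecimal;

class FloatNeighbours {
    // Exact absolute distance between a stored binary value and a decimal string.
    static BigDecimal distance(double stored, String decimal) {
        return new BigDecimal(stored).subtract(new BigDecimal(decimal)).abs();
    }

    public static void main(String[] args) {
        String decimal = "0.00749052940542251";
        float f = Float.parseFloat(decimal);    // nearest float to the decimal
        double d = Double.parseDouble(decimal); // nearest double to the decimal

        // The stored float and its two neighbours, expanded exactly.
        System.out.println(new BigDecimal(Math.nextDown(f)));
        System.out.println(new BigDecimal(f));
        System.out.println(new BigDecimal(Math.nextUp(f)));

        // The double is closer to the decimal, but still not equal to it.
        System.out.println(new BigDecimal(d));

        // float -> double -> float gives back the same float.
        System.out.println((float) (double) f == f); // true
    }
}
```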

Converting float to double loses precision C#

The issue observed in this question is caused largely by Microsoft’s choice of formatting, notably that Microsoft software fails to show the exact values because it limits the number of digits used to convert to decimal even when the format string requests more digits. Furthermore, it uses fewer digits when converting float than when converting double. Thus, if a float and double with the same value are formatted, the results may be different because the float formatting will use fewer significant digits.

Below, I go through the code statements in the question one by one. In summary, the crux of the matter is that the value 61.0099983215332 is formatted as “61.0100000000000” when it is a float and “61.0099983215332” when it is a double. This is purely Microsoft’s choice of formatting and is not caused by the nature of floating-point arithmetic.

The statement double temp3 = 61.01 initializes temp3 to exactly 61.00999999999999801048033987171947956085205078125. This change from 61.01 is necessary due to the nature of a binary floating-point format—it cannot represent exactly 61.01, so the nearest value representable in double is used.

The statement dynamic temp = 61.01f initializes temp to exactly 61.009998321533203125. As with double, the nearest representable value has been used, but, since float has less precision, the nearest value is not as close as in the double case.

The statement double temp2 = (double)Convert.ChangeType(temp, typeof(double)); converts temp to a double that has the same value as temp, so it has the value 61.009998321533203125.

The statement double newValue = temp2 - temp3; correctly subtracts the two values, producing the exact result -0.00000167846679488548033987171947956085205078125, with no error. (The result is negative because temp2 is smaller than temp3.)

The statement Console.WriteLine(String.Format(" {0:F20}", temp)); formats the float named temp. Formatting a float involves calling Single.ToString. Microsoft’s documentation is a bit vague. It says that, by default, only seven (decimal) digits of precision are returned. It says to use G or R formats to get up to nine, and F20 uses neither G nor R. So I believe only seven digits are used. When 61.009998321533203125 is rounded to seven significant decimal digits, the result is “61.01000”. The ToString method then pads this to twenty digits after the decimal point, producing “61.01000000000000000000”.

I will address your third WriteLine statement next and come back to the second one afterward.

The statement Console.WriteLine(String.Format(" {0:F20}", temp3)); formats the double named temp3. Since temp3 is a double, Double.ToString is called. This method uses 15 digits of precision (unless G or R are used). When 61.00999999999999801048033987171947956085205078125 is rounded to 15 significant decimal digits, the result is “61.0100000000000”. The ToString method then pads this to twenty digits after the decimal point, producing “61.01000000000000000000”.

The statement Console.WriteLine(String.Format(" {0:F20}", temp2)); formats the double named temp2. temp2 is a double that contains the value from the float temp, so it contains 61.009998321533203125. When this is converted to 15 significant decimal digits, the result is “61.0099983215332”. The ToString method then pads this to twenty digits after the decimal point, producing “61.00999832153320000000”.

Finally, the statement Console.WriteLine(String.Format(" {0:F20}", newValue)); formats newValue. Formatting -0.00000167846679488548033987171947956085205078125 to 15 significant digits produces “-0.00000167846679488548”.
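The stored values in this walk-through do not depend on any formatting routine, so they can be cross-checked in another language. Below is a rough Java analogue, not the original C# code: the class name SubtractionWalkthrough and the helper exact are made up, the variable names mirror the question, and Java's own string formatting follows different rules from .NET's, so the F20 output above is not reproduced here.

```java
import java.math.BigDecimal;

class SubtractionWalkthrough {
    // Exact decimal expansion of a double (BigDecimal(double) does not round).
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        double temp3 = 61.01;            // nearest double to 61.01
        float temp = 61.01f;             // nearest float to 61.01
        double temp2 = temp;             // widening keeps the float's value
        double newValue = temp2 - temp3; // floating-point subtraction

        System.out.println(exact(temp3));
        System.out.println(exact(temp2)); // 61.009998321533203125
        System.out.println(exact(newValue).toPlainString());

        // Compare the floating-point difference with the exact decimal difference.
        BigDecimal exactDifference = exact(temp2).subtract(exact(temp3));
        System.out.println(exactDifference.compareTo(exact(newValue)) == 0); // true
    }
}
```

The final line confirms that the subtraction itself introduces no rounding error; every discrepancy seen in the question comes from how the values are printed.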

Can float be round tripped via double without losing precision?

Yes. IEEE754 floating point (which is what C# must use) guarantees this:

  1. Converting a float to a double preserves exactly the same value

  2. Converting that double back to a float recovers exactly that original float.

The set of doubles is a superset of floats.

Note that this also applies to NaN, +Infinity, and -Infinity. The signedness of zero is also preserved.
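The same guarantee can be spot-checked in Java, whose float and double are also IEEE 754 binary32 and binary64. This is a small sketch rather than a proof. The class name FloatRoundTrip and the helper roundTrips are hypothetical, and the comparison uses Float.floatToIntBits so that NaN and the sign of zero are compared by bit pattern.

```java
class FloatRoundTrip {
    // True if f survives float -> double -> float unchanged.
    // floatToIntBits distinguishes +0.0 from -0.0 and maps every NaN
    // to one canonical pattern, so NaN compares equal to NaN here.
    static boolean roundTrips(float f) {
        double widened = f;           // exact widening conversion
        float back = (float) widened; // narrowing of a float-representable value
        return Float.floatToIntBits(back) == Float.floatToIntBits(f);
    }

    public static void main(String[] args) {
        float[] samples = {
            0.1f, 13.9f, Float.MIN_VALUE, Float.MAX_VALUE,
            Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
            0.0f, -0.0f
        };
        for (float f : samples) {
            System.out.println(f + " round-trips: " + roundTrips(f));
        }
    }
}
```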

Convert float to double loses precision but not via ToString

It's not a loss of precision: 0.3 is not exactly representable in binary floating point. When the system converts the value to a string, it rounds; if you print enough significant digits, you will get something that makes more sense.

To see it more clearly:

float f = 0.3f;
double d1 = System.Convert.ToDouble(f);
double d2 = System.Convert.ToDouble(f.ToString("G20"));

string s = string.Format("d1 : {0} ; d2 : {1} ", d1, d2);

output

"d1 : 0.300000011920929 ; d2 : 0.300000012 "

cannot convert from float to double?

The cause of the compilation error is the following assignment:

float x = A_2 * B;

where B is of type double, so the product is also of type double, which cannot be assigned to a variable of type float without an explicit cast. Remember: a double occupies 8 bytes, whereas a float occupies only 4.

After correcting this compilation error, you will encounter a runtime error because you have used a plus sign (+) instead of a comma (,) inside the printf statement.

Apart from this,

  1. Always follow Java naming conventions, e.g. A should be a and A_2 should be a2.
  2. You can use Math.PI instead of using your own value for PI.

The following code incorporates these changes:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        // let a be the radius
        float a = scanner.nextFloat();
        float a2 = a * a;
        // let b be PI
        double b = Math.PI;

        // let x be the area (PI * radius^2); float * double yields a double
        double x = a2 * b;

        System.out.printf("x= %.4f", x);
    }
}

A sample run:

2
x= 12.5664
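If the result really has to stay a float, an explicit narrowing cast is the other way to satisfy the compiler. The sketch below shows both options, plus the runtime failure caused by writing + instead of , in the format call. The names NarrowingDemo, areaAsDouble, areaAsFloat and format are made up for illustration, and Locale.ROOT is used only to keep the decimal separator predictable.

```java
import java.util.Locale;
import java.util.MissingFormatArgumentException;

class NarrowingDemo {
    // float * float * double is evaluated as double; keep the result as double.
    static double areaAsDouble(float radius) {
        return radius * radius * Math.PI;
    }

    // Alternative: narrow explicitly, accepting the loss of precision.
    static float areaAsFloat(float radius) {
        return (float) (radius * radius * Math.PI);
    }

    static String format(double x) {
        return String.format(Locale.ROOT, "x= %.4f", x);
    }

    public static void main(String[] args) {
        float r = 2f;
        // float x = r * r * Math.PI;  // does not compile: double cannot be assigned to float
        System.out.println(format(areaAsDouble(r))); // x= 12.5664
        System.out.println(format(areaAsFloat(r)));  // x= 12.5664

        try {
            // '+' concatenates the value into the format string,
            // leaving %.4f without an argument.
            String.format("x= %.4f" + areaAsDouble(r));
        } catch (MissingFormatArgumentException e) {
            System.out.println("Missing argument for " + e.getFormatSpecifier());
        }
    }
}
```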

