Should We Generally Use Float Literals for Floats Instead of the Simpler Double Literals

Yes, you should use the f suffix. Reasons include:

  1. Performance. When you write float foo(float x) { return x*3.14; }, you force the compiler to emit code that converts x to double, then does the multiplication, then converts the result back to single precision. If you add the f suffix (3.14f), both conversions are eliminated. On many platforms, each of those conversions is about as expensive as the multiplication itself.

  2. Performance (continued). There are platforms (most cellphones, for example) on which double-precision arithmetic is dramatically slower than single-precision arithmetic. Even ignoring the conversion overhead (covered in point 1), every time you force a computation to be evaluated in double, you slow your program down. This is not just a "theoretical" issue.

  3. Reduce your exposure to bugs. Consider the example float x = 1.2; if (x == 1.2) { /* something */ } Is something executed? No, it is not, because x holds 1.2 rounded to float, but it is compared to 1.2 rounded to double. The two values are not equal.

Should I always use the appropriate literals for number types?

You should always explicitly indicate the type of literal that you intend to use. This prevents problems when, for example, code like this:

float foo = 9.0f;
float bar = foo / 2;

is later changed to the following, which truncates the result (foo / 2 is now integer division, giving 4):

int foo = 9;
float bar = foo / 2;

It is also a concern for function arguments when overloading or templates are involved, since the literal's type determines which overload or template specialization is chosen.

I know GCC has -Wconversion, but I can't recall everything it covers.

For integer values that fit in an int, I usually don't add long or unsigned suffixes, as there is much less chance of subtle bugs there.

Make C floating point literals float (rather than double)

GCC's -fsingle-precision-constant flag can be used. It causes floating-point constants to be loaded in single precision even when this is not exact.

Note: this will also use single-precision constants in operations on double-precision variables.

Is literal double to float conversion equal to float literal?

Assume IEEE 754, with float as 32-bit binary and double as 64-bit binary.

There are decimal fractions that round differently, under IEEE 754 round-to-nearest rules, when converted directly from decimal to float than when first converted from decimal to double and then to float.

For example, consider 1.0000000596046447753906250000000000000000000000000001.

1.000000059604644775390625 is exactly representable as a double and is exactly halfway between 1.0 and 1.00000011920928955078125, the value of the smallest float greater than 1.0. 1.0000000596046447753906250000000000000000000000000001 rounds up to 1.00000011920928955078125 if converted directly, because it is greater than the midpoint. If it is first converted to 64-bit double, round-to-nearest takes it to the midpoint 1.000000059604644775390625, and then round-half-to-even rounds it down to 1.0.

Is there a reason to always declare floats with the type suffix 'f' in C#?

C# defines a list of allowed implicit numeric conversions. Implicit conversions are allowed only when the target type can hold the range of the original type. In this case, float distance = 0.3; is an error because float's range cannot accommodate double's range, so there is no implicit conversion from double to float.

As for efficiency, there is no difference between 3 and 3f: the compiler converts the constant at compile time, and both declarations produce the same IL:

IL_0001:  ldc.r4     3.   // float distance1 = 3;
IL_0006:  stloc.0
IL_0007:  ldc.r4     3.   // float distance2 = 3f;
IL_000c:  stloc.1

Is it a good practice to use 'd' when defining double literals in Java?

From the Java Language Specification:

A floating-point literal is of type float if it ends with the letter F or f; otherwise its type is double and it can optionally end with the letter D or d.

The floating-point types (float and double) can also be expressed using E or e (for scientific notation), F or f (32-bit float literal), and D or d (64-bit double literal; this is the default and by convention is omitted).

double d1 = 123.4;
// same value as d1, but in scientific notation
double d2 = 1.234e2;
float f1 = 123.4f;

So adding D or d is the same as omitting it, as you already know, but I would say it is better to follow the convention and omit it.

Why are Int and Float literals allowed to be added, but Int and Float variables are not, in Swift?

In the case of var sum = 4 + 5.0, the compiler infers a floating-point type for the whole expression (Double, Swift's default for floating-point literals), so the literal 4 is treated as a floating-point value, as the operation requires.
The same happens if you write var x: Float = 4. The literal 4 is treated as a Float.

In the second case, since you have explicitly declared the types of the variables, the compiler does not have the freedom to change them as required; Swift performs no implicit conversion between Int and Float values.

For a solution, see @Fabio's answer.