How to Force Division to Be Floating Point? Division Keeps Rounding Down to 0

How can I force division to be floating point? Division keeps rounding down to 0.

In Python 2, division of two ints produces an int. In Python 3, it produces a float. We can get the new behaviour by importing from __future__.

>>> from __future__ import division
>>> a = 4
>>> b = 6
>>> c = a / b
>>> c
0.66666666666666663

C Floating Point Division Result Rounded to Zero

The statement

float x = ((1000)/(24 * 60 * 60));

does the following:

  1. Declares a variable x of type float.
  2. Evaluates ((1000)/(24 * 60 * 60)).

    1. Evaluates 24*60*60 which is 86400.
    2. Evaluates 1000/86400 which is 0.
  3. Assigns the result of that (which is 0) to x.

In the second step, ((1000)/(24 * 60 * 60)) is zero - the division is integer division, because both operands are integers. The fact that the result gets assigned to a floating point variable later makes no difference.

The simplest fix is to make sure at least one side of the division is a floating-point number, so that floating-point division is used. For example, you could change 1000 to 1000.0f.
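The evaluation order described above can be mimicked in Python 3 as a rough illustration. This is only a sketch: Python's // operator stands in for C's integer division (the two agree for non-negative operands), and the variable names are made up for the example.

```python
# Python 3 sketch of the C expression; `//` stands in for C's int / int.
seconds_per_day = 24 * 60 * 60          # 86400, as in step 2.1

# Integer division first, conversion to float afterwards: the fraction is already lost.
x_int_division = float(1000 // seconds_per_day)

# Making one operand floating point (like 1000.0f in C) keeps the fraction.
x_float_division = 1000.0 / seconds_per_day

print(x_int_division)    # 0.0
print(x_float_division)  # roughly 0.011574
```

The conversion to float happens too late in the first case, which is exactly the situation in the C statement.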

How does one make a floating point number stop rounding to an integer?

#include <stdio.h>

int main(void)
{
    int s = 4;
    float f = (float)s / 7;  /* cast one operand so the division is done in floating point */
    printf("%f\n", f);
    return 0;
}

You just have to cast one of the ints to float. int/int division discards the fractional part of the result, whereas a division in which at least one operand is a float is performed in floating point and keeps it.

How to show actual result for floating-point division in C?

You are storing the result of the division in a float, but that does not change how the division itself is performed.
You're actually performing an integer division because, in both cases, both of your operands are integers (they have no fractional part).
The division is performed with truncation (the result has no fractional part), and only then is it assigned to your variable.

How to correct this?
Try adding a fractional part to at least one of the two numbers. For example:

float a = 3.0/2;

This should do it.

Controlling rounding with floating values

Like Eric mentioned, fesetround() is in principle the way to control the float rounding mode, but unfortunately it does not have great support.

Here is a thought on a way around this. Extract the upper 32 bits manually, then use plain built-in integer arithmetic to compute the division. So essentially, emulate the floating point conversion and division in order to have full control over how rounding is done. To simplify this outline, I'll assume a and b are both positive and at least 2^32, and hopefully it will be clear how to fill in these details for a general implementation:

  1. Use a_bits = msb(a) to get the index of the most significant bit in a that is set to 1 (for more details, search for "msb" in the Boost.Multiprecision number reference).

  2. Get the upper 32 bits with uint32_t a_high = a >> (a_bits - 31).

    This step is effectively a rounding operation, rounding down. To round up instead, we can increment a_high by one if lsb(a) < a_bits - 31, that is, if the lowest set bit occurs below the extracted 32-bit portion. (An edge case is that this increment may cause the uint32_t to overflow, which I'll ignore.) I'll use "a_high_down" to refer to rounding down and "a_high_up" for rounding up.

    Either way, we have a float approximation of the dividend a:

    a ≈ 2^(a_bits − 31) ⋅ a_high.
  3. In the same way, make an approximate representation of the divisor b:

    b ≈ 2^(b_bits − 31) ⋅ b_high.

Then, the division a/b is approximately

a/b ≈ 2^(a_bits − b_bits − 32) ⋅ ((2^32 ⋅ a_high) // b_high),

where // denotes integer division rounding down. The numerator 2^32 ⋅ a_high should be formed by casting a_high to uint64_t and shifting up.

A couple of notes:

  • The 2^32 scaling of the numerator is important so that the quotient keeps some bits of precision (since a_high and b_high have similar magnitude, doing simply a_high // b_high as an integer division would give only 1 bit of precision).
  • Instead of this 32-bit shifting, we could have extracted a_high as the upper 64 bits of a for a little more precision.

To bound the quotient from above and below, compute

a/b ≤ 2^(a_bits − b_bits − 32) ⋅ ((2^32 ⋅ a_high_up − 1) // b_high_down + 1),
a/b ≥ 2^(a_bits − b_bits − 32) ⋅ ((2^32 ⋅ a_high_down) // b_high_up),

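where // denotes integer division rounding down. The divisions on the right-hand side can be done in native uint64_t arithmetic. Finally, cast these division results to cpp_int and shift them left by (a_bits − b_bits − 32) bits; when this exponent is negative, as in the examples below, it is a scaling down by the corresponding power of two instead.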
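The procedure can be sketched in Python, whose built-in integers are arbitrary precision and stand in here for cpp_int. This is an illustrative sketch, not the original C++ code: the helper names top32 and quotient_bounds are made up, inputs are assumed to be at least 2^32, and the bounds are returned as exact Fraction values so that a negative exponent does not lose bits.

```python
from fractions import Fraction

def top32(n):
    """Return (bits, high_down, high_up) for an int n >= 2**32."""
    bits = n.bit_length() - 1              # msb(n): index of the highest set bit
    shift = bits - 31
    high_down = n >> shift                 # upper 32 bits, rounded down
    dropped = n & ((1 << shift) - 1)       # bits discarded by the shift
    # Round up only if something was discarded (may reach 2**32; harmless in Python).
    high_up = high_down + (1 if dropped else 0)
    return bits, high_down, high_up

def quotient_bounds(a, b):
    """Return (lower, upper) as exact Fractions with lower <= a/b <= upper."""
    a_bits, a_down, a_up = top32(a)
    b_bits, b_down, b_up = top32(b)
    scale = Fraction(2) ** (a_bits - b_bits - 32)
    lower = scale * ((a_down << 32) // b_up)             # floor division
    upper = scale * (((a_up << 32) - 1) // b_down + 1)   # ceiling division
    return lower, upper

a = 2**100 * 123456
b = 2**70 * 123455
lower, upper = quotient_bounds(a, b)
print(float(lower), float(upper))   # 1073750521.25 1073750521.5
```

In a C++ version the two inner divisions would be done in uint64_t, and the increment of the high word would need an explicit overflow check.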

Examples

Ex 1. For the values

a = 2^100 ⋅ 123456 = 156499072501776288991176990922899456,
b = 2^70 ⋅ 123455 = 145749938535668012464209920,

the exact quotient is about a/b = 1073750521.434887. The procedure above accurately approximates this, with the first 10 digits correct:

a_bits = floor(log2(a)) = 116
a_high = a >> (116 − 31) = 4045406208

b_bits = floor(log2(b)) = 86
b_high = b >> (86 − 31) = 4045373440

a/b ≈ 2^(a_bits − b_bits − 32) ⋅ ((2^32 ⋅ a_high) // b_high)
= 2^(−2) ⋅ ((2^32 ⋅ 4045406208) // 4045373440)
= 0.25 ⋅ 4295002085
= 1073750521.25.

Ex 2. For the values

a = 10^50 = 100000000000000000000000000000000000000000000000000,
b = 9^50 = 515377520732011331036461129765621272702107522001,

the exact quotient is about a/b = 194.03252175, and

a_bits = floor(log2(a)) = 166
a_high_down = a >> (166 − 31) = 2295887403
a_high_up = a_high_down + 1 = 2295887404

b_bits = floor(log2(b)) = 158
b_high_down = b >> (158 − 31) = 3029116820
b_high_up = b_high_down + 1 = 3029116821

a/b ≤ 2^(a_bits − b_bits − 32) ⋅ ((2^32 ⋅ a_high_up − 1) // b_high_down + 1)
= 2^(−24) ⋅ (9860761315478339583 // 3029116820 + 1)
= 2^(−24) ⋅ 3255325530
= 194.03252184

a/b ≥ 2^(a_bits − b_bits − 32) ⋅ ((2^32 ⋅ a_high_down) // b_high_up)
= 2^(−24) ⋅ (9860761311183372288 // 3029116821)
= 2^(−24) ⋅ 3255325526
= 194.03252161

Python small decimals automatically round to 0

Simple solution!

print 1 / float(100)

Your problem is that by default in Python 2 the division operator does integer division (rounding down to an integer) when both operands are ints. By making one of the operands a float, Python will divide in the expected way. You were almost there with float(1 / 100); however, all this accomplishes is doing the integer division of 1 by 100, which equals zero, then converting zero to a floating point number.

This is a recognized issue in Python 2, fixed in Python 3. If you get tired of writing x / float(y) all the time, you can do from __future__ import division to make the division operator behave as in Python 3.
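For readers on Python 3, the difference can be reproduced with the // operator, which behaves like Python 2's / on two ints. A minimal sketch (the variable names are only illustrative):

```python
# Python 3: `//` behaves like Python 2's `/` on two ints.
too_late = float(1 // 100)   # divide first (result 0), then convert
in_time = 1 / float(100)     # convert first, then divide
true_div = 1 / 100           # Python 3 `/` is always true division

print(too_late)  # 0.0
print(in_time)   # 0.01
print(true_div)  # 0.01
```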

Division in C language

In this expression

15/4

both operands have an integer type (more precisely, the type int), so integer arithmetic is performed.

If at least one operand had a floating point type (float or double) as for example

15/4.0

or

15/4.0f

then the result will be a floating point number (of type double in the first expression and of type float in the second).

And in this expression

a/b

both operands have a floating point type (the type float), so the result is also a floating point number.

The behaviour of floating point division by zero

Division by zero, both integer and floating point, is undefined behavior according to the C++ standard, [expr.mul]p4:

The binary / operator yields the quotient, and the binary % operator yields the remainder from the division
of the first expression by the second. If the second operand of / or % is zero the behavior is undefined. ...

An implementation can, however, optionally support Annex F, which gives floating point division by zero well-defined semantics.

We can see from the clang bug report "clang sanitizer regards IEC 60559 floating-point division by zero as undefined" that even though the macro __STDC_IEC_559__ is defined, it is being defined by the system headers. Clang itself does not support Annex F, so for clang this remains undefined behavior:

Annex F of the C standard (IEC 60559 / IEEE 754 support) defines the
floating-point division by zero, but clang (3.3 and 3.4 Debian snapshot)
regards it as undefined. This is incorrect:


Support for Annex F is optional, and we do not support it.

#if __STDC_IEC_559__


This macro is being defined by your system headers, not by us; this is
a bug in your system headers. (FWIW, GCC does not fully support Annex
F either, IIRC, so it's not even a Clang-specific bug.)

That bug report and two other bug reports, "UBSan: Floating point division by zero is not undefined" and "clang should support Annex F of ISO C (IEC 60559 / IEEE 754)", indicate that gcc is conforming to Annex F with respect to floating point division by zero.

Though I agree that it isn't up to the C library to define __STDC_IEC_559__ unconditionally, the problem is specific to clang. GCC does not fully support Annex F, but at least its intent is to support it by default and the division is well-defined with it if the rounding mode isn't changed. Nowadays not supporting IEEE 754 (at least the basic features like the handling of division by zero) is regarded as bad behavior.

This is further supported by the GCC wiki page "Semantics of Floating Point Math in GCC", which indicates that -fno-signaling-nans is the default. This agrees with the gcc optimization options documentation, which says:

The default is -fno-signaling-nans.

It is interesting to note that UBSan for clang includes float-divide-by-zero under -fsanitize=undefined by default, while gcc does not. The gcc documentation says:

Detect floating-point division by zero. Unlike other similar options, -fsanitize=float-divide-by-zero is not enabled by -fsanitize=undefined, since floating-point division by zero can be a legitimate way of obtaining infinities and NaNs.


Rounding integer division (instead of truncating)

int a = 59.0f / 4.0f + 0.5f;

This only works when assigning to an int as it discards anything after the '.'

Edit:
This solution will only work in the simplest of cases. A more robust solution would be:

unsigned int round_closest(unsigned int dividend, unsigned int divisor)
{
    return (dividend + (divisor / 2)) / divisor;
}
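As a quick sanity check of the formula, here is a Python sketch of the same idea. It assumes non-negative operands, like the unsigned C version. Unlike in C, Python integers cannot overflow; in C the sum dividend + divisor / 2 can wrap around for large values.

```python
def round_closest(dividend, divisor):
    """Integer division rounded to nearest (ties round up); non-negative inputs only."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    return (dividend + divisor // 2) // divisor

print(59 // 4)               # 14 (truncating division)
print(round_closest(59, 4))  # 15 (59 / 4 = 14.75)
print(round_closest(57, 4))  # 14 (57 / 4 = 14.25)
```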

Unexpected integer division vs. floating-point division result in Python

// computes the floor of the exact result of the division. / computes the nearest representable float to the exact result of the division.

For dividend 10000 (exactly 10000) and divisor 0.1 (slightly higher than the real number 0.1 due to floating-point limitations), the exact result is very slightly lower than 100000, but the difference is slight enough that the nearest representable float is 100000.0.

// produces 99999.0 because the exact result is less than 100000, but / produces 100000.0 rather than something like 99999.99999999999 because 100000.0 is the closest float to the exact result.
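This can be checked with exact rational arithmetic. The sketch below uses the standard fractions module: Fraction(0.1) gives the exact value stored in the float 0.1, so the quotient is computed without any rounding.

```python
from fractions import Fraction
import math

# Exact quotient, using the true value of the float 0.1.
exact = Fraction(10000) / Fraction(0.1)

print(exact < 100000)      # True: the exact result is slightly below 100000
print(math.floor(exact))   # 99999
print(10000 // 0.1)        # 99999.0 (floor of the exact result)
print(10000 / 0.1)         # 100000.0 (nearest float to the exact result)
```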


