How to Alter a Float by Its Smallest Increment (Or Close to It)

How to alter a float by its smallest increment (or close to it)?

Check your math.h file. If you're lucky you have the nextafter and nextafterf functions defined. They do exactly what you want in a portable and platform independent way and are part of the C99 standard.

Another way to do it (could be a fallback solution) is to decompose your float into the mantissa and exponent part. Incrementing is easy: Just add one to the mantissa. If you get an overflow you have to handle this by incrementing your exponent. Decrementing works the same way.

EDIT: As pointed out in the comments, it is sufficient to just increment the float in its binary representation. A mantissa overflow will carry into the exponent, and that's exactly what we want.

That's in a nutshell the same thing that nextafter does.

This won't be completely portable, though. You would have to deal with endianness and the fact that not all machines have IEEE floats (OK, the last reason is more academic).

Handling NaNs and infinities can also be a bit tricky. You cannot simply increment them: a NaN is by definition not a number, and there is no value above positive infinity.
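A minimal sketch of the bit-increment idea in Python, using 32-bit floats so the carry is easy to see. The helper name next_up_float32 is made up for this example; it assumes the IEEE 754 binary32 layout and only accepts non-negative, finite inputs:

```python
import math
import struct


def next_up_float32(x: float) -> float:
    """Return the 32-bit float just above x by adding 1 to its bit pattern."""
    # NaN and infinity are rejected. Negative values (and -0.0) would need a
    # decrement instead, so this sketch refuses them as well.
    if math.isnan(x) or math.isinf(x) or math.copysign(1.0, x) < 0:
        raise ValueError("expected a non-negative, finite value")
    # Pack and unpack with the same explicit byte order, so the host's
    # endianness does not matter here. Packing rounds x to float32 first.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


just_below_two = 2.0 - 2.0 ** -23                  # mantissa is all ones
print(next_up_float32(1.0) == 1.0 + 2.0 ** -23)    # True
print(next_up_float32(just_below_two) == 2.0)      # True: carry into the exponent
```

The second call shows the point made in the EDIT: the all-ones mantissa overflows into the exponent field, and the result is exactly 2.0.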

How to alter a float by its smallest increment in Java?

Use Double.doubleToLongBits and Double.longBitsToDouble:

double d = 1.0; // your existing value
long bits = Double.doubleToLongBits(d);
bits++; // step to the next bit pattern
d = Double.longBitsToDouble(bits);

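Because of the way IEEE 754 lays out its bits, this gives you exactly the next representable double, i.e. the smallest value greater than the existing one, as long as the value is positive and finite. For a negative value, incrementing the bits moves away from zero, so you would decrement instead.

(Repeated increments eventually reach positive infinity and then NaN, where the value stays, but it works for sensible values.)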
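To see how the sign affects this, here is the same bits++ step sketched in Python rather than Java. The helper bump_bits is a made-up name, and the sketch assumes the usual 64-bit IEEE 754 double:

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def bump_bits(d: float) -> float:
    """Add 1 to the raw 64-bit pattern of d and reinterpret it as a double."""
    bits = struct.unpack("<Q", struct.pack("<d", d))[0]
    # Mask so that the all-ones pattern wraps around instead of failing to pack.
    return struct.unpack("<d", struct.pack("<Q", (bits + 1) & MASK64))[0]


print(bump_bits(1.0) > 1.0)     # True: one step up
print(bump_bits(-1.0) < -1.0)   # True: one step DOWN, away from zero
```

So a plain increment only means "next greater value" for positive inputs; negative inputs need the opposite operation on the bits.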

How to alter double by its smallest increment

The general idea is to first convert the double to its long representation (using doubleToLongBits, as you have done in getRealBinary), increment that long by 1, and finally convert the new long back to the double it represents via longBitsToDouble.

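EDIT: Java (since 1.5) provides Math.ulp(double), which you can use to compute the next higher value of a positive, finite x directly: x + Math.ulp(x). Since Java 6 there is also Math.nextUp(double), which covers negative values and the special cases for you.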
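A quick way to check where x + ulp(x) matches the true neighbour, sketched with Python's math.ulp and math.nextafter (Python 3.9+) as stand-ins for the Java methods. The assumption here is that both languages define ulp as the gap to the next value larger in magnitude; next_up_via_ulp is a made-up helper name:

```python
import math


def next_up_via_ulp(x: float) -> float:
    """Candidate for the next higher double, computed as x + ulp(x)."""
    return x + math.ulp(x)


# For positive finite values the two approaches agree.
for x in (1.0, 0.1, 1e300):
    print(next_up_via_ulp(x) == math.nextafter(x, math.inf))   # True

# At a negative power of two the gap towards zero is half as wide,
# so adding ulp(x) skips one representable value.
x = -1.0
print(next_up_via_ulp(x) == math.nextafter(x, math.inf))       # False
```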

Increment a Python floating point value by the smallest possible amount

Python 3.9 and above

Starting with Python 3.9, released 2020-10-05, you can use the math.nextafter function:

math.nextafter(x, y)

Return the next floating-point value after x towards y.

If x is equal to y, return y.

Examples:

  • math.nextafter(x, math.inf) goes up: towards positive infinity.

  • math.nextafter(x, -math.inf) goes down: towards minus infinity.

  • math.nextafter(x, 0.0) goes towards zero.

  • math.nextafter(x, math.copysign(math.inf, x)) goes away from zero.

See also math.ulp().
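A short usage sketch of the calls listed above, assuming Python 3.9 or later:

```python
import math

x = 1.0
up = math.nextafter(x, math.inf)      # smallest double greater than x
down = math.nextafter(x, -math.inf)   # largest double less than x

print(up - x == 2.0 ** -52)           # True: the step above 1.0
print(x - down == 2.0 ** -53)         # True: the step below 1.0 is half as wide
print(math.nextafter(x, x) == x)      # True: x equal to y returns y
print(math.nextafter(0.0, math.inf))  # smallest positive subnormal double
```

Note that the step size is not symmetric around a power of two, which is why the direction argument matters.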

Python increment float by smallest step possible predetermined by its number of decimals

As the other commenters have noted: you should not operate on floats here, because a given number such as 0.1234 is converted into a binary internal representation that generally does not hold exactly those decimal digits, so you cannot process it further the way you want. This is deliberately vaguely formulated; floating point is a subject in itself. This article explains the topic very well and is a good primer.
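A small illustration of that point, assuming nothing beyond the standard decimal module: the string keeps the four decimals, while the float carries a long binary expansion with no record of how many decimals were typed.

```python
from decimal import Decimal

from_text = Decimal("0.1234")   # built from the characters as written
from_float = Decimal(0.1234)    # built from the nearest binary double

print(from_text.as_tuple().exponent)    # -4: four digits after the point
print(from_float.as_tuple().exponent)   # far below -4: the exact binary value
print(from_text == from_float)          # False
```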

That said, what you could do instead is keep the input as strings (i.e. do not convert it to float when reading it). Then you could do this:

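from decimal import Decimal

def add_one(v):
    # Digits after the decimal point (the exponent is zero or negative
    # for plain decimal strings such as "0.00531")
    after_comma = -Decimal(v).as_tuple().exponent
    # One unit in the last given decimal place
    add = Decimal(1) / Decimal(10**after_comma)
    return Decimal(v) + add

if __name__ == '__main__':
    print(add_one("0.00531"))
    print(add_one("0.051959"))
    print(add_one("0.0067123"))
    print(add_one("1"))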

This prints

0.00532
0.051960
0.0067124
2

Update:

If you need to operate on floats, you could try a heuristic to get to a close representation. decimal offers a normalize function which lets you reduce the precision of the decimal representation so that it matches the original number:

from decimal import Decimal, Context

def add_one_float(v):
    v_normalized = Decimal(v).normalize(Context(prec=16))
    after_comma = v_normalized.as_tuple()[-1]*-1
    add = Decimal(1) / Decimal(10**after_comma)
    return Decimal(v_normalized) + add

But please note that the precision of 16 is purely experimental; you need to play with it to see if it yields the desired results. If you need correct results, you cannot take this path.

How do you find a float's nearest non-equal value?

The standard way to find a floating-point value's neighbors is the function nextafter for double and nextafterf for float. The second argument gives the direction. Remember that infinities are legal values in IEEE 754 floating-point, so you can very well call nextafter(x, +1.0/0.0) to get the value immediately above x, and this will work even for DBL_MAX (whereas nextafter(x, DBL_MAX) would just return DBL_MAX when applied to x == DBL_MAX).

Two non-standard ways that are sometimes useful are:

  1. access the representation of the float/double as an unsigned integer of the same size, and increment or decrement this integer. The floating-point format was carefully designed so that for positive floats, and respectively for negative floats, the bits of the representation, seen as an integer, evolve monotonically with the represented float.

  2. change the rounding mode to upward, and add the smallest positive floating-point number. The smallest positive floating-point number is also the smallest increment that there can be between two floats, so this will never skip any float. The smallest positive floating-point number is FLT_MIN * FLT_EPSILON.
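Both claims in the list can be checked with a small Python sketch. It assumes the IEEE 754 binary32 layout; the helper names are made up, and FLT_MIN and FLT_EPSILON are spelled out as powers of two rather than taken from float.h:

```python
import struct

FLT_MIN = 2.0 ** -126      # smallest positive normal float
FLT_EPSILON = 2.0 ** -23   # gap between 1.0f and the next float


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float32_to_bits(x: float) -> int:
    """Reinterpret a float (rounded to 32 bits) as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# Bit pattern 1 is the smallest positive float, and it equals FLT_MIN * FLT_EPSILON.
smallest = bits_to_float32(1)
print(smallest == FLT_MIN * FLT_EPSILON)   # True

# For positive floats, larger values have larger bit patterns.
values = [0.0, smallest, 1e-10, 0.5, 1.0, 1.5, 3.0e38]
patterns = [float32_to_bits(v) for v in values]
print(patterns == sorted(patterns))        # True

# For negative floats the order is reversed: more negative means a larger pattern.
print(float32_to_bits(-2.0) > float32_to_bits(-1.0))   # True
```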


For the sake of completeness, I will add that even without changing the rounding mode from its “to nearest” default, multiplying a float by (1.0f + FLT_EPSILON) produces a number that is either the immediate neighbor away from zero, or the neighbor after that. It is probably the cheapest if you already know the sign of the float you wish to increase/decrease and you don't mind that it sometimes does not produce the immediate neighbor. Functions nextafter and nextafterf are specified in such a way that a correct implementation on the x86 must test for a number of special values and FPU states, and is thus rather costly for what it does.

To go towards zero, multiply by 1.0f - FLT_EPSILON.

This doesn't work for 0.0f, obviously, and generally for the smaller denormalized numbers.

The values for which multiplying by 1.0f + FLT_EPSILON advances by 2 ULPs are those just below a power of two, specifically in the interval [0.75 * 2^p … 2^p). If you don't mind doing a multiplication and an addition, x + (x * (FLT_EPSILON * 0.74)) should work for all normal numbers (but still not for zero nor for all the small denormal numbers).
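The one-or-two-step behaviour of the multiplication can be sketched in Python by emulating 32-bit floats with struct. This assumes that struct rounds to the nearest float when packing, and the helper names are made up for the example:

```python
import struct

FLT_EPSILON = 2.0 ** -23


def to_float32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_bits(x: float) -> int:
    """Bit pattern of x as a 32-bit float, read as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def steps_after_multiply(x: float) -> int:
    """Number of float32 steps that x * (1 + FLT_EPSILON) moves a positive normal x."""
    x = to_float32(x)
    # The product of two 24-bit significands is exact in a double, so the
    # single rounding below matches a float32 multiplication.
    product = to_float32(x * (1.0 + FLT_EPSILON))
    return float32_bits(product) - float32_bits(x)


for x in (1.0, 1.25, 1.75, 3.5):
    print(x, steps_after_multiply(x))   # 1, 1, 2 and 2 steps respectively
```

1.75 and 3.5 sit in the upper part of their binade, which is where the two-step results show up.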

Is using increment (operator++) on floats bad style?

It is easy to assume that ++/-- is not defined for floats, since it's not obvious by which value the float should be incremented: by 1, or by the smallest representable step? In C and C++ the operators are defined for floating-point types and add or subtract exactly 1, so ++f means f += 1.0f. For any other step, you'll have to provide a specific value.

++/-- is defined as "increment/decrement by 1", so it is applicable to floating-point values. However, personally I think this can be confusing to someone who isn't aware of this definition (or only applies it to integers), so I would recommend using f += 1.0f.

Converting floating point >= to > and <= to <

By design, for IEEE 754 data types, you can simply treat the value as an integer and increment it to get the next greater value, or decrement it if the value is negative.

function NextDoubleGreater(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
    fsZero, fsNZero:
      // special handling needed around 0 and -0
      I := 1;
    fsInf, fsNInf, fsNaN:
      I := PInt64(@D)^; // return the original value
    fsDenormal, fsNDenormal, fsPositive, fsNegative:
      begin
        I := PInt64(@D)^;
        if I >= 0 then begin
          inc(I);
        end else begin
          dec(I);
        end;
      end;
  end;
  Result := PDouble(@I)^;
end;

And similarly in the opposite direction:

function NextDoubleLess(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
    fsZero, fsNZero:
      // special handling needed around 0 and -0
      I := $8000000000000001;
    fsInf, fsNInf, fsNaN:
      I := PInt64(@D)^; // return the original value
    fsDenormal, fsNDenormal, fsPositive, fsNegative:
      begin
        I := PInt64(@D)^;
        if I >= 0 then begin
          dec(I);
        end else begin
          inc(I);
        end;
      end;
  end;
  Result := PDouble(@I)^;
end;

It's no coincidence that the format is this way. Implementation of floating point comparison operators is trivial because of this design.
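The remark about comparisons can be illustrated with a small Python sketch that turns a double into an integer key whose ordering matches the ordering of the doubles. The name ordered_key is made up, NaN is deliberately left out, and the 64-bit IEEE 754 layout is assumed:

```python
import struct

MAGNITUDE_MASK = 0x7FFFFFFFFFFFFFFF   # everything except the sign bit


def ordered_key(d: float) -> int:
    """Map a non-NaN double to an integer that sorts the same way as d."""
    bits = struct.unpack("<q", struct.pack("<d", d))[0]   # signed 64-bit view
    if bits >= 0:
        return bits                      # positive values already sort correctly
    # Sign bit set: a larger magnitude means a smaller value, so negate it.
    return -(bits & MAGNITUDE_MASK)


values = [float("-inf"), -2.5, -1.0, -5e-324, 0.0, 5e-324, 1.0, 2.5, float("inf")]
keys = [ordered_key(v) for v in values]
print(keys == sorted(keys))                        # True
print(ordered_key(-0.0) == ordered_key(0.0))       # True: both zeros compare equal
print(ordered_key(1.0) + 1 == ordered_key(1.0 + 2.0 ** -52))   # True: neighbours differ by 1
```

With such a key, comparing two doubles is just an integer comparison, and stepping to a neighbour is adding or subtracting 1, which is what the two functions above do case by case.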

Reference: How to alter a float by its smallest increment (or close to it)?


