Change Floating Point Rounding Mode

This is the standard C solution:

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

// store the original rounding mode
const int originalRounding = fegetround( );
// establish the desired rounding mode
fesetround(FE_TOWARDZERO);
// do whatever you need to do ...

// ... and restore the original mode afterwards
fesetround(originalRounding);

On backwards platforms lacking C99 support, you may need to resort to assembly. In this case, you may want to set the rounding for both the x87 unit (via the fldcw instruction) and SSE (via the ldmxcsr instruction).

Edit
You don't need to resort to assembly for MSVC. You can use the (totally non-standard) _controlfp( ) instead:

unsigned int originalRounding = _controlfp(0, 0);
_controlfp(_RC_CHOP, _MCW_RC);
// do something ...
_controlfp(originalRounding, _MCW_RC);

You can read more about _controlfp( ) on MSDN.

And, just for completeness, a decoder ring for the macro names for rounding modes:

rounding mode    C name         MSVC name
-----------------------------------------
to nearest       FE_TONEAREST   _RC_NEAR
toward zero      FE_TOWARDZERO  _RC_CHOP
to +infinity     FE_UPWARD      _RC_UP
to -infinity     FE_DOWNWARD    _RC_DOWN

How to change the rounding mode for floating point operations in MATLAB?

Answer

Kind of. There is an undocumented feature('setround') function call that you can use to get or set the rounding mode used by Matlab.

So, it can be done, but you shouldn’t do it. :)

WARNING: This is an undocumented, unsupported feature! Use at your own peril!

This feature('setround') supports 4 of the 5 IEEE-754 rounding modes: there’s only one “nearest” mode, and I don't know if it’s “ties to even” or “ties away from zero”.

Supported modes:

feature('setround') – Get current rounding mode
feature('setround', 0.5) – Round toward nearest (don’t know if it’s ties to even or away from zero)
feature('setround', Inf) – Round up (towards +Inf)
feature('setround', 0) – Round toward zero
feature('setround', -Inf) – Round down (towards -Inf)

Note on testing: The IEEE-754 rounding mode does not affect round() and its relatives. Rather, it governs how arithmetic operations behave around the limits of floating-point precision.

Demonstration

%ROUNDINGEXAMPLE Demonstrates IEEE-754 Rounding Mode control
%
% This uses a completely undocumented and unsupported feature!
% Not for production use!

%% Setup
clear; clc

n = 2000;
X = ones(n)*1E-30; % matrix with n^2 elements
defaultRoundingMode = feature('setround'); % store default rounding mode

%%
feature('setround',0.5);
r1 = prettyPrint('Nearest', sum(X(:)));
%{
  sign   exponent                       mantissa
     0 01110110001 0011010101111100001010011001101001110101010000011110
     | \_________/ \__________________________________________________/
     |      |             ______________________|___________________________
     |      |            /                                                  \
(-1)^0 2^( 945 - 1023) 1.0011010101111100001010011001101001110101010000011110 = 4e-24
%}

%%
feature('setround',-Inf);
r2 = prettyPrint('To -Infinity', sum(X(:)));
%{
  sign   exponent                       mantissa
     0 01110110001 0011010101111100001010011001101001011100000111000110
     | \_________/ \__________________________________________________/
     |      |             ______________________|___________________________
     |      |            /                                                  \
(-1)^0 2^( 945 - 1023) 1.0011010101111100001010011001101001011100000111000110 = 4e-24
%}

%%
feature('setround',Inf);
r3 = prettyPrint('To Infinity', sum(X(:)));
%{
  sign   exponent                       mantissa
     0 01110110001 0011010101111100001010011001101010100011101100100001
     | \_________/ \__________________________________________________/
     |      |             ______________________|___________________________
     |      |            /                                                  \
(-1)^0 2^( 945 - 1023) 1.0011010101111100001010011001101010100011101100100001 = 4e-24
%}

%%
feature('setround',0);
r4 = prettyPrint('To zero', sum(X(:)));
%{
  sign   exponent                       mantissa
     0 01110110001 0011010101111100001010011001101001011100000111000110
     | \_________/ \__________________________________________________/
     |      |             ______________________|___________________________
     |      |            /                                                  \
(-1)^0 2^( 945 - 1023) 1.0011010101111100001010011001101001011100000111000110 = 4e-24
%}

%%
feature('setround',defaultRoundingMode);
r5 = prettyPrint('No accumulated roundoff error', 4e-24);
%{
  sign   exponent                       mantissa
     0 01110110001 0011010101111100001010011001101010001000111010100111
     | \_________/ \__________________________________________________/
     |      |             ______________________|___________________________
     |      |            /                                                  \
(-1)^0 2^( 945 - 1023) 1.0011010101111100001010011001101010001000111010100111 = 4e-24
%}

%% Helper function
function r = prettyPrint(s, r)
    fprintf('%s:\n%65.60f\n\n', s, r); 
end

I get:

Nearest:
   0.000000000000000000000003999999999966490758963870373537264729

To -Infinity:
   0.000000000000000000000003999999999789077070014108839608005726

To Infinity:
   0.000000000000000000000004000000000118618095059505975310731249

To zero:
   0.000000000000000000000003999999999789077070014108839608005726

No accumulated roundoff error:
   0.000000000000000000000003999999999999999694801998206811298525

Acknowledgments

Thanks to Ryan Klots at MathWorks Technical Support for setting me straight on this and providing the nice demo code!

Default rounding mode in python, and how to specify it to another one?

On an IEEE 754-based platform (as most modern ones are, including x86, ARM, MIPS...), the default mode, "round to nearest, ties to even", is the only mode available in the Python standard library. That follows from the standardized defaults and the absence of library methods to change it. Other languages, such as Java, don't allow changing the rounding mode either, so this isn't an isolated Python whim.

In practice, there are few reasons to change this. The directed rounding modes of IEEE 754 are very specialized in their use. (I'm not defending the approach of sticking to the default rounding, just commenting on it.) For example, multiplying 1e308 by 1e308 with rounding toward zero or toward minus infinity results in approximately 1.8e308, so the result is far both from the exact answer and from the one the principle of least astonishment (POLA) would suggest (infinity). If you really need specific modes for your computations, consider using specialized libraries such as MPFR or gmpy2.

If you insist on changing this without external modules specialized in floating-point calculations, try calling the C library's fesetround via the ctypes module or an equivalent. Again, it's your choice to use such hacks, and you become responsible for all the consequences. I'd suggest wrapping every piece of code that needs special rounding in a C-level function which restores the default mode on exit.

Is there a way to set the Floating-Point Unit's rounding mode in Java?

If you are talking about a literal 0.01111116 in the source code of your program, the Java compiler converts that into the binary floating point representation at compile time.

If you are talking about (say) a String containing the characters "0.01111116", that gets converted to a binary floating point representation if/when you call (for example) Double.parseDouble(...).

Either way, the conversion happens behind the scenes and you don't have any control over the actual rounding. But in a sense it is moot. It is inherent in the nature of the representation that some rounding happens, and the result is generally speaking "the most accurate" you can get from a mathematical perspective ... given the floating point type you have chosen.

If you really wanted the conversion to use different rounding / truncation rules you could either do this after the fact (e.g. round or truncate the converted value), or you could implement your own String to floating-point conversion method.

You won't be able to change the way that the Java compiler converts literals. It is part of the language specification.

So I want to know the API to set the rounding mode to get the exact representation of machine floating-point number in Java.

There is another way of thinking about this.

The exact representation of a machine floating point number is 32 or 64 bits of binary data. You could render the bits of a double as hexadecimal in a couple of ways:

Double::doubleToLongBits or Double::doubleToRawLongBits followed by Long::toHexString gives a precise but unhelpful rendering, or
Double::toHexString gives a hexadecimal floating point representation.

All of these renderings are exact (no rounding errors) representations of the double, but most readers won't understand them. (The "raw" version deals best with edge-cases involving variant NaN values.)

There are equivalent methods for float.

How do I specify the rounding mode for floating point numbers?

It appears that the implementation of Float::round, at least for f32 and f64, forwards to the roundf32/roundf64 intrinsics, which themselves are implemented using the LLVM functions llvm.round.f32 and llvm.round.f64. The documentation for llvm.round.* doesn't say anything about how to control the rounding mode, sadly. There doesn't appear to be anything else in the LLVM reference about it, either. The other functions I could find that even mentioned rounding modes either specified one particular rounding mode, or said it was undefined.

I couldn't find any solid information about this. There was a post on the LLVM mailing list from 2011 that talks about x86-specific intrinsics, and a 2013 post to the Native Client issue tracker that appears to talk about a hypothetical intrinsic and how it would be hard to do portably.

Taking a blind stab at it: I'd try writing a little C library that does it and just link to that. It doesn't appear to be directly supported in LLVM.

Changing float rounding mode

I suggest you multiply the number, round it, and divide it.

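#include <cmath>
#include <iostream>

int main()
{
    float a = 10.000f;
    float b = 3.000f;
    float c = a / b;
    // scale up by 1000, round up to a whole number, then scale back down
    // (std::ceil leaves values that are already exact untouched)
    float d = std::ceil(c * 1000) / 1000.0f;
    std::cout << d; // d=3.334

    return 0;
}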

Can the floating point rounding mode be set at compile time in Rust?

Not reliably, no. The problem is that the LLVM backend doesn't provide any support for modifying the rounding mode. Although there have recently been some proposals to fix this, it's not likely to be resolved in the near future.

You might be able to call out to the C fesetround function (in fenv.h) at the start of your program, but the problem is that certain optimisations (such as constant folding) would have already been performed using the default rounding mode.
