What Is the Internal Precision of numpy.float128?

What is the internal precision of numpy.float128?

It's recommended to use longdouble instead of float128, since the latter is quite a mess at the moment. Also beware that Python casts values to float64 during initialization, so a longdouble built from a Python float literal has already lost precision.

Inside NumPy, it can be a double or a long double. It's defined in npy_common.h and depends on your platform. I don't know whether you can include that header out-of-the-box in your own source code.

If you don't need performance in this part of your algorithm, a safer way is to export the value to a string and use strtold afterwards.
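The same pitfall can be demonstrated from Python; a rough sketch, assuming a platform where longdouble is wider than float64 (e.g. x86-64 Linux):

import numpy as np

# Built from a Python float literal: 0.1 is rounded to float64 first,
# so the extra long-double bits are already gone.
a = np.longdouble(0.1)

# Built from a string: parsed directly at long-double precision
# (the Python-level analogue of C's strtold).
b = np.longdouble("0.1")

print(a == b)  # False where longdouble really is wider than float64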

Why does numpy's float128 only have a 63-bit mantissa?

Reading the docs:

np.longdouble is padded to the system default; np.float96 and
np.float128 are provided for users who want specific padding. In spite
of the names, np.float96 and np.float128 provide only as much
precision as np.longdouble, that is, 80 bits on most x86 machines and
64 bits in standard Windows builds.

So it appears it isn't going to actually use all those bits. As for the "missing" two bits in 15 + 63 = 78: assuming the 80-bit x86 extended format (which is what I have as well), one is the sign bit and the other is the explicit integer bit of the significand, which this format stores instead of implying it, so 1 + 15 + 1 + 63 = 80.
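You can read these field widths straight from numpy.finfo; the values below are what I'd expect on x86-64 Linux:

>>> import numpy
>>> fi = numpy.finfo(numpy.longdouble)
>>> fi.nexp, fi.nmant   # exponent bits, stored fraction bits
(15, 63)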

How many digits can float8, float16, float32, float64, and float128 contain?

This is not as simple as usually expected. For the precision of the mantissa, there are generally two values:

  1. Given a value in decimal representation, how many decimal digits are guaranteed to be preserved if the value is converted from decimal to a selected binary format and back (with default rounding).

  2. Given a value in binary format, how many decimal digits are needed if the value is converted to decimal format and back to the original binary format (again, with default rounding) to get the original value unchanged.

In both cases, the decimal representation is treated as independent of the exponent used, without leading and trailing zeros (for example, 0.0123e4, 1.23e2, 1.2300e2, 123, 123.0, and 123000.000e-3 all count as 3 digits).

For 32-bit binary float, these two sizes are 6 and 9 decimal digits, respectively. In C's <float.h>, these are FLT_DIG and FLT_DECIMAL_DIG. (It may look odd that a 32-bit float keeps 7 decimal digits for the vast majority of numbers, but there are exceptions.)
In C++, look at std::numeric_limits<float>::digits10 and std::numeric_limits<float>::max_digits10, respectively.

For 64-bit binary float, these are 15 and 17 (DBL_DIG and DBL_DECIMAL_DIG, respectively; and std::numeric_limits<double>::{digits10, max_digits10}).

General formulas for them (thanks to @MarkDickinson):

  • ${format}_DIG (digits10): floor((p-1)*log10(2))
  • ${format}_DECIMAL_DIG (max_digits10): ceil(1+p*log10(2))

where p is the number of digits in the mantissa (including the hidden one in the normalized IEEE 754 case).
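A small Python sketch of these formulas, with the standard mantissa sizes p plugged in:

import math

def digits10(p):
    # floor((p - 1) * log10(2)): decimal digits guaranteed to survive
    # a decimal -> binary -> decimal round trip
    return math.floor((p - 1) * math.log10(2))

def max_digits10(p):
    # ceil(1 + p * log10(2)): decimal digits needed for a lossless
    # binary -> decimal -> binary round trip
    return math.ceil(1 + p * math.log10(2))

# p values: IEEE 754 binary16/32/64, x86 extended, IEEE binary128
for name, p in [("float16", 11), ("float32", 24), ("float64", 53),
                ("x86 extended", 64), ("binary128", 113)]:
    print(name, digits10(p), max_digits10(p))
# float16 3 5, float32 6 9, float64 15 17,
# x86 extended 18 21, binary128 33 36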

There are also comments with some mathematical explanation on the C++ numeric limits page:

The standard 32-bit IEEE 754 floating-point type has a 24 bit fractional part (23 bits written, one implied), which may suggest that it can represent 7 digit decimals (24 * std::log10(2) is 7.22), but relative rounding errors are non-uniform and some floating-point values with 7 decimal digits do not survive conversion to 32-bit float and back: the smallest positive example is 8.589973e9, which becomes 8.589974e9 after the roundtrip. These rounding errors cannot exceed one bit in the representation, and digits10 is calculated as (24-1)*std::log10(2), which is 6.92. Rounding down results in the value 6.
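That quoted counterexample is easy to reproduce with NumPy (format_float_scientific requires NumPy 1.14 or later):

>>> import numpy
>>> x = numpy.float32(8.589973e9)
>>> float(x)                                      # the value actually stored
8589973504.0
>>> numpy.format_float_scientific(x, precision=6) # back to 7 decimal digits
'8.589974e+09'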

Look for the values for 16- and 128-bit floats in the comments there (but see below for what a "128-bit" float really is in NumPy).

For the exponent, this is simpler, because each of the border values (minimum normalized, minimum denormalized, maximum representable) is exact and can easily be obtained and printed.
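For example, numpy.finfo exposes these border values directly (smallest_subnormal requires NumPy 1.22 or later):

>>> import numpy
>>> numpy.finfo(numpy.float32).tiny                # minimum normalized
1.1754944e-38
>>> numpy.finfo(numpy.float32).smallest_subnormal  # minimum denormalized
1e-45
>>> numpy.finfo(numpy.float32).max                 # maximum representable
3.4028235e+38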

@PaulPanzer suggested numpy.finfo. It gives the first of these values ({format}_DIG); maybe it is the thing you are looking for:

>>> numpy.finfo(numpy.float16).precision
3
>>> numpy.finfo(numpy.float32).precision
6
>>> numpy.finfo(numpy.float64).precision
15
>>> numpy.finfo(numpy.float128).precision
18

but, on most systems (mine was Ubuntu 18.04 on x86-64), the value is confusing for float128: it really describes the 80-bit x86 "extended" float with its 64-bit significand. A real IEEE 754 float128 has 112 stored significand bits (113 with the implicit bit), so the value would be around 33, but NumPy presents another type under this name. See the previous question for details: in general, float128 in NumPy is a delusion.
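You can verify what hides behind the name on your own machine; this is what I'd expect on x86-64 Linux:

>>> import numpy
>>> numpy.float128 is numpy.longdouble        # just an alias here
True
>>> numpy.dtype(numpy.float128).itemsize * 8  # storage width, with padding
128
>>> numpy.finfo(numpy.float128).nmant         # stored fraction bits
63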

UPD3: you mentioned float8. There is no such type in the IEEE 754 set. One could imagine such a type for some very specific purposes, but its range would be too narrow for any universal usage.

How to define np.float128 variable in python?

Have you tried the following?

from numpy import float128

n = float128(10000)
print(type(n))

This prints:

<class 'numpy.float128'>

Bear in mind that there are some issues with numpy.float128 on 64-bit Windows (see "numpy.float128 doesn't exist in windows, but is called from OpenGL"). I have tested this code in an online Python editor.

Cannot use 128-bit float in Python on 64-bit architecture

Update: from the comments, it seems pointless to even have a 128-bit float on a 64-bit system.

I am using Anaconda on a 64-bit Ubuntu 14.04 system with
sys.version_info(major=2, minor=7, micro=9, releaselevel='final', serial=0)

and 128 bit floats work fine:

import numpy
a = numpy.float128(3)

This might be a distribution problem. Try:

  • Install Anaconda
  • Update Canopy
  • Check that the version of Python on the path is the one supplied by Anaconda or Canopy

EDIT:
Update from the comments:

Not my downvote, but this post doesn't really answer the "why doesn't
np.float128 exist on my machine" implied question. The true answer is
that this is platform specific: float128 exists on some platforms but
not others, and on those platforms where it does exist it's almost
certainly simply the 80-bit x87 extended precision type, padded to 128
bits. – Mark Dickinson

NumPy and decimal128

NumPy does not support such a data type yet (at least on mainstream architectures). Only float16, float32, float64, and the non-standard native extended double (generally 80 bits wide) are supported; put shortly, only the floating-point types natively supported by the target architecture. If the target machine supported 128-bit floating-point numbers natively, then you could try the numpy.longdouble type, but I do not expect this to be the case: in practice, x86 processors do not support that yet, nor does ARM. IBM processors like POWER9 support it natively, but I am not sure they (fully) support the IEEE 754-2008 standard. For more information please read this.

Note that you could theoretically wrap binary data in NumPy types, but you would not be able to do anything (really) useful with it. The NumPy code can theoretically be extended with new types, but note that NumPy is written in C and not C++, so adding std::decimal::decimal128 to the source code will not be easy.

Note that if you really want to wrap such a type in a NumPy array without having to change/rebuild the NumPy code, you could wrap your type in a pure-Python class. However, be aware that the performance will be very bad, since using pure-Python objects prevents all the optimizations done in NumPy (e.g. SIMD vectorization, use of fast native code, algorithms specialized for a given type, etc.).
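A minimal sketch of that pure-Python route, using the standard-library decimal module (whose context can be set to the 34 significant digits of IEEE decimal128):

import decimal
import numpy as np

# IEEE 754-2008 decimal128 carries 34 significant decimal digits.
decimal.getcontext().prec = 34

# object-dtype array: NumPy just stores references to Decimal objects,
# and all arithmetic falls back to slow pure-Python dispatch.
a = np.array([decimal.Decimal("0.1")] * 3, dtype=object)
print(a.sum())  # prints 0.3 exactly, unlike binary floats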

How to set numpy floating-point accuracy?

Do you care about the actual precision of the result, or about getting the exact same digits back from your two calculations?

If you just want the same digits, you could use np.around() to round the results to some appropriate number of decimal places. However, by doing this you'll only reduce the precision of the result.
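For instance, a rough sketch of the rounding approach:

import numpy as np

x = 0.1 + 0.2  # 0.30000000000000004
y = 0.3

# Round both results to, say, 12 decimal places before comparing,
# so digits beyond the shared precision no longer matter:
print(np.around(x, 12) == np.around(y, 12))  # True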

If you actually want to compute the result more precisely, you could try using the np.longdouble type for your input array, which, depending on your architecture and compiler, might give you an 80- or 128-bit floating point representation, rather than the standard 64-bit np.double*.

You can compare the approximate number of decimal places of precision using np.finfo:

import numpy as np

print(np.finfo(np.double).precision)
# 15

print(np.finfo(np.longdouble).precision)
# 18

Note that not all NumPy functions support long double; some will down-cast it to double.
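A quick way to check whether a particular function kept the extended precision is to inspect the result's dtype, for example:

import numpy as np

x = np.arange(3, dtype=np.longdouble)
y = np.sin(x)   # many ufuncs do have a long-double loop
print(y.dtype)  # float128 on x86-64 Linux; verify for the functions you use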


*However, some compilers (such as Microsoft Visual C++) will always treat long double as synonymous with double, in which case there would be no difference in precision between np.longdouble and np.double.


