Double Precision Floating Values in Python

Double precision floating values in Python?

Decimal datatype

  • Unlike hardware-based binary floating point, the decimal module has a user-alterable precision (defaulting to 28 places) which can be as large as needed for a given problem (see the sketch below).
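
A minimal sketch of raising that precision, using an arbitrary 50-digit setting for illustration:

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 50
>>> Decimal(1) / Decimal(7)
Decimal('0.14285714285714285714285714285714285714285714285714')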

If you are pressed by performance issues, have a look at GMPY.

double precision and single precision floating point numbers?

This answer assumes single is IEEE 754 32 bit binary floating point, and double is the corresponding 64 bit type.

Any value that can be represented exactly in a single can also be represented exactly as a double. That is the case for 3.0. The closest single and the closest double both have value exactly 3, and are equal.

If a number cannot be represented exactly in a single, the double is likely to be a closer approximation and different from the single. That is the case for 1.0/3.0. The closest single is 0.3333333432674407958984375. The closest double is 0.333333333333333314829616256247390992939472198486328125.

Both single and double are binary floating point. A number cannot be expressed exactly unless it is equal to a fraction of the form A/(2**B), where A is an integer, B is a natural number, and "**" represents exponentiation. Numbers such as 0.1 and 0.2 that are terminating decimal fractions but not terminating binary fractions behave like 1.0/3.0. For example, the closest single to 0.1 is 0.100000001490116119384765625, and the closest double is 0.1000000000000000055511151231257827021181583404541015625.
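
You can inspect those nearest values from Python itself by round-tripping through a 32-bit float with struct and printing the exact values with Decimal (a small sketch, assuming the platform float is IEEE 754 binary64):

>>> import struct
>>> from decimal import Decimal
>>> single = struct.unpack('>f', struct.pack('>f', 0.1))[0]
>>> Decimal(single)
Decimal('0.100000001490116119384765625')
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')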

Python float precision

As a first step, you should use a NumPy array to store your data instead of a Python list.

As you correctly observe, a Python float uses double precision internally, and the double-precision value underlying a Python float can be represented in 8 bytes. But on a 64-bit machine, with the CPython reference implementation of Python, a Python float object takes a full 24 bytes of memory: 8 bytes for the underlying double-precision value, 8 bytes for a pointer to the object type, and 8 bytes for a reference count (used for garbage collection). There's no equivalent of Java's "primitive" types or .NET's "value" types in Python - everything is boxed. That makes the language semantics simpler, but means that objects tend to be fatter.

Now if we're creating a Python list of float objects, there's the added overhead of the list itself: one 8-byte object pointer per Python float (still assuming a 64-bit machine here). So in general, a list of n Python float objects is going to cost you over 32n bytes of memory. On a 32-bit machine, things are a little better, but not much: our float objects are going to take 16 bytes each, and with the list pointers we'll be using 20n bytes of memory for a list of floats of length n. (Caveat: this analysis doesn't quite work in the case that your list refers to the same Python float object from multiple list indices, but that's not a particularly common case.)
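
Those per-object costs are easy to check with sys.getsizeof (a rough sketch; exact figures vary between CPython builds):

import sys

print(sys.getsizeof(1.5))        # 24 bytes for one boxed float on 64-bit CPython

data = [float(i) for i in range(1_000_000)]
# Pointer array of the list plus the million boxed float objects behind it:
total = sys.getsizeof(data) + sum(sys.getsizeof(v) for v in data)
print(total)                     # a little over 32 million bytes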

In contrast, a NumPy array of n double-precision floats (using NumPy's float64 dtype) stores its data in "packed" format in a single data block of 8n bytes, so allowing for the array metadata the total memory requirement will be a little over 8n bytes.

Conclusion: just by switching from a Python list to a NumPy array you'll reduce your memory needs by about a factor of 4. If that's still not enough, then it might make sense to consider reducing precision from double to single precision (NumPy's float32 dtype), if that's consistent with your accuracy needs. NumPy's float16 datatype takes only 2 bytes per float, but records only about 3 decimal digits of precision; I suspect that it's going to be close to useless for the application you describe.
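
For comparison, a sketch of the packed storage costs with NumPy (nbytes reports only the data buffer, not the small fixed metadata overhead):

import numpy as np

n = 1_000_000
arr64 = np.arange(n, dtype=np.float64)
print(arr64.nbytes)                      # 8000000: 8 bytes per element, packed

arr32 = arr64.astype(np.float32)
print(arr32.nbytes)                      # 4000000: half the memory at reduced precision

print(np.finfo(np.float16).precision)    # 3: roughly 3 decimal digits for float16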

Extreme floating point precision number 1.0-(0.1^200) in python

The standard fractions.Fraction class provides unlimited-precision exact rational arithmetic.

>>> from fractions import Fraction
>>> x = Fraction(1) - Fraction(1, 10**200)
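
The subtraction stays exact, whereas converting the result back to a double loses the difference entirely:

>>> x == 1
False
>>> 1 - x == Fraction(1, 10**200)
True
>>> float(x)
1.0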

decimal.Decimal might also suit your needs if you're looking for high-precision rather than exact arithmetic, but you'll have to tweak the context settings to allow more than the default 28 significant digits.

>>> from decimal import Decimal, Context, setcontext
>>> setcontext(Context(prec=1000))
>>> x = Decimal(1) - 1 / Decimal(10**200)
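
With the widened context the tiny difference survives the subtraction:

>>> 1 - x
Decimal('1E-200')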

Differentiate between single and double precision

I propose simply trying it with the float format and, if that fails (due to range overflow), using the double version:

import struct

try:
    binary = struct.pack('>f', value)
except OverflowError:
    binary = struct.pack('>d', value)

The range is the only aspect in which your question makes perfect sense.

When it comes to precision, the question stops making sense because, as you say, Python always uses doubles internally, and even a simple 3.3, once packed and unpacked as a float, comes back as 3.299999952316284:

>>> struct.unpack('>f', struct.pack('>f', 3.3))
(3.299999952316284,)

So virtually no double can be represented exactly as a float; typically only values that are integers, or that originally came from a float, survive the conversion.

You could, however, check whether the packed-and-unpacked version of your number equals the original, and if it does, use the float version:

try:
    binary = struct.pack('>f', value)
    if struct.unpack('>f', binary)[0] != value:
        binary = struct.pack('>d', value)
except OverflowError:
    binary = struct.pack('>d', value)
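
Wrapped up as a small helper for illustration (the function name pack_compact is mine, not part of the original answer):

import struct

def pack_compact(value):
    """Pack as a 32-bit float when that is in range and lossless, else as a 64-bit double."""
    try:
        binary = struct.pack('>f', value)
        if struct.unpack('>f', binary)[0] == value:
            return binary
    except OverflowError:
        pass
    return struct.pack('>d', value)

print(len(pack_compact(3.0)))   # 4 bytes: 3.0 survives the float32 round trip
print(len(pack_compact(3.3)))   # 8 bytes: 3.3 does not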

on what systems does Python not use IEEE-754 double precision floats

In theory, as you say, CPython is designed to be buildable and usable on any platform without caring about what floating-point format its C double is using.

In practice, two things are true:

  • To the best of my knowledge, CPython has not met a system that's not using IEEE 754 binary64 format for its C double within the last 15 years (though I'd love to hear stories to the contrary; I've been asking about this at conferences and the like for a while). My knowledge is a long way from perfect, but I've been involved with mathematical and floating-point-related aspects of CPython core development for at least 13 of those 15 years, and paying close attention to floating-point related issues in that time. I haven't seen any indications on the bug tracker or elsewhere that anyone has been trying to run CPython on systems using a floating-point format other than IEEE 754 binary64.

  • I strongly suspect that the first time modern CPython does meet such a system, there will be a significant number of test failures, and so the core developers are likely to find out about it fairly quickly. While we've made an effort to make things format-agnostic, it's currently close to impossible to do any testing of CPython on other formats, and it's highly likely that there are some places that implicitly assume IEEE 754 format or semantics, and that will break for something more exotic. We have yet to see any reports of such breakage.

There's one exception to the "no bug reports" claim above. It's this issue: https://bugs.python.org/issue27444. There, Greg Stark reported that there were indeed failures using VAX floating-point. It's not clear to me whether the original bug report came from a system that emulated VAX floating-point.

I joined the CPython core development team in 2008. Back then, while I was working on floating-point-related issues I tried to keep in mind 5 different floating-point formats: IEEE 754 binary64, IBM's hex floating-point format as used in their zSeries mainframes, the Cray floating-point format used in the SV1 and earlier machines, and the VAX D-float and G-float formats; anything else was too ancient to be worth worrying about. Since then, the VAX formats are no longer worth caring about. Cray machines now use IEEE 754 floating-point. The IBM hex floating-point format is very much still in existence, but in practice the relevant IBM hardware also has support for IEEE 754, and the IBM machines that Python meets all seem to be using IEEE 754 floating-point.

Rather than exotic floating-point formats, the modern challenges seem to be more to do with variations in adherence to the rest of the IEEE 754 standard: systems that don't support NaNs, or treat subnormals differently, or allow use of higher precision for intermediate operations, or where compilers make behaviour-changing optimizations.

The above is all about CPython-the-implementation, not Python-the-language. But the story for the Python language is largely similar. In theory, it makes no assumptions about the floating-point format. In practice, I don't know of any alternative Python implementations that don't end up using an IEEE 754 binary format (if not semantics) for the float type. IronPython and Jython both target runtimes that are explicit that floating-point will be IEEE 754 binary64. JavaScript-based versions of Python will similarly presumably be using JavaScript's Number type, which is required to be IEEE 754 binary64 by the ECMAScript standard. PyPy runs on more-or-less the same platforms that CPython does, with the same floating-point formats. MicroPython uses single-precision for its float type, but as far as I know that's still IEEE 754 binary32 in practice.
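
If you want to check what a particular interpreter is doing, sys.float_info gives a quick (if not exhaustive) indication; the values in the comments below are what an IEEE 754 binary64 system reports:

import sys

print(sys.float_info.mant_dig)   # 53 significand bits
print(sys.float_info.max_exp)    # 1024
print(sys.float_info.min_exp)    # -1021
print(sys.float_info.dig)        # 15 decimal digits reliably representable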


