Integer Overflow in Numpy Arrays

On your platform, np.arange returns an array of dtype 'int32':

In [1]: np.arange(1000000).dtype
Out[1]: dtype('int32')

Each element of the array is a 32-bit integer. Squaring produces results that do not fit in 32 bits. The result is truncated to its low 32 bits but still interpreted as a signed 32-bit integer, which is why you see negative numbers.
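A minimal sketch of the wrap-around (assuming the platform described above, where an explicit 'int32' dtype reproduces the behavior):

```python
import numpy as np

# 999999**2 = 999998000001, which does not fit in a signed 32-bit integer,
# so the stored result silently wraps around to a negative value.
a = np.arange(1000000, dtype='int32')
squared = a ** 2
print(squared[-1])  # negative, instead of 999998000001
```

No warning or exception is raised; the overflow is only visible in the values themselves.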

Edit: In this case, you can avoid the integer overflow by constructing an array of dtype 'int64' before squaring:

a = np.arange(1000000, dtype='int64').reshape(1000, 1000)

Note that the problem you've discovered is an inherent danger when working with numpy: you have to choose your dtypes with care and know beforehand that your code will not lead to arithmetic overflow. For the sake of speed, numpy cannot and will not warn you when this occurs.

See http://mail.scipy.org/pipermail/numpy-discussion/2009-April/041691.html for a discussion of this on the numpy mailing list.

Avoid overflow when adding numpy arrays

You can achieve this by creating a third array of dtype uint8, plus a bool array (which together are more memory efficient than one uint16 array).

np.putmask is useful for avoiding a temp array.

a = np.array([100, 200, 250], dtype=np.uint8)
b = np.array([50, 50, 50], dtype=np.uint8)
c = 255 - b # a temp uint8 array here
np.putmask(a, c < a, c) # a temp bool array here
a += b

However, as @moarningsun correctly points out, a bool array takes the same amount of memory as a uint8 array, so this isn't necessarily helpful. It is possible to solve this by avoiding having more than one temp array at any given time:

a = np.array([100, 200, 250], dtype=np.uint8)
b = np.array([50, 50, 50], dtype=np.uint8)
b = 255 - b # old b is gone shortly after new array is created
np.putmask(a, b < a, b) # a temp bool array here, then it's gone
a += 255 - b # a temp array here, then it's gone

This approach trades memory consumption for CPU.
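For comparison, here is the straightforward widen-add-clip approach that the tricks above are designed to avoid; it is simpler but allocates a temporary uint16 array the size of the inputs (a sketch, using the same example values):

```python
import numpy as np

a = np.array([100, 200, 250], dtype=np.uint8)
b = np.array([50, 50, 50], dtype=np.uint8)

# Widen to uint16 so the sum cannot overflow, clip to the uint8 range,
# then narrow back down.
result = np.clip(a.astype(np.uint16) + b, 0, 255).astype(np.uint8)
print(result)  # [150 250 255]
```

If memory is not tight, this is usually the clearest option.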


Another approach is to precalculate all possible results, which is O(1) extra memory (i.e. independent of the size of your arrays):

c = np.clip(np.arange(256) + np.arange(256)[..., np.newaxis], 0, 255).astype(np.uint8)
c
=> array([[  0,   1,   2, ..., 253, 254, 255],
          [  1,   2,   3, ..., 254, 255, 255],
          [  2,   3,   4, ..., 255, 255, 255],
          ...,
          [253, 254, 255, ..., 255, 255, 255],
          [254, 255, 255, ..., 255, 255, 255],
          [255, 255, 255, ..., 255, 255, 255]], dtype=uint8)

c[a,b]
=> array([150, 250, 255], dtype=uint8)

This approach is the most memory-efficient if your arrays are very big. Again, it is expensive in processing time, because it replaces the super-fast integer additions with slower 2-D array indexing.

EXPLANATION OF HOW IT WORKS

Construction of the c array above makes use of a numpy broadcasting trick. Adding an array of shape (N,) and an array of shape (N, 1) broadcasts both to shape (N, N), so the result is an NxN array of all possible sums. Then we clip it, obtaining a 2-D array that satisfies c[i,j] = min(i+j, 255) for each i, j.
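The broadcasting step can be seen on a small scale (a sketch with N=4 instead of 256, no clipping):

```python
import numpy as np

row = np.arange(4)                 # shape (4,)
col = np.arange(4)[:, np.newaxis]  # shape (4, 1)
sums = row + col                   # broadcasts to shape (4, 4): sums[i, j] == i + j
print(sums)
```

Every pairwise sum appears exactly once at its (i, j) position, which is what makes the lookup table work.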

Then what's left is using fancy indexing to grab the right values. Working with the input you provided, we access:

c[( [100, 200, 250] , [50, 50, 50] )]

The first index array refers to the first dimension, and the second to the second dimension. Thus the result is an array of the same shape as the index arrays, (N,), consisting of the values [ c[100,50], c[200,50], c[250,50] ].
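The element-wise pairing of index arrays can be checked on a small array (a sketch with arbitrary example indices):

```python
import numpy as np

c = np.arange(16).reshape(4, 4)
# One index array per dimension: the arrays are paired element-wise,
# so this picks out [c[1, 0], c[2, 3], c[3, 1]].
picked = c[[1, 2, 3], [0, 3, 1]]
print(picked)  # [ 4 11 13]
```

Note this is different from `c[1:4, [0, 3, 1]]`, which would select whole rows and columns rather than individual elements.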

Converting numpy array to pure python integer to avoid integer overflow

Using astype(int) seems to be working fine; the following code:

import numpy as np

test = np.array([True, False, False, False, True, True, True, False, True, True, False, False, True, True, True, False, True, False, False, True, False, True, True, True, True, True, False, True, False, True, True, False, True, True, False, True, False, False, True, False, True, True, False, True, False, True, True, False, True, True, True, False, False, False, True, False, False, True, True, True, True, False, True, False])
test_int = test.astype(int)

print(test_int)
print(test_int.sum())

Returns:

[1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 0 1 1 0
1 1 0 1 0 0 1 0 1 1 0 1 0 1 1 0 1 1 1 0 0 0 1 0 0 1 1 1 1 0 1 0]

37

An overflow exception seems unlikely here, so I would look into that again; the error is probably somewhere else.

Edit

If you want to get a Python type instead of a numpy object just do:

test.astype(int).tolist()
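A small sketch of the difference (with shortened example data): `tolist()` yields plain Python ints, which have arbitrary precision and cannot overflow.

```python
import numpy as np

test = np.array([True, False, True])
lst = test.astype(int).tolist()
# Each element is now a built-in Python int, not a numpy scalar.
print(type(lst[0]))  # <class 'int'>
print(sum(lst))      # 2
```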

Python numpy arrays staying integers

I believe that Python is doing the operations as floats, but since the arrays have dtype=int64, the results are cast back to integers in order to store them.
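This silent truncation on store can be reproduced in isolation (a minimal sketch; the values are hypothetical):

```python
import numpy as np

a = np.array([1, 2, 3])  # integer dtype inferred from the values
a[0] = 2.7               # the float is truncated to fit the integer dtype
print(a)                 # [2 2 3]
```

The fractional part is discarded without any warning, which is exactly the effect seen when float results are written back into an integer array.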

As @Ynjxsjmh comment mentions, you can explicitly set the dtype of the arrays to float at specification:

a = np.array([[8, 4, -1], [-2, 5, 1], [2, -1, 6]], dtype='float64')
b = np.array([[11], [4], [7]], dtype='float64')

Alternatively, since all elements in an array have the same type, you can force it by adding a decimal point to one of the numbers:

a = np.array([[8., 4, -1], [-2, 5, 1], [2, -1, 6]])
b = np.array([[11.], [4], [7]])

Integer overflow while calculating all possible sums of n*m matrix rows

The source of the negative value is an integer overflow. If you want to prevent overflows, you should use sufficiently big integers. Beyond 64 bits, Numpy only supports unbounded Python integers (which are not very fast). You can enable them with dtype=object.
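A minimal sketch of the difference (hypothetical values chosen to exceed the int64 range): with dtype=object, the arithmetic is delegated to Python's arbitrary-precision integers.

```python
import numpy as np

big = np.array([2**62, 2**62], dtype=object)
total = big.sum()  # Python int addition, no 64-bit wrap-around
print(total)       # 9223372036854775808 -- one past the int64 maximum
```

The same sum in an int64 array would wrap to a negative number.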

Here is the corrected code:

import numpy as np

data = np.array([[1999999999999999998, 2999999999999999998, 3999999999999999998],
                 [4999999999999999998, 5999999999999999998, 6999999999999999998]],
                dtype=object)
print(data)

div = int(data.shape[0])
row_len_squared = int(data.shape[1] ** 2)

firstPossibleSumsArray = np.empty(int((div * (div - 1)) / 2 * row_len_squared),
                                  dtype=object)

idx = 0
for row in range(div):
    for col in range(row + 1, div):
        firstPossibleSumsArray[idx:idx + row_len_squared] = \
            (data[row, :, np.newaxis] + data[col]).flatten()
        idx += row_len_squared
# repeat the process for the second possible sums array by replacing the range
# in the first loop from range(div) to range(div, 2*div)
print(firstPossibleSumsArray)

Output:

[[1999999999999999998 2999999999999999998 3999999999999999998]
[4999999999999999998 5999999999999999998 6999999999999999998]]
[6999999999999999996 7999999999999999996 8999999999999999996
7999999999999999996 8999999999999999996 9999999999999999996
8999999999999999996 9999999999999999996 10999999999999999996]

