Comparing numpy arrays containing NaN
Alternatively you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:
In : import numpy as np
In : def nan_equal(a,b):
...:     try:
...:         np.testing.assert_equal(a,b)
...:     except AssertionError:
...:         return False
...:     return True
In : a=np.array([1, 2, np.nan])
In : b=np.array([1, 2, np.nan])
In : nan_equal(a,b)
Out: True
In : a=np.array([1, 2, np.nan])
In : b=np.array([3, 2, np.nan])
In : nan_equal(a,b)
Out: False
Edit: Since you are using this for unit testing, a bare assert (instead of wrapping it to get True/False) might be more natural.
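As a rough sketch of the bare-assert style, the two comparisons above could be written as test methods. The class name NanComparisonTest and the helper run_nan_tests are hypothetical names chosen for this illustration; only np.testing.assert_array_equal and the standard unittest module are relied on.

```python
import unittest

import numpy as np


class NanComparisonTest(unittest.TestCase):
    # Hypothetical test case: names are illustrative only.

    def test_arrays_with_nan_match(self):
        a = np.array([1.0, 2.0, np.nan])
        b = np.array([1.0, 2.0, np.nan])
        # NaNs in matching positions are treated as equal here.
        np.testing.assert_array_equal(a, b)

    def test_mismatch_is_reported(self):
        a = np.array([1.0, 2.0, np.nan])
        b = np.array([3.0, 2.0, np.nan])
        # A genuine difference still raises AssertionError.
        with self.assertRaises(AssertionError):
            np.testing.assert_array_equal(a, b)


def run_nan_tests():
    # Run the test case programmatically and return the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NanComparisonTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this layout the failure message from numpy (which lists the mismatching elements) reaches the test report directly, instead of being collapsed into a plain False.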
Python\Numpy: Comparing arrays with NAN
Since a and b are lists, a == b isn't returning an array, and so your numpy-like logic won't work:
>>> a == b
False
The command you've quoted only works if they're arrays:
>>> a,b = np.asarray(a), np.asarray(b)
>>> a == b
array([ True, False], dtype=bool)
>>> (a == b) | (np.isnan(a) & np.isnan(b))
array([ True, True], dtype=bool)
>>> ((a == b) | (np.isnan(a) & np.isnan(b))).all()
True
which should work to compare two arrays (either they're both equal or they're both NaN).
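The steps above can be bundled into one helper. This is only a sketch: the name nan_aware_equal is made up for illustration, and it assumes the inputs can be converted to float arrays. The shape check is an addition so that mismatched lengths return False instead of broadcasting.

```python
import numpy as np


def nan_aware_equal(a, b):
    # Convert lists (or anything array-like) first, so == is elementwise.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    # Elements match if they are equal, or if both are NaN.
    return bool(((a == b) | (np.isnan(a) & np.isnan(b))).all())
```

For example, nan_aware_equal([1, np.nan], [1, np.nan]) gives True, while nan_aware_equal([1, np.nan], [1, 2]) gives False. Recent NumPy releases (1.19 and later) also accept np.array_equal(a, b, equal_nan=True), which may be preferable where available.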
How to compare two numpy arrays with some NaN values?
You can use masked arrays, which have the behaviour you're asking for when combined with np.all:
zm = np.ma.masked_where(np.isnan(z), z)
np.all(x == zm) # returns True
np.all(y == zm) # returns False
Or you could just write out your logic explicitly, noting that numpy has to use | instead of or, and the difference in operator precedence that results:
def func(a, b):
    return np.all((a == b) | np.isnan(a) | np.isnan(b))
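To make the two variants concrete, here is a small sketch with made-up sample arrays (the values of x, y and z are assumptions, not taken from the question, and the function names are hypothetical). Note that both variants treat a NaN as a wildcard that matches anything, which is different from requiring NaN on both sides.

```python
import numpy as np


def equal_with_nan_wildcard(a, b):
    # Explicit logic: a position matches if the values are equal
    # or if either side is NaN.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return bool(np.all((a == b) | np.isnan(a) | np.isnan(b)))


def equal_with_masked_reference(a, z):
    # Masked-array variant: NaN positions in z are masked out,
    # so np.all only looks at the remaining positions.
    a = np.asarray(a, dtype=float)
    z = np.asarray(z, dtype=float)
    zm = np.ma.masked_where(np.isnan(z), z)
    return bool(np.all(a == zm))


# Illustrative data only.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0])
z = np.array([1.0, np.nan, 3.0])
```

Here both helpers report that x matches z and that y does not, since the only NaN in z sits at the position where it is ignored.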
How to compare numpy arrays ignoring nans?
Use np.allclose and np.isnan:
mask = ~(np.isnan(a) | np.isnan(b))
np.allclose(a[mask], b[mask])
This correctly handles +/- inf and allows for small differences. Absolute and relative tolerances can be specified as parameters to allclose.
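A minimal sketch of the same idea as a reusable function, assuming both inputs have the same shape. The name allclose_ignoring_nan is hypothetical; rtol and atol are simply passed through to np.allclose with its documented defaults.

```python
import numpy as np


def allclose_ignoring_nan(a, b, rtol=1e-5, atol=1e-8):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Keep only positions where neither array holds a NaN.
    mask = ~(np.isnan(a) | np.isnan(b))
    return bool(np.allclose(a[mask], b[mask], rtol=rtol, atol=atol))
```

For instance, comparing [1.0, nan, inf, 2.0] with [1.0 + 1e-9, 5.0, inf, nan] gives True, because only the first and third positions are checked. This differs from np.allclose(a, b, equal_nan=True), which treats a NaN paired with a number as a mismatch.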
Comparing NumPy arrays so that NaNs compare equal
If you really care about memory use (e.g. have very large arrays), then you should use numexpr and the following expression will work for you:
np.all(numexpr.evaluate('(a==b)|((a!=a)&(b!=b))'))
I've tested it on very big arrays with length of 3e8, and the code has the same performance on my machine as np.all(a==b) and uses the same amount of memory.
inequality comparison of numpy array with nan to a scalar
Any comparison (other than !=) of a NaN to a non-NaN value will always return False:
>>> x < -1000
array([False, False, False, True, False, False], dtype=bool)
So you can simply ignore the fact that there are NaNs already in your array and do:
>>> x[x < -1000] = np.nan
>>> x
array([ nan, 1., 2., nan, nan, 5.])
EDIT: I didn't see any warning when I ran the above, but if you really need to stay away from the NaNs, you can do something like:
mask = ~np.isnan(x)
mask[mask] &= x[mask] < -1000
x[mask] = np.nan
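Both approaches can be checked side by side. The sample array below is an assumption (the original x is not shown in full); it is chosen so that exactly one non-NaN value falls below the threshold.

```python
import numpy as np

# Illustrative data: two NaNs and one value below -1000.
x = np.array([np.nan, 1.0, 2.0, -5000.0, np.nan, 5.0])

# Direct comparison: NaN < -1000 evaluates to False.
below = x < -1000
x_direct = x.copy()
x_direct[below] = np.nan

# NaN-avoiding variant: only non-NaN entries are ever compared.
mask = ~np.isnan(x)
mask[mask] &= x[mask] < -1000
x_masked = x.copy()
x_masked[mask] = np.nan
```

On this data the two results agree: the NaNs stay where they were and the -5000.0 entry is replaced by NaN.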
Compare two unequal size numpy arrays and fill the exclusion elements with nan
Here is my solution, assuming the first array is always bigger than the second (see the comments for the general solution, e.g. when the second array is bigger along some dimension):
import numpy as np
a = np.arange(18).reshape(6, 3) # 6x3 array
b = np.arange(4).reshape(2, 2) # 2x2 array
# create a resulting array of `nan` values
# in general case, desired shape is
# np.max([a.shape, b.shape], axis=0)
result = np.full(a.shape, np.nan)
# the selection has the shape of the smaller array
# in general case:
# tuple(map(slice, np.min([a.shape, b.shape], axis=0)))
selection = (slice(b.shape[0]), slice(b.shape[1]))
# compare values according to the selection
result[selection] = a[selection] == b[selection]
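The general case hinted at in the comments could look roughly like the sketch below. The function name compare_with_nan_fill is hypothetical, and it assumes both arrays have the same number of dimensions. Plain Python max/min over the shape tuples are used instead of np.max/np.min; the outcome is the same.

```python
import numpy as np


def compare_with_nan_fill(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim != b.ndim:
        raise ValueError("arrays must have the same number of dimensions")
    # Result covers the larger extent along every axis.
    out_shape = tuple(max(m, n) for m, n in zip(a.shape, b.shape))
    # The overlap is the smaller extent along every axis.
    overlap = tuple(slice(min(m, n)) for m, n in zip(a.shape, b.shape))
    result = np.full(out_shape, np.nan)
    # Booleans are stored as 1.0 / 0.0 because the result is a float array.
    result[overlap] = a[overlap] == b[overlap]
    return result
```

Positions outside the overlap stay NaN, so a 6x3 array compared with a 2x2 array yields a 6x3 result with 14 NaN entries.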
NaNs comparing equal in Numpy
On newer versions of numpy you get this warning:
FutureWarning: numpy equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
My guess is that numpy is using an id test as a shortcut for object types before falling back to an __eq__ test, and since
>>> id(np.nan) == id(np.nan)
True
it returns true.
If you use float('nan') instead of np.nan, the result would be different:
>>> a = np.array([np.nan], dtype=object)
>>> b = np.array([float('nan')], dtype=object)
>>> a == b
array([False], dtype=bool)
>>> id(np.nan) == id(float('nan'))
False
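The identity-shortcut idea can be seen in plain Python, independent of NumPy's internals. The sketch below only shows how Python's built-in list comparison behaves; it is an analogy for the guess above, not a statement about how any particular NumPy version compares object arrays.

```python
import numpy as np

nan_a = np.nan          # the single float object exposed by numpy
nan_b = float("nan")    # a freshly created, distinct float object

same_object = nan_a is np.nan        # identical objects
different_object = nan_a is nan_b    # distinct objects
value_equal = nan_a == nan_a         # NaN never equals itself by value

# Python lists check identity before falling back to ==,
# so a list containing the *same* NaN object compares equal.
list_same = [nan_a] == [nan_a]
list_diff = [nan_a] == [nan_b]
```

Here list_same is True while value_equal and list_diff are False, which mirrors the difference between np.nan and float('nan') described above.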