Why in NumPy `nan == nan` Is False While `nan in [nan]` Is True

Why in numpy `nan == nan` is False while `nan in [nan]` is True?

nan not being equal to nan is part of the definition of nan, so that part's easy.

As for nan in [nan] being True, that's because list containment tests identity before equality. Here the object you are searching for is the very object stored in the list, so the identity test succeeds and == is never consulted.

If you tried the same thing with two different nans, you'd get False:

>>> nans = [float("nan") for i in range(2)]
>>> list(map(id, nans))  # in Python 3, map() is lazy, so wrap it in list()
[190459300, 190459284]
>>> nans
[nan, nan]
>>> nans[0] is nans[1]
False
>>> nans[0] in nans
True
>>> nans[0] in nans[1:]
False

Your addendum doesn't really have much to do with nan; that's simply how Python works. Once you understand that float("nan") is under no obligation to return some nan singleton, and that y = x doesn't make a copy of x but instead binds the name y to the object named by x, there's nothing left to get.
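The points above can be put side by side in a short sketch. The variable names are illustrative only, and the identity results describe CPython; another implementation could legitimately behave differently, since the language makes no promise about whether float("nan") returns a fresh object.

```python
# Two separate calls; in CPython each one creates its own float object.
a = float("nan")
b = float("nan")

# Assignment binds a second name to the same object; nothing is copied.
c = a

same_object = c is a            # True: c and a name one object
distinct_objects = a is not b   # True in CPython: two calls, two objects
equal_to_itself = a == a        # False: nan never compares equal, even to itself

in_own_list = a in [a]          # True: the identity test succeeds first
in_other_list = a in [b]        # False: not the same object, and nan != nan

print(same_object, distinct_objects, equal_to_itself, in_own_list, in_other_list)
```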

Why does assert np.nan == np.nan cause an error?

NaN has the property that it doesn't equal itself, so np.nan == np.nan is False and the assert fails with an AssertionError. Use np.isnan to test for NaN values instead; np.isnan(np.nan) yields True:

In [5]: np.nan == np.nan
Out[5]: False

In [6]: np.nan != np.nan
Out[6]: True

In [7]: np.isnan(np.nan)
Out[7]: True
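In test code, that suggests asserting on np.isnan rather than on ==. The following is a minimal sketch; the array names are made up for illustration, and np.testing.assert_array_equal is used because its documentation states that NaNs in matching positions do not raise.

```python
import math
import numpy as np

x = np.nan

# `assert x == x` would raise AssertionError, because the comparison is False.
is_equal = (x == x)

# Test for NaN explicitly instead of comparing.
assert np.isnan(x)
assert math.isnan(x)  # np.nan is a plain Python float, so math.isnan works too

# For whole arrays, assert_array_equal accepts NaNs in the same positions.
expected = np.array([1.0, np.nan, 3.0])
actual = np.array([1.0, np.nan, 3.0])
np.testing.assert_array_equal(actual, expected)
```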

Why does comparing to nan yield False (Python)?

That most comparisons to nan, including ==, yield False is not something numpy invented: it is the behaviour specified by the IEEE 754 floating-point standard, which both Python floats and numpy follow. (For your own classes, you can get similar behaviour in Python by defining an __eq__(self, other) method.) The behaviour is useful for various purposes. After all, the fact that one entry has a missing value, and another entry also has a missing value, does not imply that those two entries are equal. It just implies that you don't know whether they are equal or not, and it's therefore best not to treat them as if they are (e.g. when you join two tables together by pairing up corresponding rows).
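To illustrate the __eq__ remark, here is a small sketch of a user-defined value that never compares equal to anything. The class name Missing is hypothetical and exists only for this example; it is not part of numpy or the standard library.

```python
class Missing:
    """Hypothetical stand-in for an unknown value (illustration only)."""

    def __eq__(self, other):
        # An unknown value is never known to equal anything, itself included.
        return False

    # Defining __eq__ sets __hash__ to None; restore identity-based hashing.
    __hash__ = object.__hash__

    def __repr__(self):
        return "Missing"


m = Missing()

equal_to_self = (m == m)            # False: __eq__ always says no
same_object = (m is m)              # True: `is` ignores __eq__
found_in_own_list = (m in [m])      # True: identity is checked first
found_elsewhere = (m in [Missing()])  # False: different object, and == is False
```

Like nan, such an object is still found by `in` when the container holds that exact object, because `is` cannot be customised.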

is, on the other hand, is a Python operator that cannot be overloaded, by numpy or by anything else. It tests whether its two operands are the same object. np.nan is the same object as np.nan, so np.nan is np.nan is True. That does not make is a reliable test for NaN, though: it succeeds only when the value happens to be that exact object. To get rid of all entries which don't have a value, use np.isnan rather than is not nan.

nan in (nan,) returns True because (nan,) is a tuple with only one element, nan, and when Python checks whether an object is in a tuple, it checks whether that object is, or is == to, any element of the tuple. The identity check succeeds here, so the result of == never matters.
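The containment rule can be written out as a small function. This is a sketch of the equivalence given in the Python language reference for lists and tuples, not the actual C implementation; the name contains is made up for the example.

```python
def contains(container, x):
    """Rough pure-Python equivalent of `x in container` for lists and tuples."""
    return any(x is e or x == e for e in container)


nan = float("nan")
other_nan = float("nan")  # a second nan object (distinct in CPython)

# Same object in the tuple: the identity test succeeds.
assert contains((nan,), nan) == (nan in (nan,))

# Different nan object: identity fails, and nan == nan is False as well.
assert contains((other_nan,), nan) == (nan in (other_nan,))
```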

numpy NaN not always recognized

This isn't so much a question about the Python is operator, as about what indexing, or unboxing, an element of an array does:

In [363]: a=np.array([1,2,np.nan,3])
In [364]: a[2]
Out[364]: nan
In [365]: type(a[2])
Out[365]: numpy.float64
In [366]: a[2] is a[2]
Out[366]: False

a[2] doesn't simply return nan. It returns a np.float64 object whose value is nan. Another a[2] will produce another np.float64 object. Two such objects don't match in the is sense. That's true for any array element, not just nan values.

Since == doesn't work for nan, we are stuck with using the np.isnan function.
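For a whole array, np.isnan returns a boolean mask that can be used for filtering. A minimal sketch, reusing the array a from above (the other names are illustrative):

```python
import numpy as np

a = np.array([1, 2, np.nan, 3])

# Elementwise == against nan is False everywhere, so it finds nothing.
eq_mask = (a == np.nan)

# np.isnan marks exactly the NaN positions.
nan_mask = np.isnan(a)

# Invert the mask to keep only the entries that have a value.
cleaned = a[~nan_mask]

print(eq_mask.any(), nan_mask, cleaned)
```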

np.nan is a unique float object (in this session), but a[2] is not set to that object.

If the array was defined as an object type:

In [376]: b=np.array([1,2,np.nan,3], object)
In [377]: b[2] is np.nan
Out[377]: True

Here the is test is True because b contains pointers to objects that already exist in memory, including the np.nan object. The same would be true for a list constructed like that.
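The three containers can be compared directly. This sketch assumes the same values as above; the identity results follow from how each container stores its elements.

```python
import numpy as np

lst = [1, 2, np.nan, 3]
a = np.array(lst)                 # float64 array: stores raw values, not objects
b = np.array(lst, dtype=object)   # object array: stores references to the originals

list_is = lst[2] is np.nan    # True: the list holds a reference to np.nan
float_is = a[2] is np.nan     # False: indexing builds a new np.float64 each time
object_is = b[2] is np.nan    # True: the stored reference is returned as-is

# np.isnan gives the same answer regardless of how the value is stored.
all_nan = all(np.isnan(v) for v in (lst[2], a[2], b[2]))
```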


