Why in numpy is `nan == nan` False while `nan in [nan]` is True?
nan not being equal to nan is part of the definition of nan, so that part's easy.
As for nan in [nan] being True, that's because identity is tested before equality for containment in lists. You're comparing the same two objects.
If you tried the same thing with two different nans, you'd get False:
>>> nans = [float("nan") for i in range(2)]
>>> list(map(id, nans))  # the actual ids vary from run to run
[190459300, 190459284]
>>> nans
[nan, nan]
>>> nans[0] is nans[1]
False
>>> nans[0] in nans
True
>>> nans[0] in nans[1:]
False
Your addendum doesn't really have much to do with nan, that's simply how Python works. Once you understand that float("nan") is under no obligation to return some nan singleton, and that y = x doesn't make a copy of x but instead binds the name y to the object named by x, there's nothing left to get.
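To make the name-binding point concrete, here is a minimal sketch. The variable names are just for illustration, and the "separate object" behaviour is what CPython does in practice rather than something the language guarantees:

```python
x = float("nan")
y = x               # binds a second name to the same object; nothing is copied
z = float("nan")    # a separate call; in CPython this builds a new float object

same_object = y is x             # True: two names, one object
equal_to_itself = x == x         # False: nan never compares equal, even to itself
found_via_identity = x in [y]    # True: the list holds the very same object
found_other_nan = x in [z]       # False in CPython: different object, and == fails
```

So the surprising results come from which object each name refers to, not from anything special that lists do with nan.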
Why does assert np.nan == np.nan cause an error?
NaN has the property that it doesn't equal itself, so the assertion fails. You should use np.isnan to test for NaN values; here np.isnan(np.nan) will yield True:
In [5]: np.nan == np.nan
Out[5]: False
In [6]: np.nan != np.nan
Out[6]: True
In [7]: np.isnan(np.nan)
Out[7]: True
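For assertions in particular, something along these lines may be what you want. This is only a sketch: the arrays a and b are made up for illustration, and the equal_nan keyword of np.array_equal assumes a reasonably recent numpy (1.19 or later, as far as I know):

```python
import math
import numpy as np

value = np.nan
assert np.isnan(value)      # passes, unlike `assert value == value`
assert math.isnan(value)    # the stdlib check also works on a scalar float

# Hypothetical arrays with a NaN in the same position.
a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])

elementwise_equal = bool((a == b).all())              # False: the NaN slot compares unequal
same_with_nan = np.array_equal(a, b, equal_nan=True)  # treats NaNs in matching positions as equal
np.testing.assert_array_equal(a, b)                   # passes: NaNs in the same positions are accepted
```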
Why does comparing to nan yield False (Python)?
numpy follows the IEEE 754 floating-point standard here, under which comparisons to nan, including ==, yield False (only != yields True). You can get the same behaviour for your own Python objects by defining an __eq__(self, other) method. This behaviour was chosen because it is the most useful for various purposes. After all, the fact that one entry has a missing value, and another entry also has a missing value, does not imply that those two entries are equal. It just implies that you don't know whether they are equal or not, and it's therefore best not to treat them as if they are (e.g. when you join two tables together by pairing up corresponding rows).
is, on the other hand, is a Python operator which cannot be overridden by numpy. It tests whether two objects are the same thing. nan is the same object as nan. This is also useful behaviour to have anyway, because often you will want to e.g. get rid of all entries which don't have a value; is not nan achieves that only when every missing entry is that same nan object, so np.isnan is the safer test.
nan in (nan,) returns True because, as you probably know, (nan,) is a tuple with only one element, nan, and when Python checks if an object is in a tuple, it is checking whether that object is or == any object in the tuple.
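The "is or ==" rule can be written out by hand. The contains helper below is hypothetical and only mirrors how the Python documentation describes membership tests for built-in containers such as lists and tuples:

```python
def contains(container, x):
    """Hypothetical helper: identity is checked first, then equality."""
    return any(x is e or x == e for e in container)


nan = float("nan")
other_nan = float("nan")   # a second, distinct nan object (in CPython)

same_object_result = contains((nan,), nan)         # True, via the identity check
other_object_result = contains((other_nan,), nan)  # False: not identical, and == is False
matches_builtin = same_object_result == (nan in (nan,))
```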
numpy NaN not always recognized
This isn't so much a question about the Python is operator, as about what indexing, or unboxing, an element of an array does:
In [363]: a=np.array([1,2,np.nan,3])
In [364]: a[2]
Out[364]: nan
In [365]: type(a[2])
Out[365]: numpy.float64
In [366]: a[2] is a[2]
Out[366]: False
a[2] doesn't simply return nan. It returns a np.float64 object whose value is np.nan. Another a[2] will produce another np.float64 object. Two such objects don't match in the is sense. That's true for any array element, not just nan values.
Since == doesn't work for nan, we are stuck with using the np.isnan function.
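In practice that usually means building a boolean mask. A short sketch using the same array as above (the names mask, clean and identity_hits are mine):

```python
import numpy as np

a = np.array([1, 2, np.nan, 3])

mask = np.isnan(a)    # boolean array, True where the element is NaN
clean = a[~mask]      # keep only the non-NaN entries

# An identity test finds nothing, because each element comes back
# as a freshly created np.float64 object rather than np.nan itself.
identity_hits = [x is np.nan for x in a]
```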
np.nan is a unique float object (in this session), but a[2] is not set to that object.
If the array were defined with object dtype:
In [376]: b=np.array([1,2,np.nan,3], object)
In [377]: b[2] is np.nan
Out[377]: True
Here the is test is True because b contains pointers to objects that already exist in memory, including the np.nan object. The same would be true for a list constructed like that.