Why Do "Not a Number" Values Equal True When Cast as Boolean in Python/Numpy

Why do Not a Number values equal True when cast as boolean in Python/Numpy?

This is in no way NumPy-specific, but is consistent with how Python treats NaNs:

In [1]: bool(float('nan'))
Out[1]: True

The rules are spelled out in the documentation on truth value testing: a float is false only when it is zero, and NaN is not zero.

I think it could be reasonably argued that the truth value of NaN should be False. However, this is not how the language works right now.
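To make the rule concrete, here is a minimal sketch using only the standard library. The helper name is_valid_number is hypothetical and only there for illustration; the point is that NaN has to be tested for explicitly rather than through its truth value.

```python
import math

nan = float("nan")

# For floats, only zero is falsy. NaN is not equal to zero, so it is truthy.
print(bool(nan))   # True
print(bool(0.0))   # False
print(nan == 0.0)  # False

# NaN does not even compare equal to itself, so use math.isnan to detect it.
print(nan == nan)       # False
print(math.isnan(nan))  # True


def is_valid_number(x):
    """Hypothetical helper: treat NaN as 'missing' instead of relying on bool()."""
    return not math.isnan(x)


print(is_valid_number(nan))  # False
print(is_valid_number(1.5))  # True
```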

How can I convert boolean values from np.diff() to the actual values

Hopefully, this answers your question:

import numpy as np

# Differences between consecutive elements of array (n=1 is the default).
exact_diffs = np.diff(array, n=1)

# true_diffs = exact values in exact_diffs that are > 1.
# false_diffs = exact values in exact_diffs that are <= 1.
true_diffs = exact_diffs[exact_diffs > 1]
false_diffs = exact_diffs[exact_diffs <= 1]

# true_indices = indices of values in exact_diffs that are > 1.
# false_indices = indices of values in exact_diffs that are <= 1.
true_indices = np.where(exact_diffs > 1)[0]
false_indices = np.where(exact_diffs <= 1)[0]

# Note that true_indices indexes into exact_diffs, not into array.
# Since exact_diffs[i] == array[i + 1] - array[i], shift by one to get the indices
# of values in array that are more than 1 greater than the preceding element:
true_indices += 1

# and then you can index those values in array with array[true_indices].
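For a concrete run of the steps above, here is a small sketch. The sample values in array are made up for illustration; everything else follows the snippet in this answer.

```python
import numpy as np

# Hypothetical sample data, not taken from the question.
array = np.array([1, 2, 5, 6, 10])

exact_diffs = np.diff(array)            # [1, 3, 1, 4]
mask = exact_diffs > 1                  # [False, True, False, True]

true_diffs = exact_diffs[mask]          # [3, 4]
true_indices = np.where(mask)[0] + 1    # [2, 4], shifted to index into array

print(true_diffs)
print(true_indices)
print(array[true_indices])              # [5, 10]
```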

Boolean identity == True vs is True

If you want to determine whether a value is exactly True (not just a true-like value), is there any reason to use if foo == True rather than if foo is True?

If you want to make sure that foo really is a boolean and of value True, use the is operator.

Otherwise, if the type of foo implements its own __eq__() that returns a true-ish value when comparing to True, you might end up with an unexpected result.

As a rule of thumb, you should always use is with the built-in constants True, False and None.
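A quick sketch of the difference. The class AlwaysEqual is a hypothetical type invented for this example; it stands in for any object whose __eq__ returns a true-ish value when compared to True.

```python
class AlwaysEqual:
    """Hypothetical type whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


foo = AlwaysEqual()
one = 1

# == asks the objects whether they consider themselves equal.
print(foo == True)  # True, because of the custom __eq__
print(one == True)  # True, because bool is a subclass of int and 1 == True

# is compares identity, so only the True singleton itself passes.
print(foo is True)   # False
print(one is True)   # False
print(True is True)  # True
```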

Does this vary between implementations such as CPython (2.x and 3.x), Jython, PyPy, etc.?

In theory, is will be faster than == since the latter must honor types' custom __eq__ implementations, while is can directly compare object identities (e.g., memory addresses).

I don't know the source code of the various Python implementations by heart, but I assume that most of them can optimize that by using some internal flags for the existence of magic methods, so I suspect that you won't notice the speed difference in practice.

Why is the result of pandas.Series([numpy.nan]).astype(bool) True?

Probably because np.nan objects are themselves truthy:

>>> bool(np.nan)
True
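If the goal is for missing values to end up False, one possible approach (a sketch, not the only way) is to fill or test for NaN explicitly before casting. The sample Series below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
s = pd.Series([np.nan, 0.0, 2.5])

# Plain cast: NaN is a non-zero float, so it becomes True.
as_bool = s.astype(bool)

# Treat missing values as False by filling them with 0 first.
nan_as_false = s.fillna(0).astype(bool)

# Or ask directly which entries are present.
present = s.notna()

print(as_bool.tolist())       # [True, False, True]
print(nan_as_false.tolist())  # [False, False, True]
print(present.tolist())       # [False, True, True]
```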

Multiplying integers by booleans and understanding numpy array comparison

NumPy will cast the bool type to the integer type, with False and True converted to 0 and 1 respectively. This casting is safe, so don't worry, be happy.

In [8]: np.can_cast(np.bool_, np.intc)
Out[8]: True

If you prefer to be explicit, you could do that casting yourself by replacing (a2 == b) with (a2 == b).astype(int), but that is not necessary.
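Here is a small sketch of both spellings side by side. The values of a2 and b are assumptions made up for the example, since the original arrays from the question are not shown here.

```python
import numpy as np

# Hypothetical stand-ins for the question's a2 and b.
a2 = np.array([1, 2, 3, 2])
b = 2

mask = (a2 == b)                  # [False, True, False, True]

# Implicit: NumPy promotes the boolean mask to an integer type.
implicit = a2 * mask

# Explicit: cast the mask yourself before multiplying.
explicit = a2 * mask.astype(int)

print(implicit)                            # [0 2 0 2]
print(np.array_equal(implicit, explicit))  # True
```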

python numpy strange boolean arithmetic behaviour

It's all about operator order and data types.

>>> import numpy as np
>>> B = np.array([0, 1], dtype=bool)
>>> B
array([False,  True], dtype=bool)

NumPy treats boolean arrays as exactly that: boolean arrays. Every operation applied to them will first try to preserve the data type. That is why:

>>> -B
array([ True, False], dtype=bool)

and

>>> ~B
array([ True, False], dtype=bool)

Both are equivalent and return the element-wise logical negation of B. Note, however, that -B on a boolean array was deprecated when this was written and raises a TypeError in current NumPy releases, so use ~B or np.logical_not(B) instead.

When you use things like:

>>> B + 1
array([1, 2])

B and 1 are first cast under the hood to a common data type. In data-type promotion, the boolean array is always cast to a numeric array. In the above case, B is cast to int, which is similar to:

>>> B.astype(int) + 1
array([1, 2])

In your example:

>>> -B * 2
array([2, 0])

First the array B is negated by the operator - and only then multiplied by 2. The desired behaviour can be obtained either by explicit data conversion or by adding brackets to force the intended order of operations:

>>> -(B * 2)
array([ 0, -2])

>>> -B.astype(int) * 2
array([ 0, -2])

Note that B.astype(int) can be replaced, without copying the data, by B.view(np.int8). Booleans are stored as single bytes (8 bits), so the same data can be reinterpreted as 8-bit integers with the .view method without converting it.

>>> B.view(np.int8)
array([0, 1], dtype=int8)

So, in short, B.view(np.int8) or B.astype(yourtype) will always ensure that B is treated as a numeric [0, 1] array.
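The copy versus no-copy difference can be checked with np.shares_memory. This is just a sketch; the names viewed and copied are chosen for the example.

```python
import numpy as np

B = np.array([False, True])

viewed = B.view(np.int8)  # reinterprets the same bytes, no copy
copied = B.astype(int)    # allocates a new integer array

print(np.shares_memory(B, viewed))  # True
print(np.shares_memory(B, copied))  # False

# Because the view shares memory, changes to B show up in it.
B[0] = True
print(viewed)  # [1 1]
print(copied)  # [0 1]

# Arithmetic on the integer copy behaves numerically.
print(-copied * 2)  # [ 0 -2]
```

The view is cheaper but stays tied to B, so astype is the safer choice when B may be modified later.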

How can I map True/False to 1/0 in a Pandas DataFrame?

A succinct way to convert a single column of boolean values to a column of integers 1 or 0:

df["somecolumn"] = df["somecolumn"].astype(int)
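A self-contained sketch of that one-liner. The DataFrame contents and the second column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame; only "somecolumn" comes from the answer above.
df = pd.DataFrame(
    {
        "somecolumn": [True, False, True],
        "label": ["a", "b", "c"],
    }
)

df["somecolumn"] = df["somecolumn"].astype(int)

print(df["somecolumn"].tolist())  # [1, 0, 1]
```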

Why is NaN not considered falsy in python?

It's not falsy because 'nan' is a valid string argument for float: the result is a genuine float value, and the only falsy float is zero, which NaN is not. You can find more information in the documentation.
https://docs.python.org/3/library/functions.html?highlight=float#float

If the argument is a string, it should contain a decimal number, optionally preceded by a sign, and optionally embedded in whitespace. The optional sign may be '+' or '-'; a '+' sign has no effect on the value produced. The argument may also be a string representing a NaN (not-a-number), or a positive or negative infinity.

How to convert 'false' to 0 and 'true' to 1?

Use int() on a boolean test:

x = int(x == 'true')

int() turns the boolean into 1 or 0. Note that any value not equal to 'true' will result in 0 being returned.
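If silently mapping every unexpected string to 0 is not what you want, a stricter variant is possible. This is only a sketch: parse_flag is a hypothetical helper, and the decision to ignore case and surrounding whitespace is an assumption, not part of the original answer.

```python
def parse_flag(text):
    """Hypothetical helper: map 'true'/'false' to 1/0 and reject anything else."""
    mapping = {"true": 1, "false": 0}
    key = text.strip().lower()  # assumption: case and whitespace do not matter
    if key not in mapping:
        raise ValueError(f"expected 'true' or 'false', got {text!r}")
    return mapping[key]


print(parse_flag("true"))     # 1
print(parse_flag(" False "))  # 0

# The one-liner from the answer, for comparison: anything but 'true' gives 0.
print(int("maybe" == "true"))  # 0
```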

Given the X numpy array, return True if any of its elements is zero

X.any() is an incorrect answer: it would fail on X = np.array([0]), for instance, by incorrectly returning False.

A correct answer would be: ~X.all(). According to De Morgan's laws, ANY element is 0 is equivalent to NOT (ALL elements are (NOT 0)).

How does it work?

NumPy is doing an implicit conversion to boolean:

X = np.array([-1, 2, 0, -4, 5, 6, 0, 0, -9, 10])
# array([-1, 2, 0, -4, 5, 6, 0, 0, -9, 10])

# convert to boolean
# 0 is False, all other numbers are True
X.astype(bool)
# array([ True,  True, False,  True,  True,  True, False, False,  True,  True])

# are all values truthy (not 0 in this case)?
X.astype(bool).all()
# False

# get the boolean NOT
~X.astype(bool).all()
# True
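An equivalent way to spell the same check, which some may find easier to read, is to compare against zero directly. This is a sketch; has_zero is a hypothetical helper name and the sample arrays are made up.

```python
import numpy as np


def has_zero(arr):
    """Hypothetical helper: True if any element of arr equals zero."""
    return bool((arr == 0).any())


X = np.array([-1, 2, 0, -4, 5])
Y = np.array([3, 1, 7])
Z = np.array([0])

print(has_zero(X))  # True
print(has_zero(Y))  # False
print(has_zero(Z))  # True

# Agrees with the De Morgan form used above.
print(bool(~X.all()))  # True
print(bool(~Y.all()))  # False
```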
