How can I obtain the element-wise logical NOT of a pandas Series?
To invert a boolean Series, use ~s:
In [7]: s = pd.Series([True, True, False, True])

In [8]: ~s
Out[8]:
0    False
1    False
2     True
3    False
dtype: bool
Using Python 2.7, NumPy 1.8.0, Pandas 0.13.1:
In [119]: s = pd.Series([True, True, False, True]*10000)
In [10]: %timeit np.invert(s)
10000 loops, best of 3: 91.8 µs per loop
In [11]: %timeit ~s
10000 loops, best of 3: 73.5 µs per loop
In [12]: %timeit (-s)
10000 loops, best of 3: 73.5 µs per loop
As of Pandas 0.13.0, Series are no longer subclasses of numpy.ndarray; they are now subclasses of pd.NDFrame. This might have something to do with why np.invert(s) is no longer as fast as ~s or -s.
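The subclass change is easy to check directly. A minimal sketch (the Series contents here are arbitrary):

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True])

# In modern pandas, Series does not subclass numpy.ndarray
print(isinstance(s, np.ndarray))   # False

# ~ still performs element-wise logical NOT
print((~s).tolist())               # [False, False, True, False]
```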
Caveat: timeit results may vary depending on many factors, including hardware, compiler, OS, and the Python, NumPy and Pandas versions.
Element-wise logical AND on indeterminate number of Pandas Series
Assuming I understand you, you can use logical_and.reduce. Starting from a list of Series:
>>> ss = [pd.Series([ True, False, True, False, True]),
...       pd.Series([False, True, True, False, False]),
...       pd.Series([False, False, True, False, True]),
...       pd.Series([False, True, True, False, False]),
...       pd.Series([ True, True, True, True, False])]
which would look like

>>> pd.DataFrame(ss)
       0      1     2      3      4
0   True  False  True  False   True
1  False   True  True  False  False
2  False  False  True  False   True
3  False   True  True  False  False
4   True   True  True   True  False

[5 rows x 5 columns]
if it were a dataframe. You can reduce across the columns:
>>> np.logical_and.reduce(ss)
array([False, False, True, False, False], dtype=bool)
or pass axis=1 if you want the other direction.
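A minimal sketch of the two reduce directions, with made-up Series:

```python
import numpy as np
import pandas as pd

ss = [pd.Series([True, False, True]),
      pd.Series([True, True, False])]

# Default: AND down the list, one result per position
print(np.logical_and.reduce(ss))           # [True, False, False]

# axis=1: AND within each Series instead
print(np.logical_and.reduce(ss, axis=1))   # [False, False]
```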
Remember that you can also use any and all, e.g.
>>> df = pd.DataFrame(ss)
>>> df.all()
0    False
1    False
2     True
3    False
4    False
dtype: bool
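If you prefer to stay with the pandas & operator, Python's functools.reduce gives an equivalent fold over any number of Series. A sketch with made-up data:

```python
from functools import reduce
import operator

import pandas as pd

# Any number of boolean Series (made-up data)
ss = [pd.Series([True, False, True]),
      pd.Series([True, True, False]),
      pd.Series([True, True, True])]

# Fold the element-wise & operator across the whole list
combined = reduce(operator.and_, ss)
print(combined.tolist())   # [True, False, False]
```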
Logical operators for Boolean indexing in Pandas
When you say

(a['x']==1) and (a['y']==10)

you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to Boolean values.
NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a Boolean value -- in other words, they raise
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
when used as a Boolean value. That's because it's unclear when such an object should be True or False: some users might assume it is True if it has non-zero length, like a Python list; others might want it to be True only if all its elements are True; still others might want it to be True if any of its elements are True.
Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.
Instead, you must be explicit: check the empty attribute, or call the any() or all() method, to indicate which behavior you desire.
In this case, however, it looks like you do not want Boolean evaluation; you want element-wise logical AND. That is what the & binary operator performs:

(a['x']==1) & (a['y']==10)

This returns a boolean Series.
By the way, as alexpmil notes, the parentheses are mandatory, since & has a higher operator precedence than ==.
Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which in turn is equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series, and the use of and with two Series would again trigger the same ValueError as above. That's why the parentheses are mandatory.
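To make the difference concrete, here is a small runnable sketch with a made-up frame a:

```python
import pandas as pd

# Made-up data for illustration
a = pd.DataFrame({'x': [1, 1, 2], 'y': [10, 20, 10]})

# `and` forces Boolean conversion of a whole Series -> ValueError
try:
    (a['x'] == 1) and (a['y'] == 10)
except ValueError as e:
    print('raised:', e)

# `&` performs element-wise logical AND instead
mask = (a['x'] == 1) & (a['y'] == 10)
print(mask.tolist())   # [True, False, False]
```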
Element-wise logical OR in Pandas
The corresponding operator is |:

df[(df < 3) | (df == 5)]

This checks, element-wise, whether each value is less than 3 or equal to 5.
If you need a function to do this, there is np.logical_or. For two conditions, you can use

df[np.logical_or(df < 3, df == 5)]

Or, for multiple conditions, use logical_or.reduce:

df[np.logical_or.reduce([df < 3, df == 5])]

Since the conditions are specified as individual arguments, no parenthesized grouping is needed.
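A runnable sketch of both forms, with a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({'a': [1, 3, 5], 'b': [2, 4, 6]})

# Two conditions as a single function call
mask2 = np.logical_or(df < 3, df == 5)

# Any number of conditions via reduce
maskn = np.logical_or.reduce([df < 3, df == 5])

print(maskn.tolist())   # [[True, True], [False, False], [True, False]]
```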
More information on logical operations with pandas can be found in the pandas documentation on Boolean indexing.
Finding element-wise closest match of a series with respect to values of a second series and the locations (index) of these closest matches
So here is one way, using NumPy broadcasting:

A.iloc[np.abs(B.values - A.values[:, None]).argmin(axis=0)]
0     1.0
4     5.0
2    10.0
0     1.0
4     5.0
dtype: float64
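Since the original A and B are not shown, here is a self-contained sketch of the broadcasting idea with made-up data (the values echo the output above, but the index labels will differ):

```python
import numpy as np
import pandas as pd

# Made-up data; the original A and B are not shown in the question
A = pd.Series([1.0, 3.0, 5.0, 8.0, 10.0])
B = pd.Series([0.8, 5.1, 10.1, 0.3, 5.5])

# Broadcast to a (len(A), len(B)) matrix of absolute differences,
# then take, for each element of B, the A position that minimizes it
idx = np.abs(B.values - A.values[:, None]).argmin(axis=0)
closest = A.iloc[idx]
print(closest.tolist())   # [1.0, 5.0, 10.0, 1.0, 5.0]
```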
And here is the fix, adding drop_duplicates:

pd.Series(A.values, A.values).sort_index().drop_duplicates().reindex(B.values, method='nearest')
0.8      1.0
5.1      5.0
10.1    10.0
0.3      1.0
5.5      5.0
dtype: float64
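The same made-up data run through the drop_duplicates/reindex route; note that reindex(..., method='nearest') needs a sorted, unique index, which is exactly what sort_index().drop_duplicates() guarantees:

```python
import pandas as pd

# Made-up data, as the original A and B are not shown
A = pd.Series([1.0, 3.0, 5.0, 8.0, 10.0])
B = pd.Series([0.8, 5.1, 10.1, 0.3, 5.5])

# Map each A value to itself, keyed by value, then look up
# B's values against the nearest key
lookup = pd.Series(A.values, index=A.values).sort_index().drop_duplicates()
result = lookup.reindex(B.values, method='nearest')
print(result.tolist())   # [1.0, 5.0, 10.0, 1.0, 5.0]
```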
Elementwise logical operation on character with pandas data frame
I'd consider the following, slightly more generic approach:
In [238]: df.astype(str).applymap(ord).sub(ord('@')).replace(-16,0)
Out[238]:
1 2 3
0 3 3 1
1 0 4 2
2 5 0 1
where:

0 - '0'
1 - 'A'
2 - 'B'
3 - 'C'
...
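A self-contained sketch of the same idea on a made-up frame. It uses a column-wise Series.map, which behaves the same here as the answer's DataFrame.applymap (deprecated in recent pandas):

```python
import pandas as pd

# Made-up letters frame matching the pattern above ('0' as filler)
df = pd.DataFrame({1: ['C', '0', 'E'],
                   2: ['C', 'D', '0'],
                   3: ['A', 'B', 'A']})

# ord('@') == 64, one below ord('A'), so 'A' -> 1, 'B' -> 2, ...
# ord('0') - 64 == -16, which is mapped back to 0
out = df.astype(str).apply(lambda col: col.map(ord)).sub(ord('@')).replace(-16, 0)
print(out.values.tolist())   # [[3, 3, 1], [0, 4, 2], [5, 0, 1]]
```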