pandas: multiple conditions while indexing data frame - unexpected behavior
As you can see, the AND operator drops every row in which at least one
value equals -1. On the other hand, the OR operator requires both
values to be equal to -1 to drop them.
That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:
df1 = df[(df.a != -1) & (df.b != -1)]
You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.
For df2:
df2 = df[(df.a != -1) | (df.b != -1)]
You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.
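The equivalence between "keep" and "drop" conditions is De Morgan's law. A minimal sketch, assuming a small made-up frame (the values below are illustrative, not the asker's data):

```python
import pandas as pd

# Illustrative data: row 0 has no -1, rows 1 and 2 have one -1, row 3 has two.
df = pd.DataFrame({"a": [1, -1, 3, -1], "b": [5, 6, -1, -1]})

# "Keep" phrasing, as in the answer above.
keep_and = df[(df.a != -1) & (df.b != -1)]
keep_or = df[(df.a != -1) | (df.b != -1)]

# "Drop" phrasing: negate the rows you want to remove.
drop_any = df[~((df.a == -1) | (df.b == -1))]  # drop if at least one value is -1
drop_all = df[~((df.a == -1) & (df.b == -1))]  # drop only if both values are -1

# Each pair selects exactly the same rows.
print(keep_and.equals(drop_any), keep_or.equals(drop_all))
```

If the "drop" phrasing matches how you think about the problem, writing the negated mask directly may be easier to read than translating it by hand.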
PS: chained assignment like df['a'][1] = -1 can get you into trouble, because it may write to a temporary copy instead of df. It's better to get into the habit of using .loc and .iloc.
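As a rough illustration of the .loc and .iloc habit, assuming a made-up three-row frame, a single indexing call names the row and the column together:

```python
import pandas as pd

# Illustrative frame; the column names follow the question, the values are made up.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Label-based: row label 1, column "a", in one indexing operation.
df.loc[1, "a"] = -1

# Position-based: third row, and the integer position of column "b".
df.iloc[2, df.columns.get_loc("b")] = -1

print(df)
```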
Pandas multiple condition and get dataframe
Use & instead of and, and put brackets around each value test:
df_result = df[(df.a == 1) & (df.b == 0) & (df.c == 0) & (df.d == 0)]
Alternatively, to avoid using extra brackets, you can use .eq():
df_result = df[df.a.eq(1) & df.b.eq(0) & df.c.eq(0) & df.d.eq(0)]
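To see why and does not work here, the sketch below (on a made-up two-column frame) catches the error that Python's and raises when it asks a whole Series for a single truth value, then applies the element-wise & instead:

```python
import pandas as pd

# Illustrative data, reduced to two columns.
df = pd.DataFrame({"a": [1, 1, 0], "b": [0, 1, 0]})

message = None
try:
    # `and` calls bool() on the left-hand Series, which pandas refuses to do.
    df[(df.a == 1) and (df.b == 0)]
except ValueError as exc:
    message = str(exc)

print(message)

# `&` compares row by row and returns a boolean Series usable as a mask.
df_result = df[(df.a == 1) & (df.b == 0)]
print(df_result)
```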
Pandas if statement does not work when passing through multiple conditions
Consider this dataframe:
# df = pd.DataFrame(np.random.choice([np.nan, 0, 1], (10, 2)),
#                   columns=['abc_1', 'abc_2'])
>>> df
   abc_1  abc_2
0    1.0    NaN
1    0.0    1.0
2    0.0    0.0
3    NaN    NaN
4    NaN    1.0
5    0.0    0.0
6    0.0    1.0
7    NaN    1.0
8    1.0    0.0
9    1.0    0.0
Your function:
def abc(row):
    abc_1 = row['abc_1']  # <- HERE the column name
    abc_2 = row['abc_2']  # <- HERE the column name
    if abc_1 > 0 and abc_2 > 0:
        return 'abc 1 and 2'
    elif abc_1 > 0 and abc_2 == 0:
        return 'abc_1 only'
    elif abc_1 == 0 and abc_2 > 0:
        return 'abc_2 only'
    else:
        return 'No abc'
The output:
>>> df.apply(abc, axis='columns')
0 No abc
1 abc_2 only
2 No abc
3 No abc
4 No abc
5 No abc
6 abc_2 only
7 No abc
8 abc_1 only
9 abc_1 only
dtype: object
Alternative output, filling NaN with 0:
>>> df.fillna(0).apply(abc, axis='columns')
0 abc_1 only
1 abc_2 only
2 No abc
3 No abc
4 abc_2 only
5 No abc
6 abc_2 only
7 abc_2 only
8 abc_1 only
9 abc_1 only
dtype: object
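A vectorized alternative to the row-wise apply, offered as a sketch rather than as part of the original answer: numpy.select takes the same three conditions written with & and returns the label of the first one that matches. The helper name label_rows is hypothetical, and the frame is retyped from the table above.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The same ten rows as the dataframe shown above.
df = pd.DataFrame({
    "abc_1": [1, 0, 0, nan, nan, 0, 0, nan, 1, 1],
    "abc_2": [nan, 1, 0, nan, 1, 0, 1, 1, 0, 0],
})

def label_rows(frame):
    """Hypothetical helper: label every row at once instead of calling apply."""
    conditions = [
        (frame["abc_1"] > 0) & (frame["abc_2"] > 0),
        (frame["abc_1"] > 0) & (frame["abc_2"] == 0),
        (frame["abc_1"] == 0) & (frame["abc_2"] > 0),
    ]
    labels = ["abc 1 and 2", "abc_1 only", "abc_2 only"]
    # Comparisons against NaN are False, so those rows fall through to the default.
    return pd.Series(np.select(conditions, labels, default="No abc"),
                     index=frame.index)

result = label_rows(df)
result_filled = label_rows(df.fillna(0))
print(result)
print(result_filled)
```

On this data the two calls should reproduce the two outputs listed above.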
Pandas Conditional filter on same column
To exclude missing values, use Series.notna. Because of operator precedence, wrap each of the following comparisons in (), and combine the conditions with the bitwise AND operator &:
df[df['Column'].notna() & (df['Column']!='value_x') & (df['Column']!='value_y')]
Another out-of-the-box solution is to replace the missing values with value_x, test whether each value is value_x or value_y, and invert the condition with ~:
df[~df['Column'].fillna('value_x').isin(['value_x','value_y'])]
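The two filters can be compared on a small made-up column (the values other and keep are placeholders). The sketch also shows why the notna() test matters: NaN != 'value_x' evaluates to True, so without it the missing row would be kept.

```python
import numpy as np
import pandas as pd

# Illustrative column with one missing value.
df = pd.DataFrame({"Column": ["value_x", "value_y", "other", np.nan, "keep"]})

# Explicit version: not missing, and not equal to either excluded value.
explicit = df[df["Column"].notna()
              & (df["Column"] != "value_x")
              & (df["Column"] != "value_y")]

# Compact version: treat NaN as an excluded value, then invert the membership test.
compact = df[~df["Column"].fillna("value_x").isin(["value_x", "value_y"])]

# Without notna(), the NaN row (index 3) is kept as well.
without_notna = df[(df["Column"] != "value_x") & (df["Column"] != "value_y")]

print(explicit.equals(compact), list(without_notna.index))
```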
Combine mutually exclusive arguments in filter condition
This is fuzzier, but you could just use a regex match like:
df[df.columns[df.columns.str.contains('einkst_l|name|year')]]
You could also use ^ and $ to match name and year exactly.
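A sketch of the difference the anchors make, assuming some made-up column names (surname, year_end, einkst_l_2019 and other are illustrative):

```python
import pandas as pd

# One dummy row; only the column names matter here.
cols = ["name", "surname", "year", "year_end", "einkst_l_2019", "other"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

# Unanchored: any column containing one of the substrings is selected.
loose = df.columns[df.columns.str.contains("einkst_l|name|year")]

# Anchored: name and year must match the whole column name,
# while einkst_l still matches as a substring.
strict = df.columns[df.columns.str.contains(r"einkst_l|^name$|^year$")]

print(list(loose))
print(list(strict))
```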
Pandas filter using multiple conditions and ignore entries that contain duplicates of substring
According to your expected output, you want to remove duplicates but keep the first item:
df.Asset = df.Asset.str.replace(r"(DAI|USDT|BUSD|TUSD)$", "", regex=True)
df = df.drop_duplicates(subset="Asset", keep="first")
print(df)
Prints:
       Asset       Price
0      1INCH    5.743400
2       AAVE  365.002000
3   AAVEDOWN    2.025052
4     AAVEUP   81.895000
6        ACM   10.917000
8        ADA    1.214390
9    ADADOWN    3.464827
11     ADAUP   76.129000
13     AERGO    0.430120
14      AION    0.072100
EDIT: To group and average:
df.Asset = df.Asset.str.replace(r"(DAI|USDT|BUSD|TUSD)$", "", regex=True)
df = df.groupby("Asset")["Price"].mean().reset_index()
print(df)
Prints:
      Asset       Price
0     1INCH    5.741950
1      AAVE  365.289000
2  AAVEDOWN    2.025052
3    AAVEUP   81.895000
4       ACM   10.906000
5       ADA    1.213723
6   ADADOWN    3.464827
7     ADAUP   76.129000
8     AERGO    0.430120
9      AION    0.072100
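Both variants can be tried end to end on a tiny frame. The ticker names and prices below are made up for illustration and are not the asker's data. Note that Series.str.replace treats the pattern as a regular expression only when regex=True is passed, at least in recent pandas versions where the default is a literal match.

```python
import pandas as pd

# Made-up quotes: two ADA pairs, one leveraged token, one AAVE pair.
df = pd.DataFrame({
    "Asset": ["ADABUSD", "ADAUSDT", "ADAUP", "AAVEDAI"],
    "Price": [1.0, 2.0, 76.0, 365.0],
})

# Strip a trailing quote-currency suffix; $ anchors the match to the end.
df["Asset"] = df["Asset"].str.replace(r"(DAI|USDT|BUSD|TUSD)$", "", regex=True)

# Variant 1: keep only the first row for each base asset.
first_only = df.drop_duplicates(subset="Asset", keep="first")

# Variant 2: average the price over all rows of each base asset.
averaged = df.groupby("Asset")["Price"].mean().reset_index()

print(first_only)
print(averaged)
```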
Filtering dataframes in pandas, how to chain multiple filters?
df = data[(data['change'] >= 10) & (data['open'] <= 15) & (data['open'] >= 1)]
or using pandas.DataFrame.query:
df = data.query("1 <= open <= 15 and change >= 10")
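A quick check that the two forms select the same rows, assuming a made-up frame with open and change columns. Inside the query string the name open resolves to the column, whereas in plain Python a bare open is the built-in function, which is why the mask has to spell out data['open'].

```python
import pandas as pd

# Illustrative values only.
data = pd.DataFrame({
    "open": [0.5, 5.0, 12.0, 20.0, 15.0],
    "change": [12, 11, 3, 15, 10],
})

# Boolean-mask form: every condition refers to a column explicitly.
by_mask = data[(data["change"] >= 10) & (data["open"] <= 15) & (data["open"] >= 1)]

# query form: column names are written directly, and comparisons can be chained.
by_query = data.query("1 <= open <= 15 and change >= 10")

print(by_mask.equals(by_query))
print(by_query)
```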