Pandas: Multiple Conditions While Indexing Data Frame - Unexpected Behavior

As you can see, the AND operator drops every row in which at least one
value equals -1. On the other hand, the OR operator requires both
values to be equal to -1 to drop them.

That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:

df1 = df[(df.a != -1) & (df.b != -1)]

You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.

For df2:

df2 = df[(df.a != -1) | (df.b != -1)]

You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.
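
A runnable sketch of both filters on made-up data (the column names a and b follow the question; the sample values are invented):

import pandas as pd

df = pd.DataFrame({'a': [1, -1, 3, -1], 'b': [-1, 2, 3, -1]})

df1 = df[(df.a != -1) & (df.b != -1)]  # keeps only row 2, the one with no -1 at all
df2 = df[(df.a != -1) | (df.b != -1)]  # drops only row 3, where both values are -1

print(df1.index.tolist())  # [2]
print(df2.index.tolist())  # [0, 1, 2]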

PS: chained assignment like df['a'][1] = -1 can get you into trouble, because it may write to a temporary copy rather than to df itself. It's better to get into the habit of using .loc and .iloc.
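
For instance, a minimal sketch of the safer pattern (same hypothetical frame as above):

# chained access: may assign to a temporary copy and leave df unchanged
df['a'][1] = -1

# single .loc call: always operates on df itself
df.loc[1, 'a'] = -1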

Pandas: multiple conditions to get a dataframe

Use & instead of and, and put parentheses around each comparison:

df_result = df[(df.a == 1) & (df.b == 0) & (df.c == 0) & (df.d == 0)]

Alternatively, to avoid the extra parentheses, you can use .eq():

df_result = df[df.a.eq(1) & df.b.eq(0) & df.c.eq(0) & df.d.eq(0)]
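
If you prefer a single expression over chained boolean operators, DataFrame.query works too (a sketch assuming the same column names a, b, c, d):

df_result = df.query("a == 1 and b == 0 and c == 0 and d == 0")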

Pandas if statement does not work when passing through multiple conditions

Consider this dataframe:

# df = pd.DataFrame(np.random.choice([np.nan, 0, 1], (10, 2)),
#                   columns=['abc_1', 'abc_2'])
>>> df
   abc_1  abc_2
0    1.0    NaN
1    0.0    1.0
2    0.0    0.0
3    NaN    NaN
4    NaN    1.0
5    0.0    0.0
6    0.0    1.0
7    NaN    1.0
8    1.0    0.0
9    1.0    0.0

Your function:

def abc(row):
    abc_1 = row['abc_1']  # <- HERE the column name
    abc_2 = row['abc_2']  # <- HERE the column name

    if abc_1 > 0 and abc_2 > 0:
        return 'abc 1 and 2'
    elif abc_1 > 0 and abc_2 == 0:
        return 'abc_1 only'
    elif abc_1 == 0 and abc_2 > 0:
        return 'abc_2 only'
    else:
        return 'No abc'

The output:

>>> df.apply(abc, axis='columns')
0        No abc
1    abc_2 only
2        No abc
3        No abc
4        No abc
5        No abc
6    abc_2 only
7        No abc
8    abc_1 only
9    abc_1 only
dtype: object

Alternative output, filling NaN with 0:

>>> df.fillna(0).apply(abc, axis='columns')
0    abc_1 only
1    abc_2 only
2        No abc
3        No abc
4    abc_2 only
5        No abc
6    abc_2 only
7    abc_2 only
8    abc_1 only
9    abc_1 only
dtype: object
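
As a side note, the same labels can be computed without apply via numpy.select; a sketch that reuses the df from above and reproduces the first output (comparisons against NaN are False, so NaN rows fall through to the default):

import numpy as np
import pandas as pd

conditions = [
    (df.abc_1 > 0) & (df.abc_2 > 0),
    (df.abc_1 > 0) & (df.abc_2 == 0),
    (df.abc_1 == 0) & (df.abc_2 > 0),
]
choices = ['abc 1 and 2', 'abc_1 only', 'abc_2 only']
result = pd.Series(np.select(conditions, choices, default='No abc'), index=df.index)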

Pandas Conditional filter on same column

To also exclude missing values, use Series.notna; because of operator precedence, wrap the other comparisons in (), and join everything with the bitwise AND &:

df[df['Column'].notna() & (df['Column']!='value_x') & (df['Column']!='value_y')]

Another, out-of-the-box, solution is to replace the missing values, test for value_x or value_y with isin, and invert the condition with ~:

df[~df['Column'].fillna('value_x').isin(['value_x','value_y'])]
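
Both masks select the same rows; a quick check on invented data:

import pandas as pd

df = pd.DataFrame({'Column': ['value_x', 'value_y', None, 'keep_me']})

m1 = df['Column'].notna() & (df['Column'] != 'value_x') & (df['Column'] != 'value_y')
m2 = ~df['Column'].fillna('value_x').isin(['value_x', 'value_y'])
print(m1.equals(m2))  # True -- only 'keep_me' survives either filter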

Combine mutually exclusive arguments in filter condition

This is fuzzier, but you could just use a regex match:

df[df.columns[df.columns.str.contains('einkst_l|name|year')]]

Also, you could add ^ and $ to make the match exact for name and year.
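
Equivalently, DataFrame.filter accepts the regex directly; a sketch with ^ and $ anchoring name and year to exact matches while einkst_l stays a substring match:

df.filter(regex='einkst_l|^name$|^year$')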

Pandas filter using multiple conditions and ignore entries that contain duplicates of substring

According to your expected output, you want to remove duplicates but keep the first item:

df.Asset = df.Asset.str.replace(r"(DAI|USDT|BUSD|TUSD)$", "", regex=True)
df = df.drop_duplicates(subset="Asset", keep="first")
print(df)

Prints:

       Asset       Price
0      1INCH    5.743400
2       AAVE  365.002000
3   AAVEDOWN    2.025052
4     AAVEUP   81.895000
6        ACM   10.917000
8        ADA    1.214390
9    ADADOWN    3.464827
11     ADAUP   76.129000
13     AERGO    0.430120
14      AION    0.072100

EDIT: To group and average:

df.Asset = df.Asset.str.replace(r"(DAI|USDT|BUSD|TUSD)$", "", regex=True)
df = df.groupby("Asset")["Price"].mean().reset_index()
print(df)

Prints:

      Asset       Price
0     1INCH    5.741950
1      AAVE  365.289000
2  AAVEDOWN    2.025052
3    AAVEUP   81.895000
4       ACM   10.906000
5       ADA    1.213723
6   ADADOWN    3.464827
7     ADAUP   76.129000
8     AERGO    0.430120
9      AION    0.072100
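
The design difference between the two endings, on a toy frame (the second AAVE price is invented so the mean matches the output above):

import pandas as pd

df = pd.DataFrame({"Asset": ["AAVE", "AAVE"], "Price": [365.002, 365.576]})

print(df.drop_duplicates(subset="Asset", keep="first"))   # keeps the first price, 365.002
print(df.groupby("Asset")["Price"].mean().reset_index())  # averages to 365.289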

Filtering dataframes in pandas, how to chain multiple filters?

df = data[(data['change'] >= 10) & (data['open'] <= 15) & (data['open'] >= 1)]

or using pandas.DataFrame.query:

df = data.query("1 <= open <= 15 and change >= 10")
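
A sketch on invented data showing the two forms agree (column names open and change as in the question):

import pandas as pd

data = pd.DataFrame({'open': [0.5, 5.0, 20.0], 'change': [12, 15, 30]})

by_mask = data[(data['change'] >= 10) & (data['open'] <= 15) & (data['open'] >= 1)]
by_query = data.query("1 <= open <= 15 and change >= 10")
print(by_mask.equals(by_query))  # True -- only the middle row survives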

