Pandas: How do I assign values based on multiple conditions for existing columns?
You can do this using np.where. The conditions use the bitwise operators & and | for "and" and "or", with parentheses around each condition because of operator precedence. Where the combined condition is true, 5 is returned, and 0 otherwise:
In [29]:
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df
Out[29]:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
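The same rule can also be written with named masks, which keeps each parenthesised comparison readable. This is a minimal sketch that rebuilds the frame from the output above by hand; the names male_match and female_cat_dog are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame shown in the output above.
df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})

# Each comparison yields a boolean Series; & and | combine them element-wise.
male_match = (df["gender"] == "male") & (df["pet1"] == df["pet2"])
female_cat_dog = (df["gender"] == "female") & df["pet1"].isin(["cat", "dog"])

# 5 where either mask is True, 0 everywhere else.
df["points"] = np.where(male_match | female_cat_dog, 5, 0)
```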
Assign value of existing column to new columns in pandas based on multiple conditions
From your DataFrame:
>>> import pandas as pd
>>> from io import StringIO
>>> df = pd.read_csv(StringIO("""
... column1,column2,column3,y1,y2,y3
... 100,200,300,2020,2021,2022
... 100,200,300,2021,2022,2023
... 100,200,300,2019,2020,2021"""))
>>> df
column1 column2 column3 y1 y2 y3
0 100 200 300 2020 2021 2022
1 100 200 300 2021 2022 2023
2 100 200 300 2019 2020 2021
And the function assignvalues, which now returns the value from the expected column for each if branch. We set currentyear to 2021, for example:
>>> def assignvalues(df):
...     if df['y1'] == currentyear:
...         return df['column1']
...     elif df['y2'] == currentyear:
...         return df['column2']
...     elif df['y3'] == currentyear:
...         return df['column3']
...
>>> currentyear = 2021
We can assign to df["Vals"] the result of apply(), as you did, with the axis=1 parameter to get the expected result:
>>> df["Vals"] = df.apply(assignvalues, axis=1)
>>> df
column1 column2 column3 y1 y2 y3 Vals
0 100 200 300 2020 2021 2022 200
1 100 200 300 2021 2022 2023 100
2 100 200 300 2019 2020 2021 300
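For larger frames, a vectorized alternative to the row-wise apply() may be worth considering. The sketch below swaps in np.select, which is not the method used in the answer above; it assumes the same three-row frame and the same currentyear of 2021.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column1": [100, 100, 100],
    "column2": [200, 200, 200],
    "column3": [300, 300, 300],
    "y1": [2020, 2021, 2019],
    "y2": [2021, 2022, 2020],
    "y3": [2022, 2023, 2021],
})
currentyear = 2021

# One condition per year column, evaluated in order (first match wins),
# mirroring the if/elif chain in assignvalues.
conditions = [df["y1"] == currentyear,
              df["y2"] == currentyear,
              df["y3"] == currentyear]
choices = [df["column1"], df["column2"], df["column3"]]

# Rows where no year matches get NaN, so the result column is float.
df["Vals"] = np.select(conditions, choices, default=np.nan)
```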
change column value based on multiple conditions
You are really close. Assign the value Matt to the filtered rows of column A using boolean masks:
df.loc[(df['A']=='Harry') & (df['B']=='George') & (df['C']>'2019'),'A'] = 'Matt'
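To see the assignment on concrete data, here is a small sketch. The three rows are made up for illustration; only the column names and the condition come from the line above. Column C is assumed to hold strings, so the comparison with '2019' is lexicographic.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "A": ["Harry", "Harry", "Tom"],
    "B": ["George", "George", "George"],
    "C": ["2020", "2018", "2021"],
})

# All three conditions must hold for a row to be selected.
mask = (df["A"] == "Harry") & (df["B"] == "George") & (df["C"] > "2019")

# Overwrite A only on the selected rows; other rows keep their value.
df.loc[mask, "A"] = "Matt"
```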
Assign numeric values for multiple columns based on multiple conditions in pandas DataFrame
You could apply pd.cut to the relevant columns:
cols = ['Procedures1', 'Procedures2']
df[cols] = df[cols].apply(lambda col: pd.cut(col, [0,200,500,1000, col.max()], labels=[1,2,3,4]))
Output:
Therapy_area Procedures1 Procedures2
0 Oncology 2 2
1 Oncology 2 2
2 Oncology 1 1
3 Oncology 3 3
4 Oncology 4 4
5 Oncology 4 4
6 Nononcology 2 2
7 Nononcology 2 2
8 Nononcology 2 2
9 Nononcology 1 1
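One thing to watch: using col.max() as the last bin edge only works when every column's maximum is above 1000, because pd.cut requires strictly increasing edges. A hedged variant uses np.inf as the open upper edge instead. The sample values below are invented for illustration, and this is a sketch rather than the answer's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "Therapy_area": ["Oncology", "Oncology", "Nononcology"],
    "Procedures1": [150, 700, 1200],
    "Procedures2": [450, 90, 300],
})

cols = ["Procedures1", "Procedures2"]
# Right-closed bins: (0, 200], (200, 500], (500, 1000], (1000, inf).
bins = [0, 200, 500, 1000, np.inf]

binned = df[cols].apply(lambda col: pd.cut(col, bins, labels=[1, 2, 3, 4]))
df[cols] = binned
```

Note that a value of exactly 0 falls outside the first bin and becomes NaN; pass include_lowest=True to pd.cut if zeros should be labelled 1.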
You could also use np.select:
def encoding(col, labels):
    return np.select([col<200, col.between(200,500), col.between(500,1000), col>1000], labels, 0)
onc_labels = [1,2,3,4]
nonc_labels = [11,22,33,44]
msk = df['Therapy_area'] == 'Oncology'
# msk selects the oncology rows, ~msk the non-oncology rows
df[cols] = pd.concat((df.loc[msk, cols].apply(encoding, args=(onc_labels,)), df.loc[~msk, cols].apply(encoding, args=(nonc_labels,)))).reset_index(drop=True)
Output:
Therapy_area Procedures1 Procedures2 Procedures3
0 Oncology 2 2 4
1 Oncology 2 2 2
2 Oncology 1 1 4
3 Oncology 3 3 2
4 Oncology 4 4 1
5 Oncology 4 4 2
6 Nononcology 22 22 44
7 Nononcology 22 22 22
8 Nononcology 11 11 44
9 Nononcology 33 33 22
Pandas - Assign value to subset of dataframe, based on multiple conditions
Use isin and map:
df.loc[df['Market'].isin(['Mk 1', 'Mk1']), 'Sub Market'] = df['Symbol'].isin(dct).map({True:'A', False:'B'})
Output:
>>> df
Market Sub Market Symbol
0 Mk1 A ABC
1 Mk 1 A ABC
2 Mk 1 B 123
3 Mk 2 B 123
4 Mk 3 A XYZ
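A self-contained sketch of the same idea follows. The contents of dct are not shown in the answer, so the dictionary below is a hypothetical stand-in, and the initial 'Sub Market' values are assumed so that the result matches the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "Market": ["Mk1", "Mk 1", "Mk 1", "Mk 2", "Mk 3"],
    "Sub Market": ["", "", "", "B", "A"],   # assumed starting values
    "Symbol": ["ABC", "ABC", "123", "123", "XYZ"],
})
dct = {"ABC": 1, "XYZ": 2}  # hypothetical lookup; isin() tests against its keys

# Rows to update: either spelling of market 1.
in_market = df["Market"].isin(["Mk 1", "Mk1"])

# 'A' if the symbol is a key of dct, otherwise 'B' (computed for every row).
labels = df["Symbol"].isin(dct).map({True: "A", False: "B"})

# .loc aligns on the index, so only the masked rows are overwritten.
df.loc[in_market, "Sub Market"] = labels
```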
Pandas - Trying to assign values to dataframe based on multiple conditions
We need two conditions
df.loc[df['field1'].isnull() & df['field3'].isnull(), 'fieldTemp'] = 0
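As a quick illustration, here is the same statement on made-up data. The sample values are assumptions; only the column names come from the line above. If fieldTemp does not exist yet, .loc creates it and leaves the unselected rows as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values.
df = pd.DataFrame({
    "field1": [np.nan, 1.0, np.nan],
    "field3": [np.nan, np.nan, 2.0],
})

# True only where both fields are missing.
both_missing = df["field1"].isnull() & df["field3"].isnull()

# Creates 'fieldTemp' if needed; rows outside the mask stay NaN.
df.loc[both_missing, "fieldTemp"] = 0
```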
How to set values of a column based on multiple conditions in other columns in python?
You're missing parentheses when defining the conditions. The reason is that bitwise operators have higher precedence than comparisons. Instead use:
m1 = ((df.col1 >= 1) & (df.col2 >= 1) & (df.col3 >= 1) &
      (df.col4 >= 1) & (df.col5 >= 1))
m2 = (df.col2 >= 1) & (df.col3 >= 1) & (df.col4 >= 1) & (df.col5 >= 1)
m3 = (df.col3 >= 1) & (df.col4 >= 1) & (df.col5 >= 1)
df['category'] = np.select([m1, m2, m3], ['certain', 'possible', 'probable'],
default='Other')
Which results in the expected output:
col1 col2 col3 col4 col5 category
0 1 1 1 4 1 certain
1 0 1 1 1 1 possible
2 0 0 1 1 1 probable
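The precedence issue can be reproduced on a tiny frame. In this sketch (two invented columns), the unparenthesised form is parsed as the chained comparison df.col1 >= (1 & df.col2) >= 1, which tries to take the truth value of a Series and raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 0], "col2": [1, 1]})

# With parentheses: two boolean Series combined element-wise.
ok = (df.col1 >= 1) & (df.col2 >= 1)

# Without parentheses, & binds tighter than >=, so Python evaluates
# df.col1 >= (1 & df.col2) >= 1, a chained comparison on a Series.
try:
    bad = df.col1 >= 1 & df.col2 >= 1
    raised = False
except ValueError:
    raised = True
```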
Use multiple conditions on a column to assign values of new column
There's no need for iterrows here; it is considered bad practice and is slow.
Method 1 pd.cut
df['B'] = pd.cut(df['A'], [0,1,4,10], labels=['low', 'mid', 'high'])
A B
0 1 low
1 1 low
2 2 mid
3 3 mid
4 5 high
5 4 mid
6 2 mid
7 5 high
Method 2 np.select
conditions = [
    df['A'] == 1,
    df['A'].isin([2, 3, 4])
]
choices = ['low', 'mid']
df['B'] = np.select(conditions, choices, default='high')
A B
0 1 low
1 1 low
2 2 mid
3 3 mid
4 5 high
5 4 mid
6 2 mid
7 5 high
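The two methods can be checked against each other on the column shown above. This is a sketch for comparison only; the names by_cut and by_select are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 5, 4, 2, 5]})

# Method 1: right-closed bins (0, 1], (1, 4], (4, 10].
by_cut = pd.cut(df["A"], [0, 1, 4, 10], labels=["low", "mid", "high"])

# Method 2: explicit conditions with a catch-all default.
by_select = np.select(
    [df["A"] == 1, df["A"].isin([2, 3, 4])],
    ["low", "mid"],
    default="high",
)

same = by_cut.astype(str).tolist() == by_select.tolist()
```

The methods agree here, but they differ on values outside the bins: pd.cut returns NaN for 0 or for anything above 10, while np.select falls through to 'high'.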
Assign a dataframe column a value, based on multiple conditions
We can use cut
transform(House, newcol = cut(price, breaks = c(-Inf, 300000, 500000, Inf),
labels = c("red", "blue", "green")))
# price newcol
#1 287655 red
#2 456355 blue
#3 662500 green
#4 597864 green
#5 876545 green
Note that if/else is not vectorized; it expects an input of length 1. If we run it in a loop where each element has length 1, it works, but that is inefficient because ifelse is the vectorized version of if/else:
House <- transform(House, newcol = ifelse(price < 300000, "red",
ifelse(price > 300000 & price < 500000, "blue", "green")))
House
# price newcol
#1 287655 red
#2 456355 blue
#3 662500 green
#4 597864 green
#5 876545 green
If we look at the results, both approaches give the same output, but the number of ifelse calls grows as more comparisons are added. It would be better to use cut or findInterval instead of nested ifelse.
Also note that in R, if goes with else rather than then:
House$newcol <- NA
for(i in seq_len(nrow(House))) {
House$newcol[i] <- if(House$price[i] < 300000) {
'red'
} else if( House$price[i] > 300000 & House$price[i] < 500000) {
'blue'
} else 'green'
}
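For readers following the pandas answers earlier on this page, the R cut call above has a close pandas counterpart. The sketch below is an assumed translation, not part of the R answer; it reuses the five prices from the output and open-ended bins via -np.inf and np.inf.

```python
import numpy as np
import pandas as pd

House = pd.DataFrame({"price": [287655, 456355, 662500, 597864, 876545]})

# Same breaks and labels as the R call: (-inf, 300000], (300000, 500000], (500000, inf).
House["newcol"] = pd.cut(
    House["price"],
    bins=[-np.inf, 300000, 500000, np.inf],
    labels=["red", "blue", "green"],
)
```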