Pandas: How to Assign Values Based on Multiple Conditions for Existing Columns

Pandas: How do I assign values based on multiple conditions for existing columns?

You can do this with np.where. Combine the conditions with the bitwise operators & (and) and | (or), and wrap each comparison in parentheses, because & and | bind more tightly than ==. Where the combined condition is true, 5 is returned; otherwise 0:

In [29]:
df['points'] = np.where(
    ((df['gender'] == 'male') & (df['pet1'] == df['pet2']))
    | ((df['gender'] == 'female') & (df['pet1'].isin(['cat', 'dog']))),
    5, 0)
df

Out[29]:
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
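
To try the np.where call end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the output table above, and the names male_match and female_cat_dog are only illustrative; splitting the condition into named masks is a readability choice, not something the original answer does.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output table above
df = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female", "squirrel"],
    "pet1": ["dog", "cat", "dog", "cat", "dog", "squirrel", "dog"],
    "pet2": ["dog", "cat", "cat", "squirrel", "dog", "cat", "cat"],
})

# Each comparison is parenthesised before being combined with & or |
male_match = (df["gender"] == "male") & (df["pet1"] == df["pet2"])
female_cat_dog = (df["gender"] == "female") & df["pet1"].isin(["cat", "dog"])

# 5 where either mask is True, 0 everywhere else
df["points"] = np.where(male_match | female_cat_dog, 5, 0)
print(df)
```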

Assign value of existing column to new columns in pandas based on multiple conditions

Starting from your DataFrame:

>>> import pandas as pd
>>> from io import StringIO

>>> df = pd.read_csv(StringIO("""
... column1,column2,column3,y1,y2,y3
... 100,200,300,2020,2021,2022
... 100,200,300,2021,2022,2023
... 100,200,300,2019,2020,2021"""))
>>> df
   column1  column2  column3    y1    y2    y3
0      100      200      300  2020  2021  2022
1      100      200      300  2021  2022  2023
2      100      200      300  2019  2020  2021

Then define the function assignvalues, which returns the value from the matching column in each branch (a row where none of the years match returns None). Here currentyear is set to 2021 as an example:

>>> def assignvalues(df):
...     if df['y1'] == currentyear:
...         return df['column1']
...     elif df['y2'] == currentyear:
...         return df['column2']
...     elif df['y3'] == currentyear:
...         return df['column3']

>>> currentyear = 2021

Assign the result of apply() to df["Vals"], as you did, passing axis=1 so that the function is called once per row:

>>> df["Vals"] = df.apply(assignvalues, axis=1)
>>> df
   column1  column2  column3    y1    y2    y3  Vals
0      100      200      300  2020  2021  2022   200
1      100      200      300  2021  2022  2023   100
2      100      200      300  2019  2020  2021   300
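
On larger frames, a row-wise apply can be slow. As an alternative (not part of the original answer), the same if/elif chain can be sketched with np.select, which picks the first matching condition per row. The default of NaN for rows with no matching year is an assumption, and it makes Vals a float column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column1": [100, 100, 100],
    "column2": [200, 200, 200],
    "column3": [300, 300, 300],
    "y1": [2020, 2021, 2019],
    "y2": [2021, 2022, 2020],
    "y3": [2022, 2023, 2021],
})
currentyear = 2021

# Conditions are checked in order, like the if/elif branches of assignvalues
conditions = [df["y1"] == currentyear, df["y2"] == currentyear, df["y3"] == currentyear]
choices = [df["column1"], df["column2"], df["column3"]]

# Assumption: rows where no year matches get NaN
df["Vals"] = np.select(conditions, choices, default=np.nan)
print(df)
```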

change column value based on multiple conditions

You are really close. Build a boolean mask from the three conditions and use it with .loc to assign 'Matt' to column A in the matching rows:

df.loc[(df['A']=='Harry') & (df['B']=='George') & (df['C']>'2019'),'A'] = 'Matt'
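
A minimal runnable sketch of that assignment follows. The sample rows are hypothetical; C is assumed to hold years as strings, as the comparison with '2019' suggests, so > compares them lexicographically.

```python
import pandas as pd

# Hypothetical sample data; C holds years as strings
df = pd.DataFrame({
    "A": ["Harry", "Harry", "Ron"],
    "B": ["George", "George", "George"],
    "C": ["2020", "2018", "2021"],
})

# All three conditions must hold for a row to be selected
mask = (df["A"] == "Harry") & (df["B"] == "George") & (df["C"] > "2019")

# Only column A of the selected rows is overwritten
df.loc[mask, "A"] = "Matt"
print(df)
```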

Assign numeric values for multiple columns based on multiple conditions in pandas DataFrame

You could apply pd.cut to the relevant columns:

cols = ['Procedures1', 'Procedures2']
# np.inf as the last edge keeps the bins increasing even if a column's maximum is 1000 or less
df[cols] = df[cols].apply(lambda col: pd.cut(col, [0, 200, 500, 1000, np.inf], labels=[1, 2, 3, 4]))

Output:

  Therapy_area Procedures1 Procedures2
0     Oncology           2           2
1     Oncology           2           2
2     Oncology           1           1
3     Oncology           3           3
4     Oncology           4           4
5     Oncology           4           4
6  Nononcology           2           2
7  Nononcology           2           2
8  Nononcology           2           2
9  Nononcology           1           1

You could also use np.select:

def encoding(col, labels):
    return np.select([col<200, col.between(200,500), col.between(500,1000), col>1000], labels, 0)

onc_labels = [1,2,3,4]
nonc_labels = [11,22,33,44]
msk = df['Therapy_area'] == 'Oncology'

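# ~msk selects the non-oncology rows; concat + reset_index assumes the oncology rows come first in df
df[cols] = pd.concat((df.loc[msk, cols].apply(encoding, args=(onc_labels,)),
                      df.loc[~msk, cols].apply(encoding, args=(nonc_labels,)))).reset_index(drop=True)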

Output:

  Therapy_area  Procedures1  Procedures2  Procedures3
0     Oncology            2            2            4
1     Oncology            2            2            2
2     Oncology            1            1            4
3     Oncology            3            3            2
4     Oncology            4            4            1
5     Oncology            4            4            2
6  Nononcology           22           22           44
7  Nononcology           22           22           22
8  Nononcology           11           11           44
9  Nononcology           33           33           22
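
If the two therapy areas are interleaved in the frame, concatenating and resetting the index would misalign rows. One way around that, sketched here under assumptions, is to encode every row with both label sets and pick per row with np.where. The sample data is hypothetical, and the band edges are made non-overlapping (500 falls in the second band, 1000 in the third), which is a choice of this sketch rather than the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, deliberately interleaving the two therapy areas
df = pd.DataFrame({
    "Therapy_area": ["Oncology", "Nononcology", "Oncology", "Nononcology"],
    "Procedures1": [150, 250, 750, 1500],
    "Procedures2": [1200, 100, 300, 600],
})
cols = ["Procedures1", "Procedures2"]

def encoding(col, labels):
    # Non-overlapping bands: <200, 200-500, 500-1000 (upper edges inclusive), >1000
    conditions = [
        col < 200,
        (col >= 200) & (col <= 500),
        (col > 500) & (col <= 1000),
        col > 1000,
    ]
    return pd.Series(np.select(conditions, labels, default=0), index=col.index)

# Encode every row both ways, then choose per row
onc = df[cols].apply(encoding, args=([1, 2, 3, 4],))
nonc = df[cols].apply(encoding, args=([11, 22, 33, 44],))

# Column vector of booleans so it broadcasts across both procedure columns
is_onc = (df["Therapy_area"] == "Oncology").to_numpy()[:, None]
df[cols] = np.where(is_onc, onc, nonc)
print(df)
```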

Pandas - Assign value to subset of dataframe, based on multiple conditions

Use isin and map:

df.loc[df['Market'].isin(['Mk 1', 'Mk1']), 'Sub Market'] = df['Symbol'].isin(dct).map({True:'A', False:'B'})

Output:

>>> df
  Market Sub Market Symbol
0    Mk1          A    ABC
1   Mk 1          A    ABC
2   Mk 1          B    123
3   Mk 2          B    123
4   Mk 3          A    XYZ
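
For a self-contained run, the sketch below fills in hypothetical inputs: dct stands in for the lookup dictionary from the question (only its keys matter to isin), and the "?" placeholders mark the rows that get overwritten. Rows outside the selected markets keep their existing Sub Market.

```python
import pandas as pd

# Hypothetical inputs; dct stands in for the question's lookup dictionary
dct = {"ABC": 1, "XYZ": 2}
df = pd.DataFrame({
    "Market": ["Mk1", "Mk 1", "Mk 1", "Mk 2", "Mk 3"],
    "Sub Market": ["?", "?", "?", "B", "A"],
    "Symbol": ["ABC", "ABC", "123", "123", "XYZ"],
})

# Rows to update: either spelling of market 1
in_market = df["Market"].isin(["Mk1", "Mk 1"])

# True/False membership in dct is mapped to 'A'/'B'; .loc aligns on the index
df.loc[in_market, "Sub Market"] = df["Symbol"].isin(dct).map({True: "A", False: "B"})
print(df)
```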

Pandas - Trying to assign values to dataframe based on multiple conditions

Combine the two conditions with & and assign through .loc:

df.loc[df['field1'].isnull() & df['field3'].isnull(), 'fieldTemp'] = 0
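
A small sketch with hypothetical data shows the effect: only the row where both fields are missing is set to 0. The starting value 9 in fieldTemp is arbitrary and just makes the untouched rows visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first row is missing both field1 and field3
df = pd.DataFrame({
    "field1": [np.nan, 1.0, np.nan],
    "field3": [np.nan, np.nan, 2.0],
    "fieldTemp": [9, 9, 9],
})

both_missing = df["field1"].isnull() & df["field3"].isnull()
df.loc[both_missing, "fieldTemp"] = 0
print(df)
```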

How to set values of a column based on multiple conditions in other columns in python?

You're missing parentheses around the individual conditions. They are needed because bitwise operators such as & have higher precedence than comparisons. Instead use:

m1 = ((df.col1 >= 1) & (df.col2 >= 1) & (df.col3 >= 1) &
      (df.col4 >= 1) & (df.col5 >= 1))
m2 = (df.col2 >= 1) & (df.col3 >= 1) & (df.col4 >= 1) & (df.col5 >= 1)
m3 = (df.col3 >= 1) & (df.col4 >= 1) & (df.col5 >= 1)

df['category'] = np.select([m1, m2, m3], ['certain', 'possible', 'probable'], 
                           default='Other')

Which results in the expected output:

    col1  col2  col3  col4  col5  category
0     1     1     1     4     1   certain
1     0     1     1     1     1  possible
2     0     0     1     1     1  probable
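
To see the precedence problem concretely, the sketch below (with a hypothetical two-column frame) runs the unparenthesised form, which Python reads as a chained comparison and which therefore fails, next to the parenthesised form.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 0, 0], "col2": [1, 1, 0]})

# Without parentheses this parses as the chained comparison
# df.col1 >= (1 & df.col2) >= 1, which needs the truth value of a Series
try:
    bad = df.col1 >= 1 & df.col2 >= 1
    raised = False
except ValueError:
    raised = True

# With parentheses each comparison is evaluated first, then combined
good = (df.col1 >= 1) & (df.col2 >= 1)
print(raised, good.tolist())
```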

Use multiple conditions on a column to assign values of new column

There's no need for iterrows here; it is slow and generally considered bad practice.

Method 1 `pd.cut`

df['B'] = pd.cut(df['A'], [0,1,4,10], labels=['low', 'mid', 'high'])

   A     B
0  1   low
1  1   low
2  2   mid
3  3   mid
4  5  high
5  4   mid
6  2   mid
7  5  high

Method 2 `np.select`

conditions = [
    df['A'] == 1,
    df['A'].isin([2, 3, 4])
]

choices = ['low', 'mid']

df['B'] = np.select(conditions, choices, default='high')

   A     B
0  1   low
1  1   low
2  2   mid
3  3   mid
4  5  high
5  4   mid
6  2   mid
7  5  high
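
The two methods give the same labels but not the same type of result: pd.cut returns a categorical Series, while np.select returns a plain NumPy array of strings. A quick sketch, rebuilding column A from the output above, checks that the labels agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 5, 4, 2, 5]})

# Method 1: interval bins (0, 1], (1, 4], (4, 10]
by_cut = pd.cut(df["A"], [0, 1, 4, 10], labels=["low", "mid", "high"])

# Method 2: explicit conditions, first match wins, 'high' as the fallback
by_select = np.select(
    [df["A"] == 1, df["A"].isin([2, 3, 4])],
    ["low", "mid"],
    default="high",
)

# Compare as plain strings, since the two results have different dtypes
same = by_cut.astype(str).tolist() == by_select.tolist()
print(by_cut.dtype, by_select.dtype, same)
```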

Assign a dataframe column a value, based on multiple conditions

We can use cut

transform(House, newcol = cut(price, breaks = c(-Inf, 300000, 500000, Inf),
       labels = c("red", "blue", "green")))
#    price newcol
#1 287655    red
#2 456355   blue
#3 662500  green
#4 597864  green
#5 876545  green

Note that if/else is not vectorized: it expects a condition of length 1. Used in a loop over single elements it works, but that is inefficient, because ifelse is the vectorized version of if/else:

House <- transform(House, newcol = ifelse(price < 300000, "red",
              ifelse(price > 300000 & price < 500000, "blue", "green")))
House
#   price newcol
#1 287655    red
#2 456355   blue
#3 662500  green
#4 597864  green
#5 876545  green

Both approaches give the same output here, but the number of nested ifelse calls grows with the number of comparisons. With more than a few intervals, it is better to use cut or findInterval instead of nested ifelse.

In R, if pairs with else (there is no then keyword), so the loop version looks like this:

House$newcol <- NA
for(i in seq_len(nrow(House))) {
    House$newcol[i] <- if(House$price[i] < 300000) {
           'red'
    } else if( House$price[i] > 300000 & House$price[i] < 500000) {
       'blue'
     } else 'green'
 }
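
The answer above is written in R. For readers working in pandas, a rough equivalent of the cut call can be sketched with pd.cut, using -inf and inf as the outer edges. The frame name house is an assumption, and the prices are copied from the R output; like R's cut, pd.cut closes each interval on the right by default.

```python
import numpy as np
import pandas as pd

# Prices copied from the R output above
house = pd.DataFrame({"price": [287655, 456355, 662500, 597864, 876545]})

# Bins: (-inf, 300000], (300000, 500000], (500000, inf)
house["newcol"] = pd.cut(
    house["price"],
    bins=[-np.inf, 300000, 500000, np.inf],
    labels=["red", "blue", "green"],
)
print(house)
```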
