Using If Else on a Dataframe Across Multiple Columns

pandas if else conditions on multiple columns

Use numpy.select:

df['value'] = np.select([df.a > 0 , df.b > 0], [df.a, df.b], default=df.c)
print (df)
   a  b  c  value
0  0  0  6      6
1  0  3  7      3
2  1  4  8      1
3  2  5  9      2

Difference between vectorized and loop solutions in 400k rows:

df = pd.concat([df] * 100000, ignore_index=True)

In [158]: %timeit df['value2'] = np.select([df.a > 0 , df.b > 0], [df.a, df.b], default=df.c)
9.86 ms ± 611 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [159]: %timeit df['value1'] = [x if x > 0 else y if y>0 else z for x,y,z in zip(df['a'],df['b'],df['c'])]
399 ms ± 52.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Ifelse for Multiple Columns in DataFrame

We may use if_any

library(dplyr)
df1 <- df1 %>%
     mutate(calculated_column = +(if_any(-ID, ~ . %in% 'high')))

-output

df1
 ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1

Or if we want to use base R, create the logical condition with rowSums on a logical matrix

df1$calculated_column <-  +(rowSums(df1[-1] == "high", na.rm = TRUE) > 0)

data

df1 <- structure(list(ID = 1:4, Winter = c("high", "low", "low", "low"
), Spring = c(NA, "high", NA, "high"), Summer = c("high", NA, 
NA, NA), Fall = c("low", "low", "low", "low")), 
class = "data.frame", row.names = c(NA, 
-4L))

How can I do an if statement on a pandas dataframe to check multiple columns for specific values?

You want to filter rows where the last name is "Smith" AND the first name is either "John" OR "Tom". This means it's either "John Smith" OR "Tom Smith". This is equivalent to

(last_name=="Smith" AND first_name=="John") OR (last_name=="Smith" AND first_name=="Tom")

which is equivalent to:

(last_name=="smith") AND (first_name=='john' OR first_name=='tom')

the latter OR can be handled using isin:

out = df[(df['last_name']=='smith') & (df['first_name'].isin(['john','tom']))]

Output:

  last_name first_name match
0     smith       john   yes
1     smith        tom   yes

pandas if else conditions for multiple columns using dataframe

import pandas as pd

df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
                             ['bread', 'butter', 'jam'],
                             ['bread', 'jam', 'jam'],
                             ['unknown']]})

List cells aren't good, so let's explode them:

df = df.explode('text1')

>>> df.head()
     text1
0    bread
0    bread
0    bread
1    bread
1   butter

Now you can use groupby to apply a function to each document (by grouping by index level 0).

The details of the heuristic are up to you, but here's something to start with:

def get_values(s):
    counts = s.value_counts()
     
    if "unknown" in counts:
        return "unknown"
    
    if counts.eq(1).all():
        return s.iloc[1]

    if counts.max() >= 2:
        return counts.idxmax()

Apply to each group:

>>> df.groupby(level=0).text1.apply(get_values)
0      bread
1     butter
2        jam
3    unknown
Name: text1, dtype: object

multiple if else conditions in pandas dataframe and derive multiple columns

You need chained comparison using upper and lower bound

def flag_df(df):
    
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan
    
df2['Flag'] = df2.apply(flag_df, axis = 1)

    student score   height  trigger1    trigger2    trigger3    Flag
0   A       100     7       84          99          114         Yellow
1   B       96      4       95          110         125         Red
2   C       80      9       15          30          45          NaN
3   D       105     5       78          93          108         Yellow
4   E       156     3       16          31          46          Orange

Note: You can do this with a very nested np.where but I prefer to apply a function for multiple if-else

Edit: answering @Cecilia's questions

what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2

Not sure what you tried but you can return a derived value instead of string using

def flag_df(df):

    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan

what if there are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working

Again not sure what code did you try but using pandas isnull would do the trick

def flag_df(df):

    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan

If statement across multiple columns in Pandas

You can try this for your above dataframe:

df['size'] = (df.iloc[:, 4:] >= .6).dot(df.columns[4:]).str.split('_').str[0]
df['condition'] = np.where(df['size']!='', 'YES', 'NO')

Output:

   total  big  med  small  big_perc  med_perc  sml_perc size condition
1    5.0  4.0  0.0    1.0       0.8       0.0       0.2  big       YES
2    6.0  0.0  3.0    3.0       0.0       0.5       0.5             NO
3    5.0  2.0  3.0    0.0       0.4       0.6       0.0  med       YES

Slice you dataframe to only select the columns with the percentages, then create a boolean matrix for greater or equal to .6, then use dot to capture the column name for those True values. Use string manipulation to get big, medium or small.

Pandas DataFrames If else condition on multiple columns containing dash

Without numpy, you can do this:

df.loc[(df['first-name'] == 'john') & (df['height-ft'] == 6), 'shape-type'] = 'good'
df.loc[(df['height-ft'] == 4), 'shape-type'] = 'bad'
df.loc[((df['first-name'] != 'john') & (df['height-ft'] != 4)), 'shape-type'] = 'middle'
print(df)

  first-name  height-ft shape-type
0       john          6       good
1      peter          5     middle
2       john          4        bad
3       alex          6     middle

With np.where:

df['shape-type'] = np.where((df['first-name']=='john') & (df['height-ft']==6), 'good', 'middle')
df['shape-type'] = np.where((df['height-ft']==4), 'bad', df['shape-type'])

  first-name  height-ft shape-type
0       john          6       good
1      peter          5     middle
2       john          4        bad
3       alex          6     middle