Using If Else on a Dataframe Across Multiple Columns

pandas if else conditions on multiple columns

Use numpy.select:

df['value'] = np.select([df.a > 0 , df.b > 0], [df.a, df.b], default=df.c)
print (df)
a b c value
0 0 0 6 6
1 0 3 7 3
2 1 4 8 1
3 2 5 9 2

Difference between vectorized and loop solutions in 400k rows:

df = pd.concat([df] * 100000, ignore_index=True)

In [158]: %timeit df['value2'] = np.select([df.a > 0 , df.b > 0], [df.a, df.b], default=df.c)
9.86 ms ± 611 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [159]: %timeit df['value1'] = [x if x > 0 else y if y>0 else z for x,y,z in zip(df['a'],df['b'],df['c'])]
399 ms ± 52.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Ifelse for Multiple Columns in DataFrame

We may use if_any

library(dplyr)
df1 <- df1 %>%
mutate(calculated_column = +(if_any(-ID, ~ . %in% 'high')))

-output

df1
ID Winter Spring Summer Fall calculated_column
1 1 high <NA> high low 1
2 2 low high <NA> low 1
3 3 low <NA> <NA> low 0
4 4 low high <NA> low 1

Or if we want to use base R, create the logical condition with rowSums on a logical matrix

df1$calculated_column <-  +(rowSums(df1[-1] == "high", na.rm = TRUE) > 0)

data

df1 <- structure(list(ID = 1:4, Winter = c("high", "low", "low", "low"
), Spring = c(NA, "high", NA, "high"), Summer = c("high", NA,
NA, NA), Fall = c("low", "low", "low", "low")),
class = "data.frame", row.names = c(NA,
-4L))

How can I do an if statement on a pandas dataframe to check multiple columns for specific values?

You want to filter rows where the last name is "Smith" AND the first name is either "John" OR "Tom". This means it's either "John Smith" OR "Tom Smith". This is equivalent to

(last_name=="Smith" AND first_name=="John") OR (last_name=="Smith" AND first_name=="Tom")

which is equivalent to:

(last_name=="smith") AND (first_name=='john' OR first_name=='tom')

the latter OR can be handled using isin:

out = df[(df['last_name']=='smith') & (df['first_name'].isin(['john','tom']))]

Output:

  last_name first_name match
0 smith john yes
1 smith tom yes

pandas if else conditions for multiple columns using dataframe

import pandas as pd

df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
['bread', 'butter', 'jam'],
['bread', 'jam', 'jam'],
['unknown']]})

List cells aren't good, so let's explode them:

df = df.explode('text1')

>>> df.head()
text1
0 bread
0 bread
0 bread
1 bread
1 butter

Now you can use groupby to apply a function to each document (by grouping by index level 0).

The details of the heuristic are up to you, but here's something to start with:

def get_values(s):
counts = s.value_counts()

if "unknown" in counts:
return "unknown"

if counts.eq(1).all():
return s.iloc[1]

if counts.max() >= 2:
return counts.idxmax()

Apply to each group:

>>> df.groupby(level=0).text1.apply(get_values)
0 bread
1 butter
2 jam
3 unknown
Name: text1, dtype: object

multiple if else conditions in pandas dataframe and derive multiple columns

You need chained comparison using upper and lower bound

def flag_df(df):

if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return 'Red'
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return 'Yellow'
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return 'Orange'
elif (df['height'] > 8):
return np.nan

df2['Flag'] = df2.apply(flag_df, axis = 1)

student score height trigger1 trigger2 trigger3 Flag
0 A 100 7 84 99 114 Yellow
1 B 96 4 95 110 125 Red
2 C 80 9 15 30 45 NaN
3 D 105 5 78 93 108 Yellow
4 E 156 3 16 31 46 Orange

Note: You can do this with a very nested np.where but I prefer to apply a function for multiple if-else

Edit: answering @Cecilia's questions

  1. what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2

Not sure what you tried but you can return a derived value instead of string using

def flag_df(df):

if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return df['height']*2
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return df['height']*3
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return df['height']*4
elif (df['height'] > 8):
return np.nan

  1. what if there are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working

Again not sure what code did you try but using pandas isnull would do the trick

def flag_df(df):

if pd.isnull(df['height']):
return df['height']
elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return df['height']*2
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return df['height']*3
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return df['height']*4
elif (df['height'] > 8):
return np.nan

If statement across multiple columns in Pandas

You can try this for your above dataframe:

df['size'] = (df.iloc[:, 4:] >= .6).dot(df.columns[4:]).str.split('_').str[0]
df['condition'] = np.where(df['size']!='', 'YES', 'NO')

Output:

   total  big  med  small  big_perc  med_perc  sml_perc size condition
1 5.0 4.0 0.0 1.0 0.8 0.0 0.2 big YES
2 6.0 0.0 3.0 3.0 0.0 0.5 0.5 NO
3 5.0 2.0 3.0 0.0 0.4 0.6 0.0 med YES

Slice you dataframe to only select the columns with the percentages, then create a boolean matrix for greater or equal to .6, then use dot to capture the column name for those True values. Use string manipulation to get big, medium or small.

Pandas DataFrames If else condition on multiple columns containing dash

Without numpy, you can do this:

df.loc[(df['first-name'] == 'john') & (df['height-ft'] == 6), 'shape-type'] = 'good'
df.loc[(df['height-ft'] == 4), 'shape-type'] = 'bad'
df.loc[((df['first-name'] != 'john') & (df['height-ft'] != 4)), 'shape-type'] = 'middle'
print(df)

first-name height-ft shape-type
0 john 6 good
1 peter 5 middle
2 john 4 bad
3 alex 6 middle

With np.where:

df['shape-type'] = np.where((df['first-name']=='john') & (df['height-ft']==6), 'good', 'middle')
df['shape-type'] = np.where((df['height-ft']==4), 'bad', df['shape-type'])

first-name height-ft shape-type
0 john 6 good
1 peter 5 middle
2 john 4 bad
3 alex 6 middle

Using if else on a dataframe across multiple columns

For your example dataset this will work;

Option 1, name the columns to change:

dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA

In your actual data with 40 columns, if you just want to set the last 39 columns to NA, then the following may be simpler than naming each of the columns to change;

Option 2, select columns using a range:

dat[which(dat$desc == "blank"), 2:40] <- NA

Option 3, exclude the 1st column:

dat[which(dat$desc == "blank"), -1] <- NA

Option 4, exclude a named column:

dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA

As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will help you to get a better understanding of the language.



Related Topics



Leave a reply



Submit