Create and Assign Multiple New Dataframe Columns in Ifelse Statement

multiple if else conditions in pandas dataframe and derive multiple columns

You need chained comparison using upper and lower bound

def flag_df(df):

if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return 'Red'
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return 'Yellow'
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return 'Orange'
elif (df['height'] > 8):
return np.nan

df2['Flag'] = df2.apply(flag_df, axis = 1)

student score height trigger1 trigger2 trigger3 Flag
0 A 100 7 84 99 114 Yellow
1 B 96 4 95 110 125 Red
2 C 80 9 15 30 45 NaN
3 D 105 5 78 93 108 Yellow
4 E 156 3 16 31 46 Orange

Note: You can do this with a very nested np.where but I prefer to apply a function for multiple if-else

Edit: answering @Cecilia's questions

  1. what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2

Not sure what you tried but you can return a derived value instead of string using

def flag_df(df):

if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return df['height']*2
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return df['height']*3
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return df['height']*4
elif (df['height'] > 8):
return np.nan

  1. what if there are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working

Again not sure what code did you try but using pandas isnull would do the trick

def flag_df(df):

if pd.isnull(df['height']):
return df['height']
elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return df['height']*2
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return df['height']*3
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return df['height']*4
elif (df['height'] > 8):
return np.nan

Pandas: How do I assign values based on multiple conditions for existing columns?

You can do this using np.where, the conditions use bitwise & and | for and and or with parentheses around the multiple conditions due to operator precedence. So where the condition is true 5 is returned and 0 otherwise:

In [29]:
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df

Out[29]:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0

How to add multiple columns to pandas dataframe in one assignment?

I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right hand side be a DataFrame (note that it doesn't actually matter if the columns of the DataFrame have the same names as the columns you are creating).

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.

Here are several approaches that will work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'col_1': [0, 1, 2, 3],
'col_2': [4, 5, 6, 7]
})

Then one of the following:

1) Three assignments in one, using list unpacking:

df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]

2) DataFrame conveniently expands a single row to match the index, so you can do this:

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

3) Make a temporary data frame with new columns, then combine with the original data frame later:

df = pd.concat(
[
df,
pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
)
], axis=1
)

4) Similar to the previous, but using join instead of concat (may be less efficient):

df = df.join(pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Using a dict is a more "natural" way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):

df = df.join(pd.DataFrame(
{
'column_new_1': np.nan,
'column_new_2': 'dogs',
'column_new_3': 3
}, index=df.index
))

6) Use .assign() with multiple column arguments.

I like this variant on @zero's answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don't know when it would be worth the trouble:

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols) # add empty cols
df[new_cols] = new_vals # multi-column assignment works for existing cols

8) In the end it's hard to beat three separate assignments:

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3

Note: many of these options have already been covered in other answers: Add multiple columns to DataFrame and set them equal to an existing column, Is it possible to add several columns at once to a pandas DataFrame?, Add multiple empty columns to pandas DataFrame

Using If else to check multiple columns and create a new column based on the response for string responses

This is not the more efficient answer, neither the more general solution, but may satisfy a solution:

#create columns
st <- rep(NA,nrow(hairdf));
cur <- rep(NA,nrow(hairdf));
wav <- rep(NA,nrow(hairdf));
mix <- rep(NA,nrow(hairdf));

#join and define words
hairdf <- cbind(hairdf,st,cur,wav,mix);
words <- c("straight","curly","wavy","mixed");
words_ast <- paste(words,"*",sep=""); #just get the "*" words

#make a loop according to positions of columns st,cur,wav,mix
for (j in 1:length(words_ast)){ #let's see if we can evaluate 2 in words_ast
for (i in c(2,3,4)){ #but only in columns we selected
a <- subset(hairdf,hairdf[,i]==words_ast[j]) #subset columns which satisfay condition. [Note that this can be written as hairdf %>% subset(.[,i]==words_ast[j]) ]
hairdf[row.names(a),7+j] <- 2 #replace value from column 8
}
}
#repeat process for "words"

for (j in 1:length(words)){
for (i in c(2,3,4)){
a <- subset(hairdf,hairdf[,i]==words[j])
hairdf[row.names(a),7+j] <- 1
}
}

This should allow you to get the expected result. Alternatively, you can use the assign() function, i.e

assign(x,value=1)

where x is each element in words.

So in a loop:

assign(words[n],value=1) ; assign(words_ast[n],value=2)

pandas if else conditions for multiple columns using dataframe

import pandas as pd

df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
['bread', 'butter', 'jam'],
['bread', 'jam', 'jam'],
['unknown']]})

List cells aren't good, so let's explode them:

df = df.explode('text1')

>>> df.head()
text1
0 bread
0 bread
0 bread
1 bread
1 butter

Now you can use groupby to apply a function to each document (by grouping by index level 0).

The details of the heuristic are up to you, but here's something to start with:

def get_values(s):
counts = s.value_counts()

if "unknown" in counts:
return "unknown"

if counts.eq(1).all():
return s.iloc[1]

if counts.max() >= 2:
return counts.idxmax()

Apply to each group:

>>> df.groupby(level=0).text1.apply(get_values)
0 bread
1 butter
2 jam
3 unknown
Name: text1, dtype: object

Apply pandas function to column to create multiple new columns?

Building off of user1827356 's answer, you can do the assignment in one pass using df.merge:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})), 
left_index=True, right_index=True)

textcol feature1 feature2
0 0.772692 1.772692 -0.227308
1 0.857210 1.857210 -0.142790
2 0.065639 1.065639 -0.934361
3 0.819160 1.819160 -0.180840
4 0.088212 1.088212 -0.911788

EDIT:
Please be aware of the huge memory consumption and low speed: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/ !



Related Topics



Leave a reply



Submit