Create New Column Based on String

How to create a new column in a DataFrame based on whether a string contains given values

You can do it with pd.Series.str.contains, joining the list l into a single regex alternation (OR) pattern:

import re
import pandas as pd

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Phrases': ['I have a cool family', 'I like avocados', 'I would like to go to school', 'I enjoy Harry Potter']})

l=['cool','avocado','lord of the rings']

df['new_column']=df['Phrases'].str.contains('|'.join(l))

df['matched strings']=df['Phrases'].apply(lambda x: ','.join(re.findall('|'.join(l),x)))


df
Out[18]:
        Date                       Phrases  new_column matched strings
0  10/2/2011          I have a cool family        True            cool
1  11/2/2011               I like avocados        True         avocado
2  12/2/2011  I would like to go to school       False
3  13/2/2011          I enjoy Harry Potter       False
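
Note that '|'.join(l) is interpreted as a regular expression, so a keyword containing characters such as "+" or "." would be read as regex syntax. A minimal sketch of a more defensive variant, assuming the keywords should match literally and case-insensitively (the sample frame and the names keywords and pattern are illustrative, not from the original answer):

```python
import re
import pandas as pd

# Illustrative data; the None row stands in for a missing phrase
df = pd.DataFrame({'Phrases': ['I have a COOL family', 'I like avocados',
                               'C++ is fun', 'I enjoy Harry Potter']})
keywords = ['cool', 'avocado', 'c++']

# Escape each keyword so regex metacharacters (like '+') match literally
pattern = '|'.join(re.escape(k) for k in keywords)

# case=False ignores case; na=False treats missing values as "no match"
df['new_column'] = df['Phrases'].str.contains(pattern, case=False, na=False)

# str.findall returns a list of matches per row; str.join flattens it to text
df['matched strings'] = (
    df['Phrases'].str.findall(pattern, flags=re.IGNORECASE).str.join(',')
)
```

Using str.findall instead of apply with re.findall keeps the whole computation inside the pandas string accessor.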

Add new column to pandas data frame based on string + value from another column in the data frame

Use:

df['axis'] = 'up to ' + df['end'].astype(str)
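
The astype(str) call matters because concatenating a string with a numeric column would typically raise a TypeError. A small runnable sketch, assuming an integer column named end (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with an integer 'end' column
df = pd.DataFrame({'end': [5, 10, 15]})

# Convert the numbers to strings first, then prepend the fixed text
df['axis'] = 'up to ' + df['end'].astype(str)
```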

Creating a new column using string match and based on if-else condition

The root problem here is that your code compares a single string (row['url_text']) to a DataFrame (df[df...]).

Instead of referencing df inside your function, just use methods that are defined on the row itself. You can also implement this as a lambda function to be closer to the canonical examples.

df['blocked'] = df.apply(
    lambda row: 1 if 'blocked you' in row['url_text'] else 0,
    axis=1
)
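
If row-wise apply turns out to be slow, or url_text may contain missing values, a vectorized variant is one option. This is a sketch under the assumption that a plain substring test is all that is needed; the sample rows are invented for illustration:

```python
import pandas as pd

# Illustrative data, including a missing value
df = pd.DataFrame({'url_text': ['he blocked you today', 'hello there', None]})

# regex=False does a literal substring test; na=False maps missing text to False
df['blocked'] = (
    df['url_text'].str.contains('blocked you', regex=False, na=False).astype(int)
)
```

The lambda version would raise a TypeError on the None row, because the in operator cannot search inside a missing value.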

Creating new column based on string values from another column

library(stringr)
dom$label = str_extract(dom$Banner, "Watermelon|Vanilla")
dom$label[is.na(dom$label)] <- "Default"
dom
#      Site                              Banner      label
# 1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
# 2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
# 3 charlie                         bottle :15s    Default
# 4   delta aaaa vvvv cccc Build_Mobile_320x480    Default
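
For readers working in pandas rather than R, roughly the same result can be sketched with str.extract and fillna. This is an illustrative translation, not part of the original answer; the frame is rebuilt from the output shown above:

```python
import pandas as pd

# Data reconstructed from the R output above
dom = pd.DataFrame({
    'Site': ['alpha', 'beta', 'charlie', 'delta'],
    'Banner': ['testing_Watermelon -DPI_300x250 v2',
               'notest_Vanilla Latte-DPI_300x250 v2',
               'bottle :15s',
               'aaaa vvvv cccc Build_Mobile_320x480'],
})

# expand=False returns a Series for a single capture group;
# rows without a match come back as NaN and are filled with 'Default'
dom['label'] = (
    dom['Banner'].str.extract(r'(Watermelon|Vanilla)', expand=False)
                 .fillna('Default')
)
```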

Fill new column based on conditions defined in a string

Here is a solution that converts your condition string into a Python function and then applies it to each row of your DataFrame:

import re

condition_string = "colA='yes' & colB='yes' & (colC='yes' | colD='yes'): 'Yes', colA='no' & colB='no' & (colC='no' | colD='no'): 'No', ELSE : 'UNKNOWN'"

# Rewrite condition_string as the source of a Python function named apply_cond.
# df is the DataFrame from the question (columns ID, colA, colB, colC, colD).
for col in df.columns:
    # wrap bare column names as row['col'] ...
    condition_string = re.sub(rf"(\W|^){col}(\W|$)", rf"\1row['{col}']\2", condition_string)
    # ... and turn a single '=' into the comparison operator '=='
    condition_string = re.sub(rf"row\['{col}'\]\s*=(?!=)", f"row['{col}']==", condition_string)

cond_form = re.sub(r'(:[^[(]+), (?!ELSE)', r'\1\n\telif ', condition_string) \
    .replace(": ", ":\n\t\treturn ") \
    .replace("&", "and") \
    .replace('|', 'or')
cond_form = re.sub(r", ELSE\s*:", "\n\telse:", cond_form)
function_def = "def apply_cond(row):\n\tif " + cond_form
#print(function_def) # uncomment to see how the function is defined

# executing the function definition of apply_cond
# (exec runs arbitrary code, so only use it on condition strings you trust)
exec(function_def)

# applying the function to each row
df["result"] = df.apply(apply_cond, axis=1)

print(df)

Output:

     ID colA colB colC colD   result
0  AB01  yes  NaN  yes  yes  UNKNOWN
1  AB02  yes  yes  yes   no      Yes
2  AB03  yes  yes  yes  yes      Yes
3  AB03   no   no   no   no       No
4  AB04   no   no   no  NaN       No
5  AB05  yes  NaN  NaN   no  UNKNOWN
6  AB06  NaN  yes  NaN  NaN  UNKNOWN

You might need to adapt the string formatting depending on condition_string (I wrote this quickly, so some combinations may be unsupported), but if you receive those strings automatically it will save you from redefining the conditions by hand.
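
If the conditions are known in advance and do not have to be parsed from a string, the same rules can be written directly with numpy.select, which avoids exec. This is an alternative sketch, not the approach above; the frame is rebuilt from the output shown, and the helper names yes and no are illustrative:

```python
import numpy as np
import pandas as pd

# Data reconstructed from the output table above
df = pd.DataFrame({
    'ID':   ['AB01', 'AB02', 'AB03', 'AB03', 'AB04', 'AB05', 'AB06'],
    'colA': ['yes', 'yes', 'yes', 'no', 'no', 'yes', np.nan],
    'colB': [np.nan, 'yes', 'yes', 'no', 'no', np.nan, 'yes'],
    'colC': ['yes', 'yes', 'yes', 'no', 'no', np.nan, np.nan],
    'colD': ['yes', 'no', 'yes', 'no', np.nan, 'no', np.nan],
})

cols = ['colA', 'colB', 'colC', 'colD']
yes = df[cols].eq('yes')  # boolean frame; NaN compares as False
no = df[cols].eq('no')

# Conditions are checked in order; the first one that holds picks the label
conditions = [
    yes['colA'] & yes['colB'] & (yes['colC'] | yes['colD']),
    no['colA'] & no['colB'] & (no['colC'] | no['colD']),
]
df['result'] = np.select(conditions, ['Yes', 'No'], default='UNKNOWN')
```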

How can I build a function to create a new column based on other columns containing a certain string?

If you want to create a new column with one of two values (A if the condition is met, otherwise B), you could do something like this:

#create a column 'new' with value 'Brasil' if 'Nationality' value contains 'Bra', else put 'NA'
df['new'] = df['Nationality'].apply(lambda x: 'Brasil' if 'Bra' in x else 'NA')

Otherwise, if you want to apply several rules to the same column, you could do something like this:

#create a column 'new' and insert value 'ARG' whenever 'Nationality' contains 'Arg', 
df.loc[df['Nationality'].str.contains('Arg'), 'new'] = 'ARG'
#and 'BRA' whenever Nationality contains 'Brazil', without overriding any other values
df.loc[df['Nationality'].str.contains('Brazil'), 'new'] = 'BRA'
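
One caveat with the .loc approach: rows that match no rule are left as NaN, and str.contains returns NaN for missing input, which cannot be used as a boolean mask. A minimal sketch that sets a default first and passes na=False, assuming 'NA' is the desired fallback (the sample data is invented for illustration):

```python
import pandas as pd

# Illustrative data, including a missing nationality
df = pd.DataFrame({'Nationality': ['Argentina', 'Brazil', 'Chile', None]})

# Start from a default so unmatched rows get an explicit value
df['new'] = 'NA'

# na=False makes missing values count as "no match" in the boolean mask
df.loc[df['Nationality'].str.contains('Arg', na=False), 'new'] = 'ARG'
df.loc[df['Nationality'].str.contains('Brazil', na=False), 'new'] = 'BRA'
```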

Adding values in a new column based on a string contained in another column

Use str.extract to pull the matching substring out of the string column, then map it to its category:

d = {'apple': 'A001', 'ball': 'B099', 'fan': 'F009'}

df['category'] = (
    df.descriptions
    .str.lower()
    .str.extract('(' + '|'.join(d.keys()) + ')')
    .squeeze().map(d)
)
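
A self-contained version of the same idea, with made-up descriptions for illustration. It assumes the dictionary keys are lowercase and contain no regex metacharacters, and it uses expand=False in place of squeeze() so that the result stays a Series even for a one-row frame:

```python
import pandas as pd

# Hypothetical sample data; the last row matches no key
df = pd.DataFrame({'descriptions': ['Red Apple pie', 'a ball of yarn',
                                    'ceiling FAN', 'nothing here']})
d = {'apple': 'A001', 'ball': 'B099', 'fan': 'F009'}

# One capture group containing every key, e.g. '(apple|ball|fan)'
pattern = '(' + '|'.join(d) + ')'

df['category'] = (
    df['descriptions']
    .str.lower()                          # keys are lowercase, so normalize first
    .str.extract(pattern, expand=False)   # first matching key per row, NaN if none
    .map(d)                               # translate key -> category code
)
```

Rows without any matching key end up as NaN in category.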

