Filling in a New Column Based on a Condition in a Data Frame

Conditionally fill column values based on another column's value in pandas

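You probably want np.where, which takes the converted budget where the currency is '$' and the unchanged budget everywhere else:

import numpy as np

df['Normalized'] = np.where(df['Currency'] == '$', df['Budget'] * 0.78125, df['Budget'])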
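To make the one-liner above concrete, here is a minimal runnable sketch. The sample rows are hypothetical; only the column names and the 0.78125 factor come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer
df = pd.DataFrame({
    "Currency": ["$", "€", "$"],
    "Budget": [100.0, 200.0, 64.0],
})

# Where Currency is '$' take Budget * 0.78125, otherwise keep Budget as-is
df["Normalized"] = np.where(df["Currency"] == "$", df["Budget"] * 0.78125, df["Budget"])
print(df)
```

np.where evaluates both branches for every row and then picks per row, so both expressions must be valid for the whole column.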

How to create and fill a new column based on conditions in two other columns?

Try this:

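# Start with an empty object-dtype column so it can hold strings later
df["E"] = pd.Series(np.nan, index=df.index, dtype=object)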

# Use boolean indexing to set no-yes to placeholder value
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"

# Shift placeholder down by one, as it seems from your example
# that you want X to be on the no-yes "stopping" row
df["E"] = df.E.shift(1)

# Then set the X value on the yes-no rows
df.loc[(df.A == "yes") & (df.B == "no"), "E"] = "X"
df["E"] = df.E.ffill() # Fill forward

# Fix placeholders
df.loc[df.E == "PL", "E"] = np.nan

Results:

      A    B    C    D    E
0    no   no  nan  nan  NaN
1    no   no  nan  nan  NaN
2   yes   no    X    X    X
3   yes   no  nan    X    X
4    no   no  nan    X    X
5    no  yes  nan    X    X
6    no  yes  nan  nan  NaN
7   yes   no    X    X    X
8    no   no  nan    X    X
9   yes   no    X    X    X
10  yes   no  nan    X    X
11   no   no  nan    X    X
12   no  yes  nan    X    X
13   no   no  nan  nan  NaN
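For anyone who wants to run the steps end to end, here is a self-contained sketch. Columns A and B are copied from the results table; creating E with object dtype is my assumption, made so that the column can hold strings without a dtype warning on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Columns A and B copied from the results table
df = pd.DataFrame({
    "A": ["no", "no", "yes", "yes", "no", "no", "no",
          "yes", "no", "yes", "yes", "no", "no", "no"],
    "B": ["no", "no", "no", "no", "no", "yes", "yes",
          "no", "no", "no", "no", "no", "yes", "no"],
})

# Assumption: object dtype so the column can hold strings and NaN
df["E"] = pd.Series(np.nan, index=df.index, dtype=object)

# Mark no-yes rows with a placeholder, then move it one row down
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
df["E"] = df["E"].shift(1)

# Mark yes-no rows with X and carry the last marker forward
df.loc[(df["A"] == "yes") & (df["B"] == "no"), "E"] = "X"
df["E"] = df["E"].ffill()

# Turn the remaining placeholders back into missing values
df.loc[df["E"] == "PL", "E"] = np.nan
print(df)
```

The placeholder exists only to stop the forward fill: without it, X would be carried past the row after each no-yes row.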

Fill new column in dataframe using conditional logic

You can use numpy.where to get the required values. Use the .isin method to check whether the value in column Game is one of [Type A, Type B, Type C]; assign Played where that is True, and the Status column's value where it is False:

>>> np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']), ['Played'], df['Status'])
array(['Played', 'Played', 'Played', 'Played', 'Won', nan], dtype=object)

You can assign it as a new column:

df['Result'] = np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']),
                        ['Played'],
                        df['Status'])

     ID    Game Status  Result
0  AB01  Type A    Won  Played
1  AB02  Type B   Draw  Played
2  AB03  Type A    Won  Played
3  AB04  Type C    NaN  Played
4  AB05  Type D    Won     Won
5  AB06  Type D    NaN     NaN
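The following sketch rebuilds the frame from the table above so the approach can be run as-is. It passes the scalar 'Played' instead of the one-element list (both broadcast the same way) and, as an alternative that is not part of the original answer, shows the pandas-only Series.mask spelling.

```python
import numpy as np
import pandas as pd

# Data copied from the table above
df = pd.DataFrame({
    "ID": ["AB01", "AB02", "AB03", "AB04", "AB05", "AB06"],
    "Game": ["Type A", "Type B", "Type A", "Type C", "Type D", "Type D"],
    "Status": ["Won", "Draw", "Won", np.nan, "Won", np.nan],
})

played = df["Game"].isin(["Type A", "Type B", "Type C"])

# NumPy version: 'Played' where the mask is True, Status elsewhere
df["Result"] = np.where(played, "Played", df["Status"])

# Alternative (my addition): replace Status with 'Played' where the mask is True
df["Result_mask"] = df["Status"].mask(played, "Played")
print(df)
```

Series.mask keeps the original index and dtype handling inside pandas, which can be convenient when the frame is later filtered or reindexed.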

How to fill the value into a new column based on the condition of another column in Pandas dataframe?

Just try using np.where:

df['Desired col C'] = np.where(df['A'].lt(2), 'C', 'Null')

And now:

print(df)

Gives:

    A     B     C Desired col C
0   1  Null  Null             C
1   2    AB  Null          Null
2   3    AB  Null          Null
3   4    AB  Null          Null
4   5    AB  Null          Null
5   6    AB  Null          Null
6   7    AB  Null          Null
7   8    AB  Null          Null
8   9    AB  Null          Null
9   1  Null  Null             C
10  0  Null  Null             C
11  1  Null  Null             C
12  1  Null  Null             C
13  1  Null  Null             C
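Note that 'Null' here is an ordinary string, not a missing value. The sketch below reproduces the column from the A values in the table and adds a variant, under the assumption that real missing values are wanted instead; the column name C_or_missing is hypothetical.

```python
import numpy as np
import pandas as pd

# Column A copied from the table above
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 0, 1, 1, 1]})

small = df["A"].lt(2)  # same as df["A"] < 2

# As in the answer: the literal string 'Null' for non-matching rows
df["Desired col C"] = np.where(small, "C", "Null")

# Variant (assumption): keep 'C' where the condition holds, NaN elsewhere
df["C_or_missing"] = pd.Series("C", index=df.index).where(small)
print(df)
```

With real missing values, methods such as isna(), dropna() and fillna() work on the new column, which they would not do for the string 'Null'.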

Pandas df: fill values in new column with specific values from another column (condition with multiple columns)

You can left-join the dataframe to itself, using col1 on the left side and col2 on the right side.

Then rename col3 from the right side of the join to col4 and drop the remaining right-side columns. Example:

df = df.merge(df, left_on='col1', right_on='col2', how='left', suffixes=('', '_'))
df = df.rename(columns={'col3_': 'col4'})
df = df[['col1', 'col2', 'col3', 'col4']]

df looks like:

  col1 col2  col3  col4
0    a    b     1   NaN
1    b    c     2   1.0
2    c    d     3   2.0
3    d    e     4   3.0
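Here is a runnable version of the self-join using the data from the table above. It also sketches a lookup with Series.map, which is my alternative rather than part of the answer and assumes the values in col2 are unique.

```python
import pandas as pd

# Data copied from the table above
df = pd.DataFrame({
    "col1": ["a", "b", "c", "d"],
    "col2": ["b", "c", "d", "e"],
    "col3": [1, 2, 3, 4],
})

# Self-join: match each row's col1 against col2 of the same frame
merged = df.merge(df, left_on="col1", right_on="col2", how="left", suffixes=("", "_"))
merged = merged.rename(columns={"col3_": "col4"})[["col1", "col2", "col3", "col4"]]

# Alternative (assumes col2 is unique): look col1 up in a col2 -> col3 mapping
mapped = df.assign(col4=df["col1"].map(df.set_index("col2")["col3"]))

print(merged)
print(mapped)
```

The merge also handles duplicate keys by producing one output row per match, while the map version is shorter but needs a unique lookup index.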

How to fill column based on condition taking other columns into account?

There is no need for a for-loop, because this can be solved with vectorized operations. Here are three options (in R):

# option 1
test_df$vec3 <- +(test_df$vec1 <= 25 | test_df$vec1 >= 75)

# option 2
test_df$vec3 <- as.integer(test_df$vec1 <= 25 | test_df$vec1 >= 75)

# option 3
test_df$vec3 <- ifelse(test_df$vec1 <= 25 | test_df$vec1 >= 75, 1, 0)

which in all cases gives:

   vec1 vec2 vec3
1     5    1    1
2     6    2    1
3    61    3    0
4    20    4    1

....

47    3   47    1
48   55   48    0
49   44   49    0
50   97   50    1

(only the first and last four rows are shown)
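Since the rest of this page uses pandas, here is a rough pandas counterpart of the R options above. It is a sketch, not part of the original answer, and the sample holds only the eight vec1 values visible in the output.

```python
import pandas as pd

# Illustrative sample: the eight vec1 values visible in the output above
test_df = pd.DataFrame({"vec1": [5, 6, 61, 20, 3, 55, 44, 97]})

# Boolean mask: True where vec1 is at most 25 or at least 75
outside = (test_df["vec1"] <= 25) | (test_df["vec1"] >= 75)

# Cast the mask to 0/1, like as.integer() in the R version
test_df["vec3"] = outside.astype(int)
print(test_df)
```

The parentheses around each comparison are required in pandas, because | binds more tightly than <= and >=.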

How to create a new column in pandas dataframe based on a condition?

You could try np.where:

import numpy as np

df['result'] = np.where(df['zip_code'].astype(str).str.len() == 5, True, df['find_no'])

The only downside of this approach is that, when find_no is numeric, NumPy coerces both branches to a common numeric dtype, so your True values become 1, which could be confusing. One way to keep the values you want is to do

import numpy as np

df['result'] = np.where(df['zip_code'].astype(str).str.len() == 5, 'True', df['find_no'].astype(str))

The downside here is that you lose the original meaning of those values by casting them to strings. It all depends on what you're hoping to accomplish.
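The coercion described above is easy to see on a tiny frame. The data below is hypothetical (only the column names come from the answer), and the object-dtype variant is a third option I am adding as a sketch: it keeps both True and the original numbers, at the cost of a slower, untyped column.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the answer
df = pd.DataFrame({"zip_code": [12345, 1234], "find_no": [7, 8]})
is_five = df["zip_code"].astype(str).str.len() == 5

# bool and int branches share an integer dtype, so True is stored as 1
df["as_int"] = np.where(is_five, True, df["find_no"])

# With an object-dtype branch, NumPy keeps the Python objects unchanged
df["as_object"] = np.where(is_five, True, df["find_no"].astype(object))

print(df)
print(df.dtypes)
```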

Fill new column based on conditions defined in a string

Here is a solution that converts your condition string into a Python function and then applies it to each row of your DataFrame:

import re

condition_string = "colA='yes' & colB='yes' & (colC='yes' | colD='yes'): 'Yes', colA='no' & colB='no' & (colC='no' | colD='no'): 'No', ELSE : 'UNKNOWN'"

# formatting string as python function apply_cond
for col in df.columns:
    condition_string = re.sub(rf"(\W|^){col}(\W|$)", rf"\1row['{col}']\2", condition_string)
    condition_string = re.sub(rf"row\['{col}'\]\s*=(?!=)", f"row['{col}']==", condition_string)

cond_form = re.sub(r'(:[^[(]+), (?!ELSE)', r'\1\n\telif ', condition_string) \
    .replace(": ", ":\n\t\treturn ") \
    .replace("&", "and") \
    .replace('|', 'or')
cond_form = re.sub(r", ELSE\s*:", "\n\telse:", cond_form)
function_def = "def apply_cond(row):\n\tif " + cond_form
#print(function_def) # uncomment to see how the function is defined

# executing the function definition of apply_cond
exec(function_def)

# applying the function to each row
df["result"] = df.apply(apply_cond, axis=1)

print(df)

Output:

     ID colA colB colC colD   result
0  AB01  yes  NaN  yes  yes  UNKNOWN
1  AB02  yes  yes  yes   no      Yes
2  AB03  yes  yes  yes  yes      Yes
3  AB03   no   no   no   no       No
4  AB04   no   no   no  NaN       No
5  AB05  yes  NaN  NaN   no  UNKNOWN
6  AB06  NaN  yes  NaN  NaN  UNKNOWN

You might need to adapt the string formatting depending on condition_string (I wrote it quickly, so some combinations may be unsupported), but if you receive those strings automatically, it will save you from redefining the conditions by hand.
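If the conditions are known in advance rather than arriving as strings, the same result can be produced without exec. The sketch below is a different technique from the answer above: it writes the two rules from condition_string by hand as boolean masks and combines them with np.select. The data is copied from the output table.

```python
import numpy as np
import pandas as pd

# Data copied from the output table above
df = pd.DataFrame({
    "ID":   ["AB01", "AB02", "AB03", "AB03", "AB04", "AB05", "AB06"],
    "colA": ["yes", "yes", "yes", "no", "no", "yes", np.nan],
    "colB": [np.nan, "yes", "yes", "no", "no", np.nan, "yes"],
    "colC": ["yes", "yes", "yes", "no", "no", np.nan, np.nan],
    "colD": ["yes", "no", "yes", "no", np.nan, "no", np.nan],
})

# The two rules from condition_string, written as boolean masks
yes_rule = (df["colA"] == "yes") & (df["colB"] == "yes") & (
    (df["colC"] == "yes") | (df["colD"] == "yes")
)
no_rule = (df["colA"] == "no") & (df["colB"] == "no") & (
    (df["colC"] == "no") | (df["colD"] == "no")
)

# The first matching rule wins; rows matching neither get the default
df["result"] = np.select([yes_rule, no_rule], ["Yes", "No"], default="UNKNOWN")
print(df)
```

This avoids executing generated code, but it gives up the main benefit of the string-based approach, namely accepting conditions that are not known when the script is written.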


