Conditionally fill column values based on another column's value in pandas
You probably want to do
df['Normalized'] = np.where(df['Currency'] == '$', df['Budget'] * 0.78125, df['Budget'])
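To try this out, here is a minimal self-contained sketch. The sample rows are made up for illustration; only the column names and the 0.78125 factor come from the line above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "Currency": ["$", "€", "$"],
    "Budget": [100.0, 200.0, 64.0],
})

# Where Currency is '$', scale Budget by 0.78125; otherwise keep Budget unchanged.
df["Normalized"] = np.where(df["Currency"] == "$", df["Budget"] * 0.78125, df["Budget"])
print(df)
```

np.where evaluates both branches for every row and then picks per row, so both branches need to be valid for the whole column.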
How to create and fill a new column based on conditions in two other columns?
Try this:
df["E"] = np.nan
df["E"] = df["E"].astype(object)  # object dtype so string labels can be assigned below
# Use boolean indexing to set no-yes to placeholder value
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
# Shift placeholder down by one, as it seems from your example
# that you want X to be on the no-yes "stopping" row
df["E"] = df.E.shift(1)
# Then set the X value on the yes-no rows
df.loc[(df.A == "yes") & (df.B == "no"), "E"] = "X"
df["E"] = df.E.ffill() # Fill forward
# Fix placeholders
df.loc[df.E == "PL", "E"] = np.nan
Results:
A B C D E
0 no no nan nan NaN
1 no no nan nan NaN
2 yes no X X X
3 yes no nan X X
4 no no nan X X
5 no yes nan X X
6 no yes nan nan NaN
7 yes no X X X
8 no no nan X X
9 yes no X X X
10 yes no nan X X
11 no no nan X X
12 no yes nan X X
13 no no nan nan NaN
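To run the steps end to end, here is a sketch that rebuilds columns A and B from the results table and reproduces column E. Creating E with object dtype is an assumption of this sketch, made so that recent pandas versions accept string assignments into a column that starts as NaN.

```python
import numpy as np
import pandas as pd

# Columns A and B copied from the results table above.
df = pd.DataFrame({
    "A": ["no", "no", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "no", "no", "no"],
    "B": ["no", "no", "no", "no", "no", "yes", "yes", "no", "no", "no", "no", "no", "yes", "no"],
})

# Object dtype (an assumption of this sketch) so strings can go into the NaN column.
df["E"] = pd.Series(np.nan, index=df.index, dtype=object)

df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"  # mark no-yes rows
df["E"] = df["E"].shift(1)                                   # move the marker one row down
df.loc[(df["A"] == "yes") & (df["B"] == "no"), "E"] = "X"   # mark yes-no rows
df["E"] = df["E"].ffill()                                    # carry X / PL forward
df.loc[df["E"] == "PL", "E"] = np.nan                        # remove the placeholder
print(df)
```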
Fill new column in dataframe using conditional logic
You can use the numpy.where function to get the required values. Use the .isin method to check whether the value of column Game is one of [Type A, Type B, Type C]; assign Played where that is True, and the Status column's values where it is False:
>>> np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']), ['Played'], df['Status'])
array(['Played', 'Played', 'Played', 'Played', 'Won', nan], dtype=object)
You can assign it as a new column:
df['Result'] = np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']),
['Played'],
df['Status'])
ID Game Status Result
0 AB01 Type A Won Played
1 AB02 Type B Draw Played
2 AB03 Type A Won Played
3 AB04 Type C NaN Played
4 AB05 Type D Won Won
5 AB06 Type D NaN NaN
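If you would rather stay within pandas, Series.mask is one possible alternative to np.where here. This is a sketch, not the approach from the answer above; the data is copied from the table, and the variable name played is only for illustration.

```python
import numpy as np
import pandas as pd

# Data copied from the table above.
df = pd.DataFrame({
    "ID": ["AB01", "AB02", "AB03", "AB04", "AB05", "AB06"],
    "Game": ["Type A", "Type B", "Type A", "Type C", "Type D", "Type D"],
    "Status": ["Won", "Draw", "Won", np.nan, "Won", np.nan],
})

played = df["Game"].isin(["Type A", "Type B", "Type C"])

# Series.mask replaces values where the condition is True and keeps the rest,
# so Status survives only for games outside the list.
df["Result"] = df["Status"].mask(played, "Played")
print(df)
```

Unlike np.where, this returns a Series that keeps the original index.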
How to fill the value into a new column based on the condition of another column in Pandas dataframe?
Just try using np.where:
df['Desired col C'] = np.where(df['A'].lt(2), 'C', 'Null')
And now:
print(df)
Gives:
A B C Desired col C
0 1 Null Null C
1 2 AB Null Null
2 3 AB Null Null
3 4 AB Null Null
4 5 AB Null Null
5 6 AB Null Null
6 7 AB Null Null
7 8 AB Null Null
8 9 AB Null Null
9 1 Null Null C
10 0 Null Null C
11 1 Null Null C
12 1 Null Null C
13 1 Null Null C
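Note that 'Null' in this answer is an ordinary string, so pandas will not treat it as a missing value. If real missing values are wanted instead, one option (a sketch with made-up sample values) is to pass None as the fallback:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 0]})  # hypothetical sample values

# None (rather than the string 'Null') gives entries that isna() recognises.
df["Desired col C"] = np.where(df["A"].lt(2), "C", None)
print(df)
print(df["Desired col C"].isna().tolist())
```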
Pandas df: fill values in new column with specific values from another column (condition with multiple columns)
You can left join the dataframe to itself using col1 on the left side and col2 on the right side. Rename col3 from the right side of the join to col4 and drop the rest of the right-side columns. Example:
df = df.merge(df, left_on='col1', right_on='col2', how='left', suffixes=('', '_'))
df = df.rename(columns={'col3_': 'col4'})
df = df[['col1', 'col2', 'col3', 'col4']]
df looks like:
col1 col2 col3 col4
0 a b 1 NaN
1 b c 2 1.0
2 c d 3 2.0
3 d e 4 3.0
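When col2 has no duplicate values, the same lookup can probably be written without a self-join by using Series.map. This is a sketch under that uniqueness assumption, with the data taken from the table above and lookup as an illustrative name.

```python
import pandas as pd

# Data taken from the table above.
df = pd.DataFrame({
    "col1": ["a", "b", "c", "d"],
    "col2": ["b", "c", "d", "e"],
    "col3": [1, 2, 3, 4],
})

# Build a lookup from col2 to col3, then look up each col1 value in it.
# Assumes col2 is unique; with duplicates the lookup is ambiguous and map raises.
lookup = df.set_index("col2")["col3"]
df["col4"] = df["col1"].map(lookup)
print(df)
```

Values of col1 that never appear in col2 come back as NaN, which matches the left-join result.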
How to fill column based on condition taking other columns into account?
There is no need for a for-loop as you can use vectorized solutions in this case. Three options on how to solve this problem:
# option 1
test_df$vec3 <- +(test_df$vec1 <= 25 | test_df$vec1 >= 75)
# option 2
test_df$vec3 <- as.integer(test_df$vec1 <= 25 | test_df$vec1 >= 75)
# option 3
test_df$vec3 <- ifelse(test_df$vec1 <= 25 | test_df$vec1 >= 75, 1, 0)
which in all cases gives:
vec1 vec2 vec3
1 5 1 1
2 6 2 1
3 61 3 0
4 20 4 1
....
47 3 47 1
48 55 48 0
49 44 49 0
50 97 50 1
(only the first and last four rows are presented)
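The options above are R code. For readers working in pandas, a rough equivalent might look like the sketch below. It uses only the first four rows of the output, and the names outside and vec3_alt are illustrative.

```python
import numpy as np
import pandas as pd

# First four rows from the output above.
test_df = pd.DataFrame({"vec1": [5, 6, 61, 20], "vec2": [1, 2, 3, 4]})

# Vectorized condition: True where vec1 is at most 25 or at least 75.
outside = (test_df["vec1"] <= 25) | (test_df["vec1"] >= 75)

# Counterpart of as.integer(...): cast the boolean mask to 0/1.
test_df["vec3"] = outside.astype(int)

# Counterpart of ifelse(..., 1, 0).
vec3_alt = np.where(outside, 1, 0)
print(test_df)
```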
How to create a new column in pandas dataframe based on a condition?
You could try np.where:
import numpy as np
df['result'] = np.where(df['zip_code'].astype(str).str.len() == 5, True, df['find_no'])
The only downside of this approach is that, when find_no is numeric, NumPy converts your True values to 1s, which could be confusing. One way to keep the values you want is:
import numpy as np
df['result'] = np.where(df['zip_code'].astype(str).str.len() == 5, 'True', df['find_no'].astype(str))
The downside here being that you lose the meaning of those values by casting them to strings. I guess it all depends on what you're hoping to accomplish.
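The coercion is easy to see on a tiny example. The sketch below uses made-up values and assumes find_no is an integer column. The last part shows one possible way to keep a real True without casting everything to strings, by working on an object-dtype copy.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; find_no is assumed to be numeric.
df = pd.DataFrame({"zip_code": [12345, 123], "find_no": [7, 8]})
is_five = df["zip_code"].astype(str).str.len() == 5

# Mixing True with an integer column: the result is an integer array,
# so True is stored as 1.
coerced = np.where(is_five, True, df["find_no"])
print(coerced, coerced.dtype)

# An object-dtype column can hold a boolean next to the numbers.
kept = df["find_no"].astype(object).mask(is_five, True)
print(kept.tolist())
```

An object column gives up fast numeric operations, so whether this is better than the string version depends on what the column is used for afterwards.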
Fill new column based on conditions defined in a string
Here is a solution that converts your condition string to a Python function and then applies it to the rows of your DataFrame:
import re
condition_string = "colA='yes' & colB='yes' & (colC='yes' | colD='yes'): 'Yes', colA='no' & colB='no' & (colC='no' | colD='no'): 'No', ELSE : 'UNKNOWN'"
# formatting string as python function apply_cond
for col in df.columns:
    condition_string = re.sub(rf"(\W|^){col}(\W|$)", rf"\1row['{col}']\2", condition_string)
    condition_string = re.sub(rf"row\['{col}'\]\s*=(?!=)", f"row['{col}']==", condition_string)
cond_form = re.sub(r'(:[^[(]+), (?!ELSE)', r'\1\n\telif ', condition_string) \
    .replace(": ", ":\n\t\treturn ") \
    .replace("&", "and") \
    .replace('|', 'or')
cond_form = re.sub(r", ELSE\s*:", "\n\telse:", cond_form)
function_def = "def apply_cond(row):\n\tif " + cond_form
#print(function_def) # uncomment to see how the function is defined
# executing the function definition of apply_cond
exec(function_def)
# applying the function to each row
df["result"]=df.apply(lambda x: apply_cond(x), axis=1)
print(df)
Output:
ID colA colB colC colD result
0 AB01 yes NaN yes yes UNKNOWN
1 AB02 yes yes yes no Yes
2 AB03 yes yes yes yes Yes
3 AB03 no no no no No
4 AB04 no no no NaN No
5 AB05 yes NaN NaN no UNKNOWN
6 AB06 NaN yes NaN NaN UNKNOWN
You might want to adapt the string formatting depending on condition_string (I did it quickly, so there might be some unsupported combinations), but if you get those strings automatically, it will save you from defining them all over again.
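If the conditions are known ahead of time rather than arriving as strings, the same if/elif/else logic can be written directly with np.select, which avoids exec. This is a sketch of that alternative, not the method above; the data is copied from the output table, and is_yes and is_no are illustrative names.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Data copied from the output table above.
df = pd.DataFrame({
    "ID": ["AB01", "AB02", "AB03", "AB03", "AB04", "AB05", "AB06"],
    "colA": ["yes", "yes", "yes", "no", "no", "yes", nan],
    "colB": [nan, "yes", "yes", "no", "no", nan, "yes"],
    "colC": ["yes", "yes", "yes", "no", "no", nan, nan],
    "colD": ["yes", "no", "yes", "no", nan, "no", nan],
})

# The two branches of condition_string, written as boolean masks.
is_yes = (df["colA"] == "yes") & (df["colB"] == "yes") & (
    (df["colC"] == "yes") | (df["colD"] == "yes")
)
is_no = (df["colA"] == "no") & (df["colB"] == "no") & (
    (df["colC"] == "no") | (df["colD"] == "no")
)

# np.select takes the first matching condition per row; default plays the ELSE role.
df["result"] = np.select(
    [is_yes.to_numpy(), is_no.to_numpy()],
    ["Yes", "No"],
    default="UNKNOWN",
)
print(df)
```

Comparisons against NaN evaluate to False, so rows with missing values fall through to the default, as in the output above.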