pandas create new column based on values from other columns / apply a function of multiple columns, row-wise
OK, there are two steps to this. First, write a function that does the translation you want; I've put an example together based on your pseudo-code:
def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'
You may want to go over this, but it seems to do the trick - notice that the parameter going into the function is considered to be a Series object labelled "row".
Next, use the apply function in pandas to apply the function - e.g.
df.apply(label_race, axis=1)
Note the axis=1 specifier: it means the function is applied to each row rather than to each column. The results are here:
0 White
1 Hispanic
2 White
3 White
4 Other
5 White
6 Two Or More
7 White
8 Haw/Pac Isl.
9 White
If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.
df['race_label'] = df.apply(label_race, axis=1)
The resultant dataframe looks like this (scroll to the right to see the new column):
   lname     fname   rno_cd  eri_afr_amer  eri_asian  eri_hawaiian  eri_hispanic  eri_nat_amer  eri_white  rno_defined  race_label
0  MOST      JEFF    E       0             0          0             0             0             1          White        White
1  CRUISE    TOM     E       0             0          0             1             0             0          White        Hispanic
2  DEPP      JOHNNY  NaN     0             0          0             0             0             1          Unknown      White
3  DICAP     LEO     NaN     0             0          0             0             0             1          Unknown      White
4  BRANDO    MARLON  E       0             0          0             0             0             0          White        Other
5  HANKS     TOM     NaN     0             0          0             0             0             1          Unknown      White
6  DENIRO    ROBERT  E       0             1          0             0             0             1          White        Two Or More
7  PACINO    AL      E       0             0          0             0             0             1          White        White
8  WILLIAMS  ROBIN   E       0             0          1             0             0             0          White        Haw/Pac Isl.
9  EASTWOOD  CLINT   E       0             0          0             0             0             1          White        White
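Row-wise apply can get slow on large frames. As a rough alternative (not the method used above), the same priority rules can be expressed with numpy.select, which picks the first matching condition per row. This is only a sketch: the small frame below copies rows 0, 1, 4, 6 and 8 from the table, and the names race_cols, conditions and labels are just illustrative.

```python
import numpy as np
import pandas as pd

# Rows 0, 1, 4, 6 and 8 from the table above (flag columns only).
df = pd.DataFrame(
    {
        "lname": ["MOST", "CRUISE", "BRANDO", "DENIRO", "WILLIAMS"],
        "eri_afr_amer": [0, 0, 0, 0, 0],
        "eri_asian": [0, 0, 0, 1, 0],
        "eri_hawaiian": [0, 0, 0, 0, 1],
        "eri_hispanic": [0, 1, 0, 0, 0],
        "eri_nat_amer": [0, 0, 0, 0, 0],
        "eri_white": [1, 0, 0, 1, 0],
    }
)

# Columns that are summed for the 'Two Or More' check (hispanic is excluded,
# as in label_race).
race_cols = ["eri_afr_amer", "eri_asian", "eri_hawaiian", "eri_nat_amer", "eri_white"]

# Conditions are listed in the same order as the if-statements in label_race;
# np.select uses the first condition that is True for each row.
conditions = [
    df["eri_hispanic"] == 1,
    df[race_cols].sum(axis=1) > 1,
    df["eri_nat_amer"] == 1,
    df["eri_asian"] == 1,
    df["eri_afr_amer"] == 1,
    df["eri_hawaiian"] == 1,
    df["eri_white"] == 1,
]
labels = ["Hispanic", "Two Or More", "A/I AK Native", "Asian", "Black/AA", "Haw/Pac Isl.", "White"]

# Rows matching no condition fall back to the default.
df["race_label"] = np.select(conditions, labels, default="Other")
print(df[["lname", "race_label"]])
```

The order of the conditions matters here just as the order of the if-statements does in the function, so keep the two in sync if you change the rules.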
A new column in pandas which value depends on other columns
To improve upon the other answer, I would use pandas apply to iterate over the rows and calculate the new column.
def calc_new_col(row):
    # Use `and` for scalars: `&` binds tighter than `<=` and would be evaluated first.
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])
df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row
print(df)
datetime             col1  col2  col3  state
2021-04-10 01:00:00    25    50    50     25
2021-04-10 02:00:00    25    50    50     25
2021-04-10 03:00:00    25   100    50    100
2021-04-10 04:00:00    50    50   100    100
2021-04-10 05:00:00   100   100   100    100
Using apply helps keep the code cleaner and more reusable.
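If speed matters more than reusability, the same rule can probably be written without apply. The sketch below uses numpy.where on whole columns; it rebuilds the col1..col3 values from the table (the datetime column is left out), and the names both_small and row_max are just illustrative.

```python
import numpy as np
import pandas as pd

# Values taken from the table above; the datetime column is omitted.
df = pd.DataFrame(
    {
        "col1": [25, 25, 25, 50, 100],
        "col2": [50, 50, 100, 50, 100],
        "col3": [50, 50, 50, 100, 100],
    }
)

# On whole columns, & is the element-wise "and". Each comparison needs its
# own parentheses because & binds more tightly than <=.
both_small = (df["col2"] <= 50) & (df["col3"] <= 50)

# Row-wise maximum over the three columns.
row_max = df[["col1", "col2", "col3"]].max(axis=1)

# Take col1 where both conditions hold, otherwise the row maximum.
df["state"] = np.where(both_small, df["col1"], row_max)
print(df)
```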
Create new column based on values from three other columns in R
In addition to the solution by @r2evans in the comment section, we could use coalesce from the dplyr package:
library(dplyr)
df %>%
  mutate(d = coalesce(a, b, c))
   a  b  c  d
1  1 NA NA  1
2 NA NA  5  5
3  3 NA NA  3
4 NA  4 NA  4
5 NA 50 NA 50
OR
We could use unite from the tidyr package with the na.rm argument:
library(tidyr)
library(dplyr)
df %>%
unite(d, a:c, na.rm = TRUE, remove = FALSE)
   d  a  b  c
1  1  1 NA NA
2  5 NA NA  5
3  3  3 NA NA
4  4 NA  4 NA
5 50 NA 50 NA
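For anyone following along in pandas rather than R, a rough equivalent of coalesce is to back-fill across the columns and keep the first one. This is only a sketch, not part of the R answer; it assumes the same a, b, c values as the output above, with NaN standing in for NA.

```python
import numpy as np
import pandas as pd

# Same values as the R example, with NaN in place of NA.
df = pd.DataFrame(
    {
        "a": [1, np.nan, 3, np.nan, np.nan],
        "b": [np.nan, np.nan, np.nan, 4, 50],
        "c": [np.nan, 5, np.nan, np.nan, np.nan],
    }
)

# bfill(axis=1) pulls the next non-missing value leftwards within each row,
# so the first column ends up holding the first non-missing value of a, b, c.
df["d"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
print(df)
```

Because the columns contain NaN, pandas stores them as floats, so d comes back as 1.0, 5.0 and so on rather than as integers.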
Creating a new column based on conditions for other columns
Use DataFrame.isna to test every column for missing values, then DataFrame.all to check whether all the values in each row are True:
import numpy as np
# If necessary, convert string placeholders to real missing values first
df = df.replace(['Nan', 'NaN'], np.nan)
df['col4'] = np.where(df[['col1','col2','col3']].isna().all(axis=1), 'original', 'referenced')
Your solution with Series.isna:
df['col4'] = np.where(df['col1'].isna() & df['col2'].isna() & df['col3'].isna(),
'original', 'referenced')
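To see both steps working together, here is a small self-contained sketch. The sample values are made up for illustration (the question's actual data is not shown here), and all_missing is just a helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 0 is entirely missing, row 1 has values,
# row 2 is missing except for the string placeholder 'NaN' in col2.
df = pd.DataFrame(
    {
        "col1": [np.nan, "a", np.nan],
        "col2": [np.nan, "b", "NaN"],
        "col3": [np.nan, "c", np.nan],
    }
)

# Turn string placeholders into real missing values so isna() can see them.
df = df.replace(["Nan", "NaN"], np.nan)

# True only for rows where all three columns are missing.
all_missing = df[["col1", "col2", "col3"]].isna().all(axis=1)

df["col4"] = np.where(all_missing, "original", "referenced")
print(df)
```

Without the replace step, row 2 would be labelled 'referenced', because the string 'NaN' is not treated as missing by isna.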
Create new column based on other columns values with conditions
You can do it with pandas.DataFrame.apply:
def get_prc(x):
    individual_rate = x["individual_rate"]
    if individual_rate >= 4:
        return x["review_contents"] + " " + str(individual_rate)
    return "Not positive"
df["positive_review_contents"] = df[["individual_rate", "review_contents"]].apply(get_prc, axis=1)
The code above applies the function get_prc row-wise.
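If the frame is large, a column-wise version with numpy.where may be worth trying instead of apply. This is a sketch under assumptions: the sample ratings and review texts below are invented for illustration, and is_positive is just a helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the answer above.
df = pd.DataFrame(
    {
        "individual_rate": [5, 2, 4],
        "review_contents": ["Great", "Poor", "Good"],
    }
)

# Rows with a rating of 4 or more count as positive.
is_positive = df["individual_rate"] >= 4

# Build "<review> <rate>" for every row, then keep it only for positive rows.
combined = df["review_contents"] + " " + df["individual_rate"].astype(str)
df["positive_review_contents"] = np.where(is_positive, combined, "Not positive")
print(df)
```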