pandas create new column based on values from other columns / apply a function of multiple columns, row-wise
OK, two steps to this - first is to write a function that does the translation you want - I've put an example together based on your pseudo-code:
def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'
You may want to go over this, but it seems to do the trick - notice that the parameter going into the function is considered to be a Series object labelled "row".
Next, use the apply function in pandas to apply the function - e.g.
df.apply(label_race, axis=1)
Note the axis=1 specifier, which means the function is applied to each row rather than to each column. The results are here:
0 White
1 Hispanic
2 White
3 White
4 Other
5 White
6 Two Or More
7 White
8 Haw/Pac Isl.
9 White
If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.
df['race_label'] = df.apply(label_race, axis=1)
The resultant dataframe looks like this (scroll to the right to see the new column):
      lname   fname  rno_cd  eri_afr_amer  eri_asian  eri_hawaiian  eri_hispanic  eri_nat_amer  eri_white  rno_defined    race_label
0      MOST    JEFF       E             0          0             0             0             0          1        White         White
1    CRUISE     TOM       E             0          0             0             1             0          0        White      Hispanic
2      DEPP  JOHNNY     NaN             0          0             0             0             0          1      Unknown         White
3     DICAP     LEO     NaN             0          0             0             0             0          1      Unknown         White
4    BRANDO  MARLON       E             0          0             0             0             0          0        White         Other
5     HANKS     TOM     NaN             0          0             0             0             0          1      Unknown         White
6    DENIRO  ROBERT       E             0          1             0             0             0          1        White   Two Or More
7    PACINO      AL       E             0          0             0             0             0          1        White         White
8  WILLIAMS   ROBIN       E             0          0             1             0             0          0        White  Haw/Pac Isl.
9  EASTWOOD   CLINT       E             0          0             0             0             0          1        White         White
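If the frame is large, the same rules can also be written without a row-wise apply. The following is only a sketch using numpy.select, which takes the first matching condition per row, so the order of the conditions mirrors the order of the if statements in label_race. The small frame here is made up for illustration and is not the data from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data (an assumption, not the frame from the question)
df = pd.DataFrame({
    "eri_hispanic": [0, 1, 0, 0, 0],
    "eri_afr_amer": [0, 0, 0, 0, 1],
    "eri_asian":    [0, 0, 0, 1, 0],
    "eri_hawaiian": [0, 0, 0, 0, 0],
    "eri_nat_amer": [0, 0, 0, 0, 0],
    "eri_white":    [1, 0, 0, 1, 0],
})

race_cols = ["eri_afr_amer", "eri_asian", "eri_hawaiian",
             "eri_nat_amer", "eri_white"]

# Conditions are checked in order; the first True one wins for each row
conditions = [
    df["eri_hispanic"] == 1,
    df[race_cols].sum(axis=1) > 1,
    df["eri_nat_amer"] == 1,
    df["eri_asian"] == 1,
    df["eri_afr_amer"] == 1,
    df["eri_hawaiian"] == 1,
    df["eri_white"] == 1,
]
choices = ["Hispanic", "Two Or More", "A/I AK Native", "Asian",
           "Black/AA", "Haw/Pac Isl.", "White"]

# Rows matching no condition get the default label
df["race_label"] = np.select(conditions, choices, default="Other")
print(df["race_label"].tolist())
```

Whether this is worth it over apply depends on the size of the data; for a few thousand rows the apply version is usually fine and arguably easier to read.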
Creating a new column based on other columns from another dataframe
This will do what your question asks:
df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
Full test code:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    columns='Name Apples Pears Grapes Peachs'.split(),
    data=[
        ['James', 3, 5, 5, 2],
        ['Harry', 1, 0, 2, 9],
        ['Will', 20, 2, 7, 3]])
print(df)
df2 = pd.DataFrame(
    columns='Class User Factor'.split(),
    data=[
        ['A', 'Harry', 3],
        ['A', 'Will', 2],
        ['A', 'James', 5],
        ['B', np.nan, 4]])
print(df2)
df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
print(df2)
Input:
Name Apples Pears Grapes Peachs
0 James 3 5 5 2
1 Harry 1 0 2 9
2 Will 20 2 7 3
Class User Factor
0 A Harry 3
1 A Will 2
2 A James 5
3 B NaN 4
Output:
Class User Factor Total
0 A Harry 3 [3, 0, 6, 27]
1 A Will 2 [40, 4, 14, 6]
2 A James 5 [15, 25, 25, 10]
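As a possible variation, the scaling can be done on whole columns with DataFrame.mul instead of a row-wise lambda. This is a sketch under the same sample data as above; the names fruit_cols, merged and scaled are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["James", "Harry", "Will"],
    "Apples": [3, 1, 20],
    "Pears": [5, 0, 2],
    "Grapes": [5, 2, 7],
    "Peachs": [2, 9, 3],
})
df2 = pd.DataFrame({
    "Class": ["A", "A", "A", "B"],
    "User": ["Harry", "Will", "James", None],
    "Factor": [3, 2, 5, 4],
})

fruit_cols = ["Apples", "Pears", "Grapes", "Peachs"]  # hypothetical helper list

# Keep the Class A rows and attach each user's fruit counts
merged = df2[df2["Class"] == "A"].merge(df, left_on="User", right_on="Name")

# Multiply every fruit column by that row's Factor (aligned on the row index)
scaled = merged[fruit_cols].mul(merged["Factor"], axis=0)

# Pack the scaled values into one list per row, as in the output above
merged["Total"] = pd.Series(scaled.values.tolist(), index=merged.index)
result = merged[["Class", "User", "Factor", "Total"]]
print(result)
```

If you do not actually need lists, keeping scaled as separate columns is usually easier to work with afterwards.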
Creating a new column based on conditions for other columns
Use DataFrame.isna to test which values in the selected columns are missing, then DataFrame.all to test whether all values are True per row:
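# If necessary, convert the strings 'Nan'/'NaN' to real missing values first
import numpy as np
df = df.replace(['Nan', 'NaN'], np.nan)
# True only for rows where col1, col2 and col3 are all missing
df['col4'] = np.where(df[['col1','col2','col3']].isna().all(axis=1), 'original', 'referenced')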
Your solution, rewritten with Series.isna:
df['col4'] = np.where(df['col1'].isna() & df['col2'].isna() & df['col3'].isna(),
                      'original', 'referenced')
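To see the two steps in action, here is a small self-contained sketch. The frame is made up (the question's data is not shown here), and it deliberately contains one 'NaN' string so the replace step has something to do.

```python
import numpy as np
import pandas as pd

# Made-up data: col3 holds a 'NaN' string that is not yet a real missing value
df = pd.DataFrame({
    "col1": [np.nan, 1.0, np.nan],
    "col2": [np.nan, np.nan, 5.0],
    "col3": ["NaN", "x", np.nan],
})

# Turn the placeholder strings into real missing values
df = df.replace(["Nan", "NaN"], np.nan)

# Row-wise test: are col1, col2 and col3 all missing?
all_missing = df[["col1", "col2", "col3"]].isna().all(axis=1)

df["col4"] = np.where(all_missing, "original", "referenced")
print(df)
```

Only the first row has all three columns missing, so it is the only one labelled 'original'.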
A new column in pandas which value depends on other columns
To improve upon the other answer, I would use pandas apply to iterate over the rows and calculate the new column.
def calc_new_col(row):
    # `and` (not `&`) for scalar values: `&` binds more tightly than `<=`
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])
df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row
print(df)
datetime col1 col2 col3 state
2021-04-10 01:00:00 25 50 50 25
2021-04-10 02:00:00 25 50 50 25
2021-04-10 03:00:00 25 100 50 100
2021-04-10 04:00:00 50 50 100 100
2021-04-10 05:00:00 100 100 100 100
Using apply keeps the code cleaner and more reusable.
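For comparison, the same rule can be sketched without apply by combining numpy.where with a row-wise max. The data below reproduces the col1/col2/col3 values from the table above (the datetime column is left out); the name both_low is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

# Each comparison needs its own parentheses, because & binds more
# tightly than <= when combining boolean Series
both_low = (df["col2"] <= 50) & (df["col3"] <= 50)

# Where both are low take col1, otherwise the largest of the three columns
df["state"] = np.where(both_low, df["col1"],
                       df[["col1", "col2", "col3"]].max(axis=1))
print(df)
```

This works on whole columns at once, which tends to matter more as the frame grows; for small frames the apply version reads just as well.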
Create new column based on conditions of other columns
If IndexDate exists only for rows with OM/PM in Name, generate the years and take the first non-missing value per ID:
df['Year'] = df['IndexDate'].dt.year.groupby(df['ID']).transform('first')
For a general solution, add Series.where to set missing values for rows that do not match OM/PM:
df['Year'] = (df['IndexDate'].dt.year.where(df['Name'].isin(['OM','PM']))
                                     .groupby(df['ID']).transform('first'))
Or, IIUC, you need to assign the year by condition:
df['Year'] = np.where(df['Name'].isin(['OM','PM']),
df['IndexDate'].dt.year, df['WorkDate'].dt.year)
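Here is a small sketch of the Series.where variant on made-up data, since the question's frame is not reproduced here. The ID, Name and IndexDate values are assumptions chosen so that each ID has at least one OM/PM row.

```python
import pandas as pd

# Made-up data (an assumption): two IDs, some rows are not OM/PM
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Name": ["XX", "OM", "PM", "PM", "XX"],
    "IndexDate": pd.to_datetime(
        ["2019-05-01", "2020-01-15", "2021-03-01", "2018-07-04", "2017-01-01"]
    ),
})

# Keep the year only on OM/PM rows; all other rows become NaN
year = df["IndexDate"].dt.year.where(df["Name"].isin(["OM", "PM"]))

# 'first' skips missing values, so every row of an ID receives
# the year of that ID's first OM/PM row
df["Year"] = year.groupby(df["ID"]).transform("first")
print(df)
```

Because the masked rows are NaN, the resulting Year column is floating point; cast it afterwards if you need integers and no ID is left entirely missing.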
Create new column based on other columns from a different dataframe
IIUC, this will get you the desired output. (It does not include the np.nan row from df2 where Class == 'B', but I don't think you wanted that.)
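import pandas as pd

# reshape df1 from wide to long: one row per (Time, Item) pair
df_melt = df1.melt(id_vars = ['Time'])
df_melt.columns = ['Time', 'Item', 'Count']
# keep only the Class A rows
df2 = df2.loc[df2['Class'] == 'A']
# merge on the columns the two frames share
df_merge = pd.merge(df2, df_melt)
df_merge['Total'] = df_merge['Factor'] * df_merge['Count']
df_merge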
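To make the melt-then-merge idea concrete, here is a sketch on made-up frames. Note the assumptions: df1 has a Time column plus one column per item, and df2 has an Item column to merge on. The question's real column names may differ, so adjust the on= argument accordingly.

```python
import pandas as pd

# Made-up frames (assumptions, not the data from the question)
df1 = pd.DataFrame({"Time": [1, 2], "Apples": [3, 1], "Pears": [5, 0]})
df2 = pd.DataFrame({
    "Class": ["A", "A", "B"],
    "Item": ["Apples", "Pears", "Apples"],
    "Factor": [2, 3, 4],
})

# Wide to long: one row per (Time, Item) with its Count
df_melt = df1.melt(id_vars=["Time"], var_name="Item", value_name="Count")

# Keep Class A and attach the matching counts for each item
df_merge = pd.merge(df2[df2["Class"] == "A"], df_melt, on="Item")

# Plain column arithmetic now does the per-row multiplication
df_merge["Total"] = df_merge["Factor"] * df_merge["Count"]
print(df_merge)
```

Passing var_name and value_name to melt names the long columns directly, which avoids reassigning df_melt.columns afterwards.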
How to create and fill a new column based on conditions in two other columns?
Try this:
df["E"] = pd.Series(np.nan, index=df.index, dtype=object)  # object dtype, since E will hold strings
# Use boolean indexing to set no-yes to placeholder value
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
# Shift placeholder down by one, as it seems from your example
# that you want X to be on the no-yes "stopping" row
df["E"] = df.E.shift(1)
# Then set the X value on the yes-no rows
df.loc[(df.A == "yes") & (df.B == "no"), "E"] = "X"
df["E"] = df.E.ffill() # Fill forward
# Fix placeholders
df.loc[df.E == "PL", "E"] = np.nan
Results:
A B C D E
0 no no nan nan NaN
1 no no nan nan NaN
2 yes no X X X
3 yes no nan X X
4 no no nan X X
5 no yes nan X X
6 no yes nan nan NaN
7 yes no X X X
8 no no nan X X
9 yes no X X X
10 yes no nan X X
11 no no nan X X
12 no yes nan X X
13 no no nan nan NaN
Create new column based on values from three other columns in R
In addition to the solution by @r2evans in the comment section, we could use coalesce from the dplyr package:
library(dplyr)
df %>%
  mutate(d = coalesce(a, b, c))
a b c d
1 1 NA NA 1
2 NA NA 5 5
3 3 NA NA 3
4 NA 4 NA 4
5 NA 50 NA 50
Or we could use unite from the tidyr package with the na.rm argument:
library(tidyr)
library(dplyr)
df %>%
  unite(d, a:c, na.rm = TRUE, remove = FALSE)
d a b c
1 1 1 NA NA
2 5 NA NA 5
3 3 3 NA NA
4 4 NA 4 NA
5 50 NA 50 NA
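For readers doing the same thing in pandas rather than R, a rough equivalent of coalesce is to back-fill across the columns and take the first one. This is only a sketch that mirrors the a/b/c example above; it is not part of the original R answer.

```python
import numpy as np
import pandas as pd

# Same values as the R example above
df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan, np.nan],
    "b": [np.nan, np.nan, np.nan, 4, 50],
    "c": [np.nan, 5, np.nan, np.nan, np.nan],
})

# bfill(axis=1) pulls the first non-missing value of each row leftwards,
# so the first column then holds the coalesced result
df["d"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
print(df)
```

The column order passed to the selection plays the same role as the argument order in coalesce(a, b, c).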