Dataframe Create New Column Based on Other Columns

pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

OK, there are two steps to this. First, write a function that does the translation you want. I've put an example together based on your pseudo-code:

def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'

You may want to go over this, but it seems to do the trick - notice that the parameter going into the function is considered to be a Series object labelled "row".

Next, use the apply function in pandas to apply the function - e.g.

df.apply(label_race, axis=1)

Note the axis=1 specifier, which means the function is applied to each row rather than to each column. The results are here:

0           White
1        Hispanic
2           White
3           White
4           Other
5           White
6     Two Or More
7           White
8    Haw/Pac Isl.
9           White

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.

df['race_label'] = df.apply(label_race, axis=1)

The resultant dataframe looks like this (scroll to the right to see the new column):

      lname   fname  rno_cd  eri_afr_amer  eri_asian  eri_hawaiian  eri_hispanic  eri_nat_amer  eri_white  rno_defined    race_label
0      MOST    JEFF       E             0          0             0             0             0          1        White         White
1    CRUISE     TOM       E             0          0             0             1             0          0        White      Hispanic
2      DEPP  JOHNNY     NaN             0          0             0             0             0          1      Unknown         White
3     DICAP     LEO     NaN             0          0             0             0             0          1      Unknown         White
4    BRANDO  MARLON       E             0          0             0             0             0          0        White         Other
5     HANKS     TOM     NaN             0          0             0             0             0          1      Unknown         White
6    DENIRO  ROBERT       E             0          1             0             0             0          1        White   Two Or More
7    PACINO      AL       E             0          0             0             0             0          1        White         White
8  WILLIAMS   ROBIN       E             0          0             1             0             0          0        White  Haw/Pac Isl.
9  EASTWOOD   CLINT       E             0          0             0             0             0          1        White         White
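
If the row-wise apply gets slow on a large frame, one possible alternative is to express the same if-chain with numpy.select, which works on whole columns at once. This is a sketch rather than part of the original answer: the four-row sample frame is made up, and only the column names and labels come from the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same indicator columns as the answer above
df = pd.DataFrame({
    "eri_hispanic": [0, 1, 0, 0],
    "eri_afr_amer": [0, 0, 0, 0],
    "eri_asian":    [0, 0, 1, 0],
    "eri_hawaiian": [0, 0, 0, 0],
    "eri_nat_amer": [0, 0, 0, 0],
    "eri_white":    [1, 0, 1, 0],
})

race_cols = ["eri_afr_amer", "eri_asian", "eri_hawaiian", "eri_nat_amer", "eri_white"]

# Conditions are evaluated in order and the first match wins,
# which mirrors the order of the if statements in label_race
conditions = [
    df["eri_hispanic"] == 1,
    df[race_cols].sum(axis=1) > 1,
    df["eri_nat_amer"] == 1,
    df["eri_asian"] == 1,
    df["eri_afr_amer"] == 1,
    df["eri_hawaiian"] == 1,
    df["eri_white"] == 1,
]
labels = ["Hispanic", "Two Or More", "A/I AK Native", "Asian",
          "Black/AA", "Haw/Pac Isl.", "White"]

# Rows matching no condition get the default label
df["race_label"] = np.select(conditions, labels, default="Other")
print(df["race_label"].tolist())
```

The order of the conditions matters here just as it does in the function: Hispanic is checked before the multi-race sum, so a row flagged as both is still labelled 'Hispanic'.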

Creating a new column based on other columns from another dataframe

This will do what your question asks:

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]

Full test code:

import pandas as pd
import numpy as np
df = pd.DataFrame(
    columns='Name Apples Pears Grapes Peachs'.split(),
    data=[
        ['James', 3, 5, 5, 2],
        ['Harry', 1, 0, 2, 9],
        ['Will', 20, 2, 7, 3]])
print(df)

df2 = pd.DataFrame(
    columns='Class User Factor'.split(),
    data=[
        ['A', 'Harry', 3],
        ['A', 'Will', 2],
        ['A', 'James', 5],
        ['B', np.nan, 4]])
print(df2)

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
print(df2)

Input:

    Name  Apples  Pears  Grapes  Peachs
0  James       3      5       5       2
1  Harry       1      0       2       9
2   Will      20      2       7       3

   Class   User  Factor
0      A  Harry       3
1      A   Will       2
2      A  James       5
3      B    NaN       4

Output:

   Class   User  Factor             Total
0      A  Harry       3     [3, 0, 6, 27]
1      A   Will       2    [40, 4, 14, 6]
2      A  James       5  [15, 25, 25, 10]
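
If a list inside a single Total cell is awkward to work with downstream, a possible variation is to keep one scaled column per fruit. The sketch below is a different technique from the answer above (merge plus DataFrame.mul instead of join plus apply); the names fruit_cols and merged are made up for illustration, and the data is the same test data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["James", 3, 5, 5, 2], ["Harry", 1, 0, 2, 9], ["Will", 20, 2, 7, 3]],
    columns=["Name", "Apples", "Pears", "Grapes", "Peachs"],
)
df2 = pd.DataFrame(
    [["A", "Harry", 3], ["A", "Will", 2], ["A", "James", 5], ["B", np.nan, 4]],
    columns=["Class", "User", "Factor"],
)

fruit_cols = ["Apples", "Pears", "Grapes", "Peachs"]

# Keep class A only, then attach each user's fruit counts
merged = (
    df2[df2["Class"] == "A"]
    .merge(df, left_on="User", right_on="Name")
    .drop(columns="Name")
)

# Multiply every fruit column by that row's Factor (axis=0 aligns on the row index)
merged[fruit_cols] = merged[fruit_cols].mul(merged["Factor"], axis=0)
print(merged)
```

Each row then holds the same numbers as the Total list above, just spread over four columns, so they can be summed or filtered without unpacking lists.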

Creating a new column based on conditions for other columns

Use DataFrame.isna to test whether the values in each column are missing, then DataFrame.all to check whether all of them are True in each row:

# If necessary
import numpy as np

df = df.replace(['Nan', 'NaN'], np.nan)

df['col4'] = np.where(df[['col1','col2','col3']].isna().all(axis=1), 'original', 'referenced')

Your solution with Series.isna:

df['col4'] = np.where(df['col1'].isna() & df['col2'].isna() & df['col3'].isna(),
                      'original', 'referenced')
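
To see how the isna/all combination behaves, here is a small self-contained sketch. The three-row frame is invented for illustration; only the column names and the two labels come from the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is entirely missing, rows 1 and 2 are not
df = pd.DataFrame({
    "col1": [np.nan, 1.0, np.nan],
    "col2": [np.nan, np.nan, np.nan],
    "col3": [np.nan, 2.0, 3.0],
})

# True only for rows where col1, col2 and col3 are all missing
all_missing = df[["col1", "col2", "col3"]].isna().all(axis=1)

df["col4"] = np.where(all_missing, "original", "referenced")
print(df["col4"].tolist())  # ['original', 'referenced', 'referenced']
```

Row 2 has two missing values but still counts as 'referenced', because all requires every tested column in the row to be missing.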

A new column in pandas whose value depends on other columns

To improve on the other answer, I would use pandas apply to iterate over the rows and calculate the new column.

def calc_new_col(row):
    # Use `and` for scalar comparisons; `&` binds tighter than `<=`
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row

print(df)
datetime             col1  col2  col3  state
2021-04-10 01:00:00    25    50    50     25
2021-04-10 02:00:00    25    50    50     25
2021-04-10 03:00:00    25   100    50    100
2021-04-10 04:00:00    50    50   100    100
2021-04-10 05:00:00   100   100   100    100

apply helps the code to be cleaner and more reusable.
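
For larger frames, the same rule can probably be written without a row-wise function. The sketch below swaps in numpy.where and DataFrame.max, which is a different technique from the apply approach above; the data mirrors the printed table, minus the datetime column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

# Element-wise `&` on boolean Series; the parentheses around each
# comparison are required because `&` binds tighter than `<=`
both_small = (df["col2"] <= 50) & (df["col3"] <= 50)

# Take col1 where both are <= 50, otherwise the row-wise maximum
row_max = df[["col1", "col2", "col3"]].max(axis=1)
df["state"] = np.where(both_small, df["col1"], row_max)
print(df)
```

This should give the same state column as the apply version. Which one to use is mostly a trade-off between readability and speed.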

Create new column based on conditions of other columns

If IndexDate exists only for rows with OM/PM in Name, extract the years and take the first non-missing value per ID:

df['Year'] = df['IndexDate'].dt.year.groupby(df['ID']).transform('first')

For a general solution, add Series.where to set the years to missing on rows where Name is not OM/PM:

df['Year'] = (df['IndexDate'].dt.year.where(df['Name'].isin(['OM','PM']))
                             .groupby(df['ID']).transform('first'))

If I understand correctly, you need to assign the year by condition:

df['Year'] = np.where(df['Name'].isin(['OM','PM']),
                      df['IndexDate'].dt.year, df['WorkDate'].dt.year)
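
The where/groupby/transform chain is easier to follow on a concrete frame. The data below is invented, since the original question's data is not shown here; only the column names ID, Name and IndexDate come from the snippets above.

```python
import pandas as pd

# Hypothetical data: each ID has exactly one OM/PM row
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Name": ["OM", "XX", "YY", "PM"],
    "IndexDate": pd.to_datetime(
        ["2019-03-01", "2020-05-01", "2018-01-01", "2021-07-01"]
    ),
})

# Keep the year only on OM/PM rows; all other rows become NaN
year_if_match = df["IndexDate"].dt.year.where(df["Name"].isin(["OM", "PM"]))

# 'first' picks the first non-missing value within each ID group,
# and transform broadcasts it back to every row of that group
df["Year"] = year_if_match.groupby(df["ID"]).transform("first")
print(df)
```

Note that Year comes out as floats, because the intermediate Series contains NaN. If integer years are needed, a cast after filling or dropping the missing values would be one way to handle it.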

Create new column based on other columns from a different dataframe

If I understand correctly, this will get you the desired output. (It does not include the np.nan row from df2 where Class is B, but I don't think you wanted that.)

df_melt = df1.melt(id_vars = ['Time'])
df_melt.columns = ['Time', 'Item', 'Count']
df2 = df2.loc[df2['Class'] == 'A']
df_merge = pd.merge(df2, df_melt)
df_merge['Total'] = df_merge['Factor'] * df_merge['Count']
df_merge
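
Since the input frames are not shown in this answer, here is a runnable sketch of the same melt-then-merge idea on made-up data. It assumes df1 is wide with one column per item and that df2 has an Item column to merge on; both assumptions, and all the values, are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: df1 is wide, df2 maps each item to a class and a factor
df1 = pd.DataFrame({"Time": [1, 2], "Apples": [3, 1], "Pears": [5, 0]})
df2 = pd.DataFrame({
    "Class": ["A", "A", "B"],
    "Item": ["Apples", "Pears", "Grapes"],
    "Factor": [2, 10, 4],
})

# Wide to long: one row per (Time, Item) pair
df_melt = df1.melt(id_vars=["Time"], var_name="Item", value_name="Count")

# Keep class A only, then match factors to counts by item
df2_a = df2.loc[df2["Class"] == "A"]
df_merge = pd.merge(df2_a, df_melt, on="Item")
df_merge["Total"] = df_merge["Factor"] * df_merge["Count"]
print(df_merge)
```

Passing var_name and value_name to melt names the long columns directly, and passing on="Item" makes the merge key explicit instead of relying on whichever column names the two frames happen to share.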

How to create and fill a new column based on conditions in two other columns?

Try this:

df["E"] = np.nan

# Use boolean indexing to set no-yes to placeholder value
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"

# Shift placeholder down by one, as it seems from your example
# that you want X to be on the no-yes "stopping" row
df["E"] = df.E.shift(1)

# Then set the X value on the yes-no rows
df.loc[(df.A == "yes") & (df.B == "no"), "E"] = "X"
df["E"] = df.E.ffill() # Fill forward

# Fix placeholders
df.loc[df.E == "PL", "E"] = np.nan

Results:

      A    B    C    D    E
 0   no   no  nan  nan  NaN
 1   no   no  nan  nan  NaN
 2  yes   no    X    X    X
 3  yes   no  nan    X    X
 4   no   no  nan    X    X
 5   no  yes  nan    X    X
 6   no  yes  nan  nan  NaN
 7  yes   no    X    X    X
 8   no   no  nan    X    X
 9  yes   no    X    X    X
10  yes   no  nan    X    X
11   no   no  nan    X    X
12   no  yes  nan    X    X
13   no   no  nan  nan  NaN
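
For anyone who wants to run the steps end to end, here is a self-contained sketch that rebuilds columns A and B from the results table and applies the same placeholder, shift and forward-fill sequence. One deliberate change, which is my assumption rather than part of the answer: E starts as an object column, so assigning strings into it does not trigger dtype warnings on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Columns A and B copied from the results table above
df = pd.DataFrame({
    "A": ["no", "no", "yes", "yes", "no", "no", "no",
          "yes", "no", "yes", "yes", "no", "no", "no"],
    "B": ["no", "no", "no", "no", "no", "yes", "yes",
          "no", "no", "no", "no", "no", "yes", "no"],
})

# Assumption: object dtype, so string values can be assigned later
df["E"] = pd.Series(np.nan, index=df.index, dtype=object)

# Mark no-yes rows with a placeholder, then move it down one row
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
df["E"] = df["E"].shift(1)

# yes-no rows start a run of X; forward fill extends each run
# until the next placeholder stops it
df.loc[(df["A"] == "yes") & (df["B"] == "no"), "E"] = "X"
df["E"] = df["E"].ffill()

# Turn the placeholders back into missing values
df.loc[df["E"] == "PL", "E"] = np.nan
print(df)
```

The placeholder is what makes the forward fill stop: ffill only fills missing cells, so a non-missing "PL" blocks the X run and is then blanked out afterwards.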

Create new column based on values from three other columns in R

In addition to the solution by @r2evans in the comment section:

We could use coalesce from the dplyr package:

library(dplyr)

df %>%
  mutate(d = coalesce(a, b, c))
   a  b  c  d
1  1 NA NA  1
2 NA NA  5  5
3  3 NA NA  3
4 NA  4 NA  4
5 NA 50 NA 50

OR

We could use unite from the tidyr package with the na.rm argument:

library(tidyr)
library(dplyr)

df %>%
  unite(d, a:c, na.rm = TRUE, remove = FALSE)
   d  a  b  c
1  1  1 NA NA
2  5 NA NA  5
3  3  3 NA NA
4  4 NA  4 NA
5 50 NA 50 NA
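
Since most of this page is about pandas, here is a rough pandas analog of coalesce for comparison. It is a sketch of a different technique (a row-wise backfill) rather than a translation of the R answers; the values are copied from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan, np.nan],
    "b": [np.nan, np.nan, np.nan, 4, 50],
    "c": [np.nan, 5, np.nan, np.nan, np.nan],
})

# Backfill along each row so the first column ends up holding the
# first non-missing value among a, b, c, then take that column
df["d"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
print(df)
```

The column order in the selection plays the same role as the argument order in coalesce(a, b, c): the leftmost non-missing value wins.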

