Dataframe Create New Column Based on Other Columns

pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

OK, there are two steps to this. First, write a function that does the translation you want. I've put an example together based on your pseudo-code:

def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'

You may want to go over this, but it seems to do the trick. Notice that the parameter going into the function, named "row", is a Series object holding one row of the dataframe.

Next, use the pandas apply method to call the function on every row, e.g.

df.apply(label_race, axis=1)

Note the axis=1 argument: it means the function is applied to each row rather than to each column. The results are here:

0           White
1        Hispanic
2           White
3           White
4           Other
5           White
6     Two Or More
7           White
8    Haw/Pac Isl.
9           White

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.

df['race_label'] = df.apply(label_race, axis=1)

The resultant dataframe looks like this (scroll to the right to see the new column):

      lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian   eri_hispanic  eri_nat_amer  eri_white rno_defined    race_label
0      MOST    JEFF      E             0          0             0              0             0          1       White         White
1    CRUISE     TOM      E             0          0             0              1             0          0       White      Hispanic
2      DEPP  JOHNNY    NaN             0          0             0              0             0          1     Unknown         White
3     DICAP     LEO    NaN             0          0             0              0             0          1     Unknown         White
4    BRANDO  MARLON      E             0          0             0              0             0          0       White         Other
5     HANKS     TOM    NaN             0          0             0              0             0          1     Unknown         White
6    DENIRO  ROBERT      E             0          1             0              0             0          1       White   Two Or More
7    PACINO      AL      E             0          0             0              0             0          1       White         White
8  WILLIAMS   ROBIN      E             0          0             1              0             0          0       White  Haw/Pac Isl.
9  EASTWOOD   CLINT      E             0          0             0              0             0          1       White         White
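For larger dataframes, a row-wise apply can be slow. As a rough sketch of a vectorized alternative (not part of the original answer), numpy.select can mirror the same chain of conditions, since it also picks the first condition that matches. The sample rows and the race_cols helper list below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows using the same flag columns as above.
df = pd.DataFrame({
    "eri_hispanic": [0, 1, 0, 0, 0],
    "eri_afr_amer": [0, 0, 0, 0, 1],
    "eri_asian":    [0, 0, 0, 1, 0],
    "eri_hawaiian": [0, 0, 0, 0, 0],
    "eri_nat_amer": [0, 0, 0, 0, 0],
    "eri_white":    [1, 0, 0, 1, 0],
})

# Columns that count towards 'Two Or More' (illustrative helper name).
race_cols = ["eri_afr_amer", "eri_asian", "eri_hawaiian", "eri_nat_amer", "eri_white"]

# Conditions are evaluated in order and the first match wins,
# which matches the order of the if statements in label_race.
conditions = [
    df["eri_hispanic"] == 1,
    df[race_cols].sum(axis=1) > 1,
    df["eri_nat_amer"] == 1,
    df["eri_asian"] == 1,
    df["eri_afr_amer"] == 1,
    df["eri_hawaiian"] == 1,
    df["eri_white"] == 1,
]
labels = ["Hispanic", "Two Or More", "A/I AK Native", "Asian",
          "Black/AA", "Haw/Pac Isl.", "White"]

df["race_label"] = np.select(conditions, labels, default="Other")
print(df["race_label"].tolist())
```

The order of the conditions list matters here just as the order of the if statements matters in the function.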

Creating a new column based on other columns from another dataframe

This will do what your question asks:

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]

Full test code:

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=[
x.strip() for x in 'Name   Apples   Pears   Grapes   Peachs'.split()], data =[
['James',    3,       5,        5,        2],
['Harry',   1,       0,        2,        9],
['Will',     20,      2,        7,        3]])
print(df)

df2 = pd.DataFrame(columns=[
x.strip() for x in 'Class   User   Factor'.split()], data =[
['A',       'Harry',  3],
['A',       'Will',   2],
['A',       'James',  5],
['B',       np.nan,    4]])
print(df2)

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
print(df2)

Input:

    Name  Apples  Pears  Grapes  Peachs
0  James       3      5       5       2
1  Harry       1      0       2       9
2   Will      20      2       7       3
  Class   User  Factor
0     A  Harry       3
1     A   Will       2
2     A  James       5
3     B    NaN       4

Output:

  Class   User  Factor             Total
0     A  Harry       3     [3, 0, 6, 27]
1     A   Will       2    [40, 4, 14, 6]
2     A  James       5  [15, 25, 25, 10]
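If the lambda feels opaque, the same result can probably be reached with an explicit merge followed by a column-wise multiplication. This is only a sketch under the assumption that the fruit columns are exactly the four shown above; fruit_cols, merged and scaled are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["James", "Harry", "Will"],
    "Apples": [3, 1, 20],
    "Pears": [5, 0, 2],
    "Grapes": [5, 2, 7],
    "Peachs": [2, 9, 3],
})
df2 = pd.DataFrame({
    "Class": ["A", "A", "A", "B"],
    "User": ["Harry", "Will", "James", None],
    "Factor": [3, 2, 5, 4],
})

fruit_cols = ["Apples", "Pears", "Grapes", "Peachs"]  # assumed list of value columns

# Keep class A only, then bring in each user's fruit counts.
merged = df2[df2["Class"] == "A"].merge(df, left_on="User", right_on="Name", how="left")

# Multiply every fruit column by the Factor of the same row.
scaled = merged[fruit_cols].mul(merged["Factor"], axis=0)

# Store each row of products as a list, as in the output above.
merged["Total"] = pd.Series(scaled.values.tolist(), index=merged.index)
result = merged[["Class", "User", "Factor", "Total"]]
print(result)
```

Keeping scaled as separate columns instead of packing it into a list is usually easier to work with afterwards, if the list format is not a hard requirement.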

Creating a new column based on conditions for other columns

Use DataFrame.isna to test whether each value is missing, then DataFrame.all to test whether all values in a row are True:

# If necessary, turn the strings 'Nan'/'NaN' into real missing values
import numpy as np

df = df.replace(['Nan', 'NaN'], np.nan)

df['col4'] = np.where(df[['col1','col2','col3']].isna().all(axis=1), 'original', 'referenced')

Your solution with Series.isna:

df['col4'] = np.where(df['col1'].isna() & df['col2'].isna() & df['col3'].isna(), 
                     'original', 'referenced')
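To see the row-wise test in action, here is a small self-contained sketch. The three sample rows are invented; only the column names come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first row is missing in all three columns.
df = pd.DataFrame({
    "col1": [np.nan, 1.0, np.nan],
    "col2": [np.nan, np.nan, np.nan],
    "col3": [np.nan, 2.0, 3.0],
})

# True where every one of the three columns is missing in that row.
all_missing = df[["col1", "col2", "col3"]].isna().all(axis=1)

df["col4"] = np.where(all_missing, "original", "referenced")
print(df)
```

A row counts as 'original' only when all three values are missing; a single non-missing value is enough to make it 'referenced'.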

A new column in pandas which value depends on other columns

To improve upon the other answer, I would use pandas apply to iterate over the rows and calculate the new column.

def calc_new_col(row):
    # Use `and` for scalars: `&` binds tighter than `<=` and would compare the wrong values
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row

print(df)
            datetime  col1  col2  col3  state
2021-04-10  01:00:00    25    50    50     25
2021-04-10  02:00:00    25    50    50     25
2021-04-10  03:00:00    25   100    50    100
2021-04-10  04:00:00    50    50   100    100
2021-04-10  05:00:00   100   100   100    100

Using apply keeps the code cleaner and more reusable.
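If speed matters more than reuse, the same rule can likely be written without apply. The following is a sketch of a vectorized variant using numpy.where, with the data from the printed table (the datetime column is left out); both_low and row_max are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

# On whole columns, & is the element-wise "and"; each comparison needs its
# own parentheses because & binds tighter than <=.
both_low = (df["col2"] <= 50) & (df["col3"] <= 50)

# Row-wise maximum of the three columns.
row_max = df[["col1", "col2", "col3"]].max(axis=1)

df["state"] = np.where(both_low, df["col1"], row_max)
print(df)
```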

Create new column based on conditions of other columns

If IndexDate exists only for rows with OM/PM in Name, extract the year and take the first non-missing value per ID:

df['Year'] = df['IndexDate'].dt.year.groupby(df['ID']).transform('first')

For a general solution, add Series.where to set the year to missing for rows where Name is not OM/PM:

df['Year'] = (df['IndexDate'].dt.year.where(df['Name'].isin(['OM','PM']))
                             .groupby(df['ID']).transform('first'))

IIUC, you need to assign the year by condition:

df['Year'] = np.where(df['Name'].isin(['OM','PM']),
                      df['IndexDate'].dt.year, df['WorkDate'].dt.year)
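The two variants answer slightly different questions, which a small example may make clearer. The data below is invented, and the output column names Year_per_id and Year_by_row are used only to keep both results side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one OM/PM row per ID.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Name": ["XX", "OM", "PM", "YY"],
    "IndexDate": pd.to_datetime(["2019-05-01", "2020-03-15", "2021-07-01", "2018-01-10"]),
    "WorkDate": pd.to_datetime(["2017-01-01", "2017-02-01", "2016-06-01", "2016-07-01"]),
})

is_om_pm = df["Name"].isin(["OM", "PM"])

# Variant 1: year of the OM/PM row, repeated for every row of the same ID.
df["Year_per_id"] = (df["IndexDate"].dt.year.where(is_om_pm)
                                    .groupby(df["ID"]).transform("first"))

# Variant 2: each row picks its own year, from IndexDate or WorkDate.
df["Year_by_row"] = np.where(is_om_pm, df["IndexDate"].dt.year, df["WorkDate"].dt.year)
print(df)
```

Note that the first variant yields floats, because Series.where introduces missing values before the groupby.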

Create new column based on other columns from a different dataframe

IIUC, this will get you the desired output. (It does not include the np.nan row from df2 where Class is B, but I don't think you wanted that.)

df_melt = df1.melt(id_vars = ['Time'])
df_melt.columns = ['Time', 'Item', 'Count']
df2 = df2.loc[df2['Class'] == 'A']
df_merge = pd.merge(df2, df_melt)
df_merge['Total'] = df_merge['Factor'] * df_merge['Count']
df_merge
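Since the input frames are not shown here, the following runnable sketch makes some assumptions: df1 is taken to have a Time column plus one column per item, and df2 is taken to have Class, Item and Factor columns. Those shapes, the sample values, and the class_a name are guesses for illustration only.

```python
import pandas as pd

# Assumed shape of df1: wide format, one column per item.
df1 = pd.DataFrame({"Time": [1, 2], "Apples": [3, 1], "Pears": [5, 0]})

# Assumed shape of df2: one factor per item, with a class label.
df2 = pd.DataFrame({
    "Class": ["A", "A", "B"],
    "Item": ["Apples", "Pears", None],
    "Factor": [3, 2, 4],
})

# Wide to long: one row per (Time, Item) pair.
df_melt = df1.melt(id_vars=["Time"], var_name="Item", value_name="Count")

# Keep class A and attach the counts by item name.
class_a = df2.loc[df2["Class"] == "A"]
df_merge = pd.merge(class_a, df_melt, on="Item")

df_merge["Total"] = df_merge["Factor"] * df_merge["Count"]
print(df_merge)
```

Naming the join key with on="Item" is a bit more defensive than letting merge pick every shared column name.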

How to create and fill a new column based on conditions in two other columns?

Try this:

df["E"] = np.nan

# Use boolean indexing to set no-yes to placeholder value
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"

# Shift placeholder down by one, as it seems from your example
# that you want X to be on the no-yes "stopping" row
df["E"] = df.E.shift(1)

# Then set the X value on the yes-no rows
df.loc[(df.A == "yes") & (df.B == "no"), "E"] = "X"
df["E"] = df.E.ffill() # Fill forward

# Fix placeholders
df.loc[df.E == "PL", "E"] = np.nan

Results:

    A   B   C   D   E
0   no  no  nan nan NaN
1   no  no  nan nan NaN
2   yes no  X   X   X
3   yes no  nan X   X
4   no  no  nan X   X
5   no  yes nan X   X
6   no  yes nan nan NaN
7   yes no  X   X   X
8   no  no  nan X   X
9   yes no  X   X   X
10  yes no  nan X   X
11  no  no  nan X   X
12  no  yes nan X   X
13  no  no  nan nan NaN
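For anyone who wants to step through the logic, here is a self-contained sketch that rebuilds the first eight rows of columns A and B from the results above and runs the same sequence of steps. Starting E as an object column of None (rather than np.nan) is a choice made here so that strings can be assigned to it cleanly.

```python
import numpy as np
import pandas as pd

# First eight rows of A and B from the results table.
df = pd.DataFrame({
    "A": ["no", "no", "yes", "yes", "no", "no", "no", "yes"],
    "B": ["no", "no", "no", "no", "no", "yes", "yes", "no"],
})

df["E"] = None  # object column, so string values fit

# Mark no-yes rows with a placeholder, then move it down one row.
df.loc[(df["A"] == "no") & (df["B"] == "yes"), "E"] = "PL"
df["E"] = df["E"].shift(1)

# Start an X run on yes-no rows and carry it forward until a placeholder.
df.loc[(df["A"] == "yes") & (df["B"] == "no"), "E"] = "X"
df["E"] = df["E"].ffill()

# Turn remaining placeholders back into missing values.
df.loc[df["E"] == "PL", "E"] = np.nan
print(df)
```

The placeholder acts as a stop marker: it interrupts the forward fill, so the X run ends on the first no-yes row.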

Create new column based on values from three other columns in R

In addition to the solution by @r2evans in the comment section:

We could use coalesce from dplyr package:

library(dplyr)
df %>% 
  mutate(d = coalesce(a, b, c))

   a  b  c  d
1  1 NA NA  1
2 NA NA  5  5
3  3 NA NA  3
4 NA  4 NA  4
5 NA 50 NA 50
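For readers working in pandas rather than R, a rough counterpart of coalesce (offered as a sketch, not as part of the original answer) is to back-fill across the columns and keep the first one. The data mirrors the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan, np.nan],
    "b": [np.nan, np.nan, np.nan, 4, 50],
    "c": [np.nan, 5, np.nan, np.nan, np.nan],
})

# Back-fill along each row so the first column ends up holding
# the first non-missing value among a, b, c.
df["d"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
print(df)
```

The values come out as floats here because the columns contain NaN.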

We could also use unite from the tidyr package with the na.rm argument:

library(tidyr)
library(dplyr)

df %>% 
  unite(d, a:c, na.rm = TRUE, remove = FALSE)

   d  a  b  c
1  1  1 NA NA
2  5 NA NA  5
3  3  3 NA NA
4  4 NA  4 NA
5 50 NA 50 NA