Pandas Create New Column Based on Values from Other Columns/Apply a Function of Multiple Columns, Row-Wise


OK, there are two steps to this. First, write a function that does the translation you want; I've put an example together based on your pseudo-code:

def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'

You may want to check this over, but it seems to do the trick. Notice that the parameter passed into the function is a Series object named "row": its index holds the column names, so each value is looked up as row['column_name'].
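To see what the function actually receives, here is a minimal sketch with a hypothetical two-row frame (the row labels are borrowed from the table below purely for illustration). Each call gets one row as a Series whose index holds the column names and whose name attribute holds the row label.

```python
import pandas as pd

# Hypothetical two-row frame, indexed by surname for readability
df = pd.DataFrame(
    {"eri_hispanic": [1, 0], "eri_white": [0, 1]},
    index=["CRUISE", "MOST"],
)


def describe(row):
    # row is a pandas Series: row.name is the row label,
    # row.index holds the DataFrame's column names
    return f"{type(row).__name__} {row.name}: {list(row.index)}"


out = df.apply(describe, axis=1)
print(out["CRUISE"])  # Series CRUISE: ['eri_hispanic', 'eri_white']
```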

Next, use the pandas apply method to call the function on each row, e.g.:

df.apply(label_race, axis=1)

Note the axis=1 argument: it means the function is applied to each row rather than to each column. The results are:

0           White
1        Hispanic
2           White
3           White
4           Other
5           White
6     Two Or More
7           White
8    Haw/Pac Isl.
9           White

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.

df['race_label'] = df.apply(label_race, axis=1)

The resultant dataframe looks like this (scroll to the right to see the new column):

      lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian   eri_hispanic  eri_nat_amer  eri_white rno_defined    race_label
0      MOST    JEFF      E             0          0             0              0             0          1       White         White
1    CRUISE     TOM      E             0          0             0              1             0          0       White      Hispanic
2      DEPP  JOHNNY    NaN             0          0             0              0             0          1     Unknown         White
3     DICAP     LEO    NaN             0          0             0              0             0          1     Unknown         White
4    BRANDO  MARLON      E             0          0             0              0             0          0       White         Other
5     HANKS     TOM    NaN             0          0             0              0             0          1     Unknown         White
6    DENIRO  ROBERT      E             0          1             0              0             0          1       White   Two Or More
7    PACINO      AL      E             0          0             0              0             0          1       White         White
8  WILLIAMS   ROBIN      E             0          0             1              0             0          0       White  Haw/Pac Isl.
9  EASTWOOD   CLINT      E             0          0             0              0             0          1       White         White
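If the row-wise apply becomes slow on a large frame, the same priority rules can be expressed with NumPy's np.select, which picks the choice for the first condition that is True in each row, just like the if-chain in label_race. This is a swapped-in vectorized technique, not part of the original answer; a minimal sketch on a hypothetical sample copied from five rows of the table above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: indicator columns for five rows of the table above
df = pd.DataFrame({
    "eri_afr_amer": [0, 0, 0, 0, 0],
    "eri_asian":    [0, 0, 0, 1, 0],
    "eri_hawaiian": [0, 0, 0, 0, 1],
    "eri_hispanic": [0, 1, 0, 0, 0],
    "eri_nat_amer": [0, 0, 0, 0, 0],
    "eri_white":    [1, 0, 0, 1, 0],
}, index=["MOST", "CRUISE", "BRANDO", "DENIRO", "WILLIAMS"])

race_cols = ["eri_afr_amer", "eri_asian", "eri_hawaiian", "eri_nat_amer", "eri_white"]

# Same order as the if-chain in label_race: the first True condition wins
conditions = [
    df["eri_hispanic"] == 1,
    df[race_cols].sum(axis=1) > 1,
    df["eri_nat_amer"] == 1,
    df["eri_asian"] == 1,
    df["eri_afr_amer"] == 1,
    df["eri_hawaiian"] == 1,
    df["eri_white"] == 1,
]
choices = ["Hispanic", "Two Or More", "A/I AK Native", "Asian",
           "Black/AA", "Haw/Pac Isl.", "White"]

# Rows where no condition holds fall through to the default
df["race_label"] = np.select(conditions, choices, default="Other")
print(df["race_label"])
```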

Apply function on multiple columns and create new column based on condition

I first had to add the columns and fill them with zeros, then apply the function.

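def conditions(x, column1, column2):
    # Compare the two named columns within a single row
    if x[column1] != x[column2]:
        return "incorrect"
    else:
        return "correct"


lst1 = ["col1", "col2", "col3", "col4", "col5"]
lst2 = ["col1_1", "col2_2", "col3_3", "col4_4", "col5_5"]

# Add the new result columns, filled with zeros for now
for item in lst2:
    df[item + "_2"] = 0

# The five new columns are the last five in df; fill each one by
# comparing the matching pair of columns row by row
for i, item in enumerate(df.columns[-5:]):
    df[item] = df.apply(lambda x: conditions(x, column1=lst1[i], column2=lst2[i]), axis=1)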
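The zero-filled placeholder columns and the row-wise apply are not strictly needed: each pair of columns can be compared in one vectorized step with np.where. A minimal sketch, assuming hypothetical sample data with only two of the column pairs; the "_2" suffix for the result columns follows the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each colN is compared with its partner colN_N
df = pd.DataFrame({
    "col1": [1, 2, 3], "col1_1": [1, 0, 3],
    "col2": ["a", "b", "c"], "col2_2": ["a", "b", "x"],
})
lst1 = ["col1", "col2"]
lst2 = ["col1_1", "col2_2"]

# One vectorized comparison per column pair creates each result column directly
for column1, column2 in zip(lst1, lst2):
    df[column2 + "_2"] = np.where(df[column1] != df[column2], "incorrect", "correct")

print(df)
```

Pairing the lists with zip also removes the need to rely on the new columns being the last five in df.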

A new column in pandas whose value depends on other columns

To improve upon the other answer, I would use pandas apply to iterate over the rows and calculate the new column.

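def calc_new_col(row):
    # Use 'and', not '&': '&' binds tighter than '<=', so
    # "a <= 50 & b <= 50" does not test both conditions
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])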

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row

print(df)
            datetime  col1  col2  col3  state
2021-04-10  01:00:00    25    50    50     25
2021-04-10  02:00:00    25    50    50     25
2021-04-10  03:00:00    25   100    50    100
2021-04-10  04:00:00    50    50   100    100
2021-04-10  05:00:00   100   100   100    100

Using apply keeps the code clean and the row logic reusable.
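For large frames, the same rule can also be written without a per-row function by using np.where, which is a swapped-in vectorized alternative rather than the answer's method. A minimal sketch using the values from the printed table above (the datetime column is left out for brevity):

```python
import numpy as np
import pandas as pd

# Values from the printed table above; datetime omitted
df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

# Parentheses are required here because & binds tighter than <=
both_low = (df["col2"] <= 50) & (df["col3"] <= 50)

# Keep col1 where both are low, otherwise take the row-wise maximum
df["state"] = np.where(both_low, df["col1"], df[["col1", "col2", "col3"]].max(axis=1))
print(df)
```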

Pandas apply row-wise a function and create multiple new columns

To do this, you need to:

1. Transpose df2 so that its columns are correct for concatenation.
2. Index it with the df1["sic"] column to get the correct rows.
3. Reset the index of the obtained rows of df2 using .reset_index(drop=True). This replaces the current index (e.g. 5, 6, 3, 8, 12, 6) with a new one (e.g. 0, 1, 2, 3, 4, 5) while keeping the actual values the same. It is needed because pd.concat(..., axis=1) aligns rows by index label, so both dataframes must share the same 0..n-1 index.
4. Concatenate the two dataframes.

Note: I used a method based on this to read in the dataframes, and it assumed that the columns of df2 were strings but the values of the sic column of df1 were ints. Therefore I used .astype(str) to get step 2 working. If this is not the case for your data, you may need to remove the .astype(str).

Here is the single line of code to do these things:

merged = pd.concat([df1, df2.T.loc[df1["sic"].astype(str)].reset_index(drop=True)], axis=1)

Here is the full code I used:

from io import StringIO
import pandas as pd

df1 = pd.read_csv(StringIO("""
sic data1   data2   data3   data4   data5
5   0.90783598  0.84722083  0.47149924  0.98724123  0.50654476
6   0.53442684  0.59730371  0.92486887  0.61531646  0.62784041
3   0.56806423  0.09619383  0.33846097  0.71878313  0.96316724
8   0.86933042  0.64965755  0.94549745  0.08866519  0.92156389
12  0.651328    0.37193774  0.9679044   0.36898991  0.15161838
6   0.24555531  0.50195983  0.79114578  0.9290596   0.10672607
"""), sep=r"\s+")  # columns are separated by runs of whitespace (tabs or spaces)
df2 = pd.read_csv(StringIO("""
    1   2   3   4   5   6   7   8   9   10  11  12
c_bar   0.4955329   0.92970292  0.68049726  0.91325006  0.55578465  0.78056519  0.53954711  0.90335326  0.93986402  0.0204794   0.51575764  0.61144255
a1_bar  0.75781444  0.81052669  0.99910449  0.62181902  0.11797144  0.40031316  0.08561665  0.35296894  0.14445697  0.93799762  0.80641802  0.31379671
a2_bar  0.41432552  0.36313911  0.13091618  0.39251953  0.66249636  0.31221897  0.15988528  0.1620938   0.55143589  0.66571044  0.68198944  0.23806947
a3_bar  0.38918855  0.83689178  0.15838139  0.39943204  0.48615188  0.06299899  0.86343819  0.47975619  0.05300611  0.15080875  0.73088725  0.3500239
a4_bar  0.47201384  0.90874121  0.50417142  0.70047698  0.24820601  0.34302454  0.4650635   0.0992668   0.55142391  0.82947194  0.28251699  0.53170308
"""), sep=r"\s+", index_col=[0])

merged = pd.concat([df1, df2.T.loc[df1["sic"].astype(str)].reset_index(drop=True)], axis=1)

print(merged)

which produces the output:

   sic     data1     data2     data3  ...    a1_bar    a2_bar    a3_bar    a4_bar
0    5  0.907836  0.847221  0.471499  ...  0.117971  0.662496  0.486152  0.248206
1    6  0.534427  0.597304  0.924869  ...  0.400313  0.312219  0.062999  0.343025
2    3  0.568064  0.096194  0.338461  ...  0.999104  0.130916  0.158381  0.504171
3    8  0.869330  0.649658  0.945497  ...  0.352969  0.162094  0.479756  0.099267
4   12  0.651328  0.371938  0.967904  ...  0.313797  0.238069  0.350024  0.531703
5    6  0.245555  0.501960  0.791146  ...  0.400313  0.312219  0.062999  0.343025

[6 rows x 11 columns]
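If you prefer a lookup-style join to positional concatenation, merge can produce the same result. One practical difference: .loc raises a KeyError when a sic code has no matching column in df2, whereas a left merge keeps the row and fills the new columns with NaN. A minimal sketch with small hypothetical versions of df1 and df2; the helper column sic_key is introduced here only to hold the string form of sic:

```python
import pandas as pd

# Hypothetical small versions of df1 and df2 (df2 columns are sic codes as strings)
df1 = pd.DataFrame({"sic": [5, 6, 3, 6], "data1": [0.9, 0.5, 0.6, 0.2]})
df2 = pd.DataFrame(
    {"3": [0.1, 0.2], "5": [0.3, 0.4], "6": [0.5, 0.6]},
    index=["c_bar", "a1_bar"],
)

# After transposing, each row of the lookup table is one sic code
lookup = df2.T

merged = (
    df1.assign(sic_key=df1["sic"].astype(str))  # match df2's string labels
       .merge(lookup, left_on="sic_key", right_index=True, how="left")
       .drop(columns="sic_key")
       .reset_index(drop=True)
)
print(merged)
```

A left merge preserves the row order of df1, so repeated codes (such as 6 here) simply pick up the same values again.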

Make new column in Pandas dataframe by adding values from other columns

Very simple:

df['C'] = df['A'] + df['B']
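One detail worth knowing: + propagates missing values, so any row with NaN in either column gets NaN in the result. A minimal sketch with hypothetical data, comparing it with sum(axis=1), which skips NaN by default, and with Series.add using fill_value:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [10, np.nan, 30]})

df["C"] = df["A"] + df["B"]                   # NaN if either side is NaN
df["D"] = df[["A", "B"]].sum(axis=1)          # skips NaN (skipna=True by default)
df["E"] = df["A"].add(df["B"], fill_value=0)  # treats a one-sided NaN as 0
print(df)
```

The last two differ only when both values in a row are missing: sum returns 0 for such a row, while add with fill_value still returns NaN.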

pandas apply function to multiple columns with condition and create new columns

First, it is necessary to convert the string representations of the lists into real lists with ast.literal_eval; then, to check the lengths, remove the casting to strings. If you need one-element lists instead of scalars, wrap fruit[0] and fruit[1] in []. Finally, move the plain len(fruit) == 1 condition to the end of the chain, and change len(fruit) > 3 to len(fruit) > 2 to match the first row:

import ast


def fruits_vegetable(row):
    
    fruit = ast.literal_eval(row['fruit_code'])
    vege = ast.literal_eval(row['vegetable_code'])
    
    if len(fruit) == 1 and len(vege) > 1:   # write "all" in new_col_1 
        row['new_col_1'] = 'all'
    elif len(fruit) > 2 and len(vege) == 1: # vegetable_code in new_col_1
        row['new_col_1'] = vege
    elif len(fruit) > 2 and len(vege) > 1:  # write "all" in new_col_1
        row['new_col_1'] = 'all'
    elif len(fruit) == 2 and len(vege) >= 0:# fruit 1 new_col_1 & fruit 2 new_col_2
        row['new_col_1'] = [fruit[0]]
        row['new_col_2'] = [fruit[1]]
    elif len(fruit) == 1:                   # fruit_code in new_col_1
        row['new_col_1'] = fruit
    return row


df = df.apply(fruits_vegetable, axis=1)

print (df)
   ID        date        fruit_code new_col_1 new_col_2 supermarket  \
0   1  2022-01-01      [100,99,300]       all       NaN          xy   
1   2  2022-01-01       [67,200,87]    [5000]       NaN        z, m   
2   3  2021-01-01    [100,5,300,78]       all       NaN       wf, z   
3   4  2020-01-01              [77]      [77]       NaN         NaN   
4   5  2022-15-01  [100,200,546,33]       all       NaN       t, wf   
5   6  2002-12-01            [64,2]      [64]       [2]           k   
6   7  2018-12-01               [5]       all       NaN           p   

  supermarkt    vegetable_code  
0        NaN  [1000,2000,3000]  
1        NaN            [5000]  
2        NaN  [7000,2000,3000]  
3         wf            [1000]  
4        NaN  [4000,2000,3000]  
5        NaN  [6000,8000,1000]  
6        NaN  [6000,8000,1000]
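A variation on the same idea: parse the string columns once up front, and have the row function return a pd.Series, so that apply builds both new columns directly and they can be joined back onto the frame. This is a sketch under assumptions, not the answer's exact code: the three-row sample is hypothetical (copied from rows of the output above), and the always-true len(vege) >= 0 check is dropped.

```python
import ast

import pandas as pd

# Hypothetical sample: three rows copied from the output above
df = pd.DataFrame({
    "fruit_code": ["[100,99,300]", "[64,2]", "[77]"],
    "vegetable_code": ["[1000,2000,3000]", "[6000,8000,1000]", "[1000]"],
})

# Parse the string representations once, before the row-wise step
for col in ["fruit_code", "vegetable_code"]:
    df[col] = df[col].map(ast.literal_eval)


def split_codes(row):
    fruit, vege = row["fruit_code"], row["vegetable_code"]
    new_1, new_2 = None, None  # defaults when no rule matches
    if len(fruit) == 1 and len(vege) > 1:
        new_1 = "all"
    elif len(fruit) > 2 and len(vege) == 1:
        new_1 = vege
    elif len(fruit) > 2 and len(vege) > 1:
        new_1 = "all"
    elif len(fruit) == 2:
        new_1, new_2 = [fruit[0]], [fruit[1]]
    elif len(fruit) == 1:
        new_1 = fruit
    # Returning a Series makes apply create one column per key
    return pd.Series({"new_col_1": new_1, "new_col_2": new_2})


df = df.join(df.apply(split_codes, axis=1))
print(df)
```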

Creating a new column based on if-elif-else condition

To formalize some of the approaches laid out above:

Create a function that operates on the rows of your dataframe like so:

def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val

Then apply it to your dataframe passing in the axis=1 option:

In [1]: df['C'] = df.apply(f, axis=1)

In [2]: df
Out[2]:
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

Of course, this is not vectorized, so performance may not be as good when scaled to a large number of records. Still, I think it is much more readable, especially coming from a SAS background.

Edit

Here is the vectorized version, using NumPy's np.where:

import numpy as np

df['C'] = np.where(
    df['A'] == df['B'], 0, np.where(
    df['A'] >  df['B'], 1, -1))
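With more branches, nested np.where calls get hard to read. np.select (a swapped-in alternative, not part of the original answer) takes the conditions and choices as parallel lists and reads like the if/elif/else chain in f. A minimal sketch using the same A/B values as the example output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 1], "B": [2, 1, 3]}, index=["a", "b", "c"])

# Conditions are checked in order and the first True one wins, like if/elif;
# rows matching none of them get the default, like else
conditions = [df["A"] == df["B"], df["A"] > df["B"]]
choices = [0, 1]
df["C"] = np.select(conditions, choices, default=-1)
print(df)
```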

Python: Create a new column based on different conditions

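You can try groupby and use isin to check, within each ID/DATE group, whether any Destination_1 value also appears in Destination_2, then assign that result to every row of the group: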

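# group_keys=False keeps the original flat index and row order
# (since pandas 2.0, groupby.apply otherwise prepends the group keys)
out = (df.groupby(['ID', 'DATE'], group_keys=False)
       .apply(lambda g: g.assign(OUTPUT=[g['Destination_1'].isin(g['Destination_2']).any()]*len(g))))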

print(out)

    ID     DATE Destination_1 Destination_2  OUTPUT
0   97  2018-09        BRAZIL        BRAZIL    True
1   97  2018-09        BRAZIL       URUGUAY    True
2   97  2020-02        BRAZIL     SINGAPORE   False
3   19  2021-01     ARGENTINA     VENEZUELA   False
4   19  2021-01       MOROCCO      THAILAND   False
5   33  2017-04     INDONESIA          PERU    True
6   33  2017-04           USA       ECUADOR    True
7   33  2017-04        BRAZIL           USA    True
8   33  2019-09         EGYPT       ALGERIA    True
9   33  2019-09       LEBANON         EGYPT    True
10  11  2022-05           USA        CANADA   False
