Add a New Column Between Other Dataframe Columns

pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

OK, two steps to this - first is to write a function that does the translation you want - I've put an example together based on your pseudo-code:

def label_race (row):
   if row['eri_hispanic'] == 1 :
      return 'Hispanic'
   if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1 :
      return 'Two Or More'
   if row['eri_nat_amer'] == 1 :
      return 'A/I AK Native'
   if row['eri_asian'] == 1:
      return 'Asian'
   if row['eri_afr_amer']  == 1:
      return 'Black/AA'
   if row['eri_hawaiian'] == 1:
      return 'Haw/Pac Isl.'
   if row['eri_white'] == 1:
      return 'White'
   return 'Other'

You may want to go over this, but it seems to do the trick - notice that the parameter going into the function is considered to be a Series object labelled "row".

Next, use the apply function in pandas to apply the function - e.g.

df.apply(label_race, axis=1)

Note the axis=1 specifier: it means the function is applied to each row rather than to each column. The results are here:

0           White
1        Hispanic
2           White
3           White
4           Other
5           White
6     Two Or More
7           White
8    Haw/Pac Isl.
9           White

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.

df['race_label'] = df.apply(label_race, axis=1)

The resultant dataframe looks like this (scroll to the right to see the new column):

      lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian   eri_hispanic  eri_nat_amer  eri_white rno_defined    race_label
0      MOST    JEFF      E             0          0             0              0             0          1       White         White
1    CRUISE     TOM      E             0          0             0              1             0          0       White      Hispanic
2      DEPP  JOHNNY    NaN             0          0             0              0             0          1     Unknown         White
3     DICAP     LEO    NaN             0          0             0              0             0          1     Unknown         White
4    BRANDO  MARLON      E             0          0             0              0             0          0       White         Other
5     HANKS     TOM    NaN             0          0             0              0             0          1     Unknown         White
6    DENIRO  ROBERT      E             0          1             0              0             0          1       White   Two Or More
7    PACINO      AL      E             0          0             0              0             0          1       White         White
8  WILLIAMS   ROBIN      E             0          0             1              0             0          0       White  Haw/Pac Isl.
9  EASTWOOD   CLINT      E             0          0             0              0             0          1       White         White
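
For larger dataframes, a row-wise apply can be slow. A vectorized alternative (not part of the original answer) is numpy.select, which takes a list of conditions and a matching list of labels. The sketch below assumes the same indicator columns as above and uses a small made-up sample; the names non_hispanic, conditions and labels are only for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up sample using the same indicator columns as above.
df = pd.DataFrame({
    'eri_hispanic': [0, 1, 0, 0],
    'eri_afr_amer': [0, 0, 0, 0],
    'eri_asian':    [0, 0, 1, 0],
    'eri_hawaiian': [0, 0, 0, 0],
    'eri_nat_amer': [0, 0, 0, 0],
    'eri_white':    [1, 0, 1, 0],
})

non_hispanic = ['eri_afr_amer', 'eri_asian', 'eri_hawaiian',
                'eri_nat_amer', 'eri_white']

# Conditions are checked in order and the first True one wins,
# which mirrors the order of the if-statements in label_race.
conditions = [
    df['eri_hispanic'] == 1,
    df[non_hispanic].sum(axis=1) > 1,
    df['eri_nat_amer'] == 1,
    df['eri_asian'] == 1,
    df['eri_afr_amer'] == 1,
    df['eri_hawaiian'] == 1,
    df['eri_white'] == 1,
]
labels = ['Hispanic', 'Two Or More', 'A/I AK Native', 'Asian',
          'Black/AA', 'Haw/Pac Isl.', 'White']

# Rows matching no condition get the default label.
df['race_label'] = np.select(conditions, labels, default='Other')
print(df['race_label'].tolist())
```

Because np.select picks the first matching condition, the order of the list matters in the same way the order of the if-statements does.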

Python: how to add a column to a pandas dataframe between two columns?

You can use insert:

df.insert(4, 'new_col_name', tmp)

Note: The insert method mutates the original DataFrame and does not return a copy.

If you use df = df.insert(4, 'new_col_name', tmp), df will be None.
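
As a small illustration, the position does not have to be hard-coded: it can be looked up from an existing column name. This is a sketch with a made-up dataframe; the column names a, b, c and the variable position are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Find where column 'b' sits and insert the new column right after it.
position = df.columns.get_loc('b') + 1
result = df.insert(position, 'new_col_name', [10, 20])

print(result)            # insert works in place, so this is None
print(list(df.columns))  # the new column now sits between 'b' and 'c'
```

Note that get_loc returns a plain integer only when the column name is unique.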

Creating a new column based on other columns from another dataframe

This will do what your question asks:

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]

Full test code:

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=[
x.strip() for x in 'Name   Apples   Pears   Grapes   Peachs'.split()], data =[
['James',    3,       5,        5,        2],
['Harry',   1,       0,        2,        9],
['Will',     20,      2,        7,        3]])
print(df)

df2 = pd.DataFrame(columns=[
x.strip() for x in 'Class   User   Factor'.split()], data =[
['A',       'Harry',  3],
['A',       'Will',   2],
['A',       'James',  5],
['B',       np.nan,    4]])
print(df2)

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
print(df2)

Input:

    Name  Apples  Pears  Grapes  Peachs
0  James       3      5       5       2
1  Harry       1      0       2       9
2   Will      20      2       7       3
  Class   User  Factor
0     A  Harry       3
1     A   Will       2
2     A  James       5
3     B    NaN       4

Output:

  Class   User  Factor             Total
0     A  Harry       3     [3, 0, 6, 27]
1     A   Will       2    [40, 4, 14, 6]
2     A  James       5  [15, 25, 25, 10]
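
If separate scaled columns are easier to work with than a single column of lists, one possible variation (not from the original answer) is to merge the two frames and multiply the fruit columns by Factor. The sketch below reuses the test data above; fruit_cols, merged, scaled and out are names introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['James', 'Harry', 'Will'],
    'Apples': [3, 1, 20],
    'Pears': [5, 0, 2],
    'Grapes': [5, 2, 7],
    'Peachs': [2, 9, 3],
})
df2 = pd.DataFrame({
    'Class': ['A', 'A', 'A', 'B'],
    'User': ['Harry', 'Will', 'James', None],
    'Factor': [3, 2, 5, 4],
})

fruit_cols = ['Apples', 'Pears', 'Grapes', 'Peachs']

# Keep only Class A, then bring in the fruit counts for each user.
merged = df2[df2['Class'] == 'A'].merge(df, left_on='User', right_on='Name')

# Multiply every fruit column by that row's Factor (axis=0 aligns on rows).
scaled = merged[fruit_cols].mul(merged['Factor'], axis=0)

out = pd.concat([merged[['Class', 'User', 'Factor']], scaled], axis=1)
print(out)
```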

How to add new column from another dataframe based on values in column of first dataframe?

This is done via a join operation which in pandas can be done with .merge().

Kindly try using the following:

df = df.merge(population,how='left',on='Province')
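
A minimal sketch of what the left merge does, using made-up data (the City and Population columns and all values are assumptions for illustration): every row of df is kept, and rows without a matching Province get NaN in the new column.

```python
import pandas as pd

# Made-up sample data; only the 'Province' key comes from the answer above.
df = pd.DataFrame({'City': ['c1', 'c2', 'c3'],
                   'Province': ['P1', 'P2', 'P3']})
population = pd.DataFrame({'Province': ['P1', 'P2'],
                           'Population': [100, 200]})

# validate='many_to_one' raises an error if 'Province' is duplicated in
# `population`, which would otherwise silently duplicate rows of df.
merged = df.merge(population, how='left', on='Province',
                  validate='many_to_one')
print(merged)
```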

Also please consider reading the following answer for a detailed guide on joins and merges

A new column in pandas which value depends on other columns

To improve upon the other answer, I would use pandas apply to iterate over the rows and calculate the new column.

def calc_new_col(row):
    # Use `and` for scalar comparisons; `&` binds tighter than `<=` and would misparse here
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row

print(df)
            datetime  col1  col2  col3  state
2021-04-10  01:00:00    25    50    50     25
2021-04-10  02:00:00    25    50    50     25
2021-04-10  03:00:00    25   100    50    100
2021-04-10  04:00:00    50    50   100    100
2021-04-10  05:00:00   100   100   100    100

apply helps the code to be cleaner and more reusable.
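
When speed matters more than reusability, the same rule can probably be written without apply. The following is a sketch of a vectorized variant using numpy.where (a different technique from the answer above), with the sample values from the printed table; both_small and row_max are names made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [25, 25, 25, 50, 100],
    'col2': [50, 50, 100, 50, 100],
    'col3': [50, 50, 50, 100, 100],
})

# On whole columns the conditions are combined element-wise with `&`,
# and each comparison needs its own parentheses.
both_small = (df['col2'] <= 50) & (df['col3'] <= 50)
row_max = df[['col1', 'col2', 'col3']].max(axis=1)

# Take col1 where both conditions hold, otherwise the row maximum.
df['state'] = np.where(both_small, df['col1'], row_max)
print(df)
```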

Create new column based on other columns from a different dataframe

IIUC this will get you the desired output. (It does not include the np.nan row from df2 where Class is 'B', but I don't think you wanted that.)

df_melt = df1.melt(id_vars = ['Time'])
df_melt.columns = ['Time', 'Item', 'Count']
df2 = df2.loc[df2['Class'] == 'A']
df_merge = pd.merge(df2, df_melt)
df_merge['Total'] = df_merge['Factor'] * df_merge['Count']
df_merge

Mapping columns from one dataframe to another to create a new column

`df.merge`

out = (df1.merge(df2, left_on='store', right_on='store_code')
          .reindex(columns=['id', 'store', 'address', 'warehouse']))
print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

`pd.concat` + `df.sort_values`

u = df1.sort_values('store')
v = df2.sort_values('store_code')[['warehouse']].reset_index(drop=True)
out = pd.concat([u, v], axis=1)

print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

The first sort call is redundant assuming your dataframe is already sorted on store, in which case you may remove it.

`df.replace`/`df.map`

s = df1.store.replace(df2.set_index('store_code')['warehouse'])
print(s) 
0    Land
1     Sea
2    Land
3    Land
4     Sea

df1['warehouse'] = s
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

Alternatively, create a mapping explicitly. This works if you want to use it later.

mapping = dict(df2[['store_code', 'warehouse']].values)
df1['warehouse'] = df1.store.map(mapping)
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
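
One difference worth knowing between the two approaches: map returns NaN for values that have no entry in the mapping, whereas replace leaves such values unchanged. A small sketch, assuming a store code (999, made up here) that is missing from df2, and a hypothetical warehouse_filled column:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'store': [100, 200, 999],   # 999 has no match in df2
                    'address': ['xyz', 'qwe', 'asd']})
df2 = pd.DataFrame({'store_code': [100, 200],
                    'warehouse': ['Land', 'Sea']})

# A Series indexed by store_code works as a lookup table for map.
mapping = df2.set_index('store_code')['warehouse']
df1['warehouse'] = df1['store'].map(mapping)

# Unmatched stores come back as NaN; fill them explicitly if needed.
df1['warehouse_filled'] = df1['warehouse'].fillna('Unknown')
print(df1)
```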

Adding a new column in pandas dataframe from another dataframe with differing indices

Assuming your dataframes have the same number of rows, you can assign RESULT_df['RESULT'].values to your original dataframe. This way, you don't have to worry about indexing issues.

# pre 0.24
feature_file_df['RESULT'] = RESULT_df['RESULT'].values
# >= 0.24
feature_file_df['RESULT'] = RESULT_df['RESULT'].to_numpy()

Minimal Code Sample

df
          A         B
0 -1.202564  2.786483
1  0.180380  0.259736
2 -0.295206  1.175316
3  1.683482  0.927719
4 -0.199904  1.077655

df2

           C
11 -0.140670
12  1.496007
13  0.263425
14 -0.557958
15 -0.018375

Let's try direct assignment first.

df['C'] = df2['C']
df

          A         B   C
0 -1.202564  2.786483 NaN
1  0.180380  0.259736 NaN
2 -0.295206  1.175316 NaN
3  1.683482  0.927719 NaN
4 -0.199904  1.077655 NaN

Now, assign the array returned by .values (or .to_numpy() for pandas versions >= 0.24). .values returns a numpy array, which does not have an index.

df2['C'].values 
array([-0.141,  1.496,  0.263, -0.558, -0.018])

df['C'] = df2['C'].values
df

          A         B         C
0 -1.202564  2.786483 -0.140670
1  0.180380  0.259736  1.496007
2 -0.295206  1.175316  0.263425
3  1.683482  0.927719 -0.557958
4 -0.199904  1.077655 -0.018375
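
Since this approach matches rows purely by position, it can help to check the lengths first. The helper below is hypothetical (assign_by_position is not a pandas function) and only sketches that idea with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({'C': [10.0, 20.0, 30.0]}, index=[11, 12, 13])

def assign_by_position(target, source, name):
    """Hypothetical helper: copy `source` into target[name] by position."""
    if len(target) != len(source):
        raise ValueError(f"length mismatch: {len(target)} vs {len(source)}")
    # to_numpy() drops the index, so no alignment takes place.
    target[name] = source.to_numpy()
    return target

assign_by_position(df, df2['C'], 'C')
print(df)
```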

Adding a column with values from another dataframe based on column conditions

You can do this:

Consider my sample dataframes:

In [2327]: df_1                                                                                                                                                                                              
Out[2327]: 
  State  Month  Total Time
0    AL      2        1000
1    AB      4         500
2    BC      1         600

In [2328]: df_2                                                                                                                                                                                              
Out[2328]: 
  State  Month
0    AL      2
1    AB      5

In [2329]: df_2 = pd.merge(df_2, df_1, on=['State', 'Month'], how='left')                                                                                                                                      

In [2330]: df_2                                                                                                                                                                                              
Out[2330]: 
  State  Month  Total Time
0    AL      2      1000.0
1    AB      5         NaN
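
To see which rows actually found a match, the indicator option of merge can be used. The sketch below rebuilds the sample frames above. The variable name out and the cast to the nullable Int64 dtype (so that 1000 is not shown as 1000.0) are additions for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({'State': ['AL', 'AB', 'BC'],
                     'Month': [2, 4, 1],
                     'Total Time': [1000, 500, 600]})
df_2 = pd.DataFrame({'State': ['AL', 'AB'], 'Month': [2, 5]})

# indicator=True adds a '_merge' column: 'both' for matched rows,
# 'left_only' for rows of df_2 with no match in df_1.
out = pd.merge(df_2, df_1, on=['State', 'Month'], how='left', indicator=True)

# The NaN forces a float column; the nullable Int64 dtype keeps integers
# and shows the missing value as <NA>.
out['Total Time'] = out['Total Time'].astype('Int64')
print(out)
```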
