Add a New Column Between Other Dataframe Columns

pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

OK, two steps to this - first is to write a function that does the translation you want - I've put an example together based on your pseudo-code:

def label_race (row):
if row['eri_hispanic'] == 1 :
return 'Hispanic'
if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1 :
return 'Two Or More'
if row['eri_nat_amer'] == 1 :
return 'A/I AK Native'
if row['eri_asian'] == 1:
return 'Asian'
if row['eri_afr_amer'] == 1:
return 'Black/AA'
if row['eri_hawaiian'] == 1:
return 'Haw/Pac Isl.'
if row['eri_white'] == 1:
return 'White'
return 'Other'

You may want to go over this, but it seems to do the trick - notice that the parameter going into the function is considered to be a Series object labelled "row".

Next, use the apply function in pandas to apply the function - e.g.

df.apply (lambda row: label_race(row), axis=1)

Note the axis=1 specifier, that means that the application is done at a row, rather than a column level. The results are here:

0           White
1 Hispanic
2 White
3 White
4 Other
5 White
6 Two Or More
7 White
8 Haw/Pac Isl.
9 White

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.

df['race_label'] = df.apply (lambda row: label_race(row), axis=1)

The resultant dataframe looks like this (scroll to the right to see the new column):

      lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian   eri_hispanic  eri_nat_amer  eri_white rno_defined    race_label
0 MOST JEFF E 0 0 0 0 0 1 White White
1 CRUISE TOM E 0 0 0 1 0 0 White Hispanic
2 DEPP JOHNNY NaN 0 0 0 0 0 1 Unknown White
3 DICAP LEO NaN 0 0 0 0 0 1 Unknown White
4 BRANDO MARLON E 0 0 0 0 0 0 White Other
5 HANKS TOM NaN 0 0 0 0 0 1 Unknown White
6 DENIRO ROBERT E 0 1 0 0 0 1 White Two Or More
7 PACINO AL E 0 0 0 0 0 1 White White
8 WILLIAMS ROBIN E 0 0 1 0 0 0 White Haw/Pac Isl.
9 EASTWOOD CLINT E 0 0 0 0 0 1 White White

Python: how to add a column to a pandas dataframe between two columns?

You can use insert:

df.insert(4, 'new_col_name', tmp)

Note: The insert method mutates the original DataFrame and does not return a copy.

If you use df = df.insert(4, 'new_col_name', tmp), df will be None.

Creating a new column based on other columns from another dataframe

This will do what your question asks:

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]

Full test code:

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=[
x.strip() for x in 'Name Apples Pears Grapes Peachs'.split()], data =[
['James', 3, 5, 5, 2],
['Harry', 1, 0, 2, 9],
['Will', 20, 2, 7, 3]])
print(df)

df2 = pd.DataFrame(columns=[
x.strip() for x in 'Class User Factor'.split()], data =[
['A', 'Harry', 3],
['A', 'Will', 2],
['A', 'James', 5],
['B', np.nan, 4]])
print(df2)

df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
print(df2)

Input:

    Name  Apples  Pears  Grapes  Peachs
0 James 3 5 5 2
1 Harry 1 0 2 9
2 Will 20 2 7 3
Class User Factor
0 A Harry 3
1 A Will 2
2 A James 5
3 B NaN 4

Output

  Class   User  Factor             Total
0 A Harry 3 [3, 0, 6, 27]
1 A Will 2 [40, 4, 14, 6]
2 A James 5 [15, 25, 25, 10]

How to add new column from another dataframe based on values in column of first dataframe?

This is done via a join operation which in pandas can be done with .merge().

Kindly try using the following:

df = df.merge(population,how='left',on='Province')

Also please consider reading the following answer for a detailed guide on joins and merges

A new column in pandas which value depends on other columns

To improve upon other answer, I would use pandas apply for iterating over rows and calculating new column.

def calc_new_col(row):
if row['col2'] <= 50 & row['col3'] <= 50:
return row['col1']
else:
return max(row['col1'], row['col2'], row['col3'])

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row

print(df)
datetime col1 col2 col3 state
2021-04-10 01:00:00 25 50 50 25
2021-04-10 02:00:00 25 50 50 25
2021-04-10 03:00:00 25 100 50 100
2021-04-10 04:00:00 50 50 100 100
2021-04-10 05:00:00 100 100 100 100

apply helps the code to be cleaner and more reusable.

Create new column based on other columns from a different dataframe

IIUC this will get you the desired output (This does not include the np.nan from df2 where it == b, but I don't think you wanted that)

df_melt = df1.melt(id_vars = ['Time'])
df_melt.columns = ['Time', 'Item', 'Count']
df2 = df2.loc[df2['Class'] == 'A']
df_merge = pd.merge(df2, df_melt)
df_merge['Total'] = df_merge['Factor'] * df_merge['Count']
df_merge

Mapping columns from one dataframe to another to create a new column

df.merge

out = (df1.merge(df2, left_on='store', right_on='store_code')
.reindex(columns=['id', 'store', 'address', 'warehouse']))
print(out)

id store address warehouse
0 1 100 xyz Land
1 2 200 qwe Sea
2 3 300 asd Land
3 4 400 zxc Land
4 5 500 bnm Sea


pd.concat + df.sort_values

u = df1.sort_values('store')
v = df2.sort_values('store_code')[['warehouse']].reset_index(drop=1)
out = pd.concat([u, v], 1)

print(out)

id store address warehouse
0 1 100 xyz Land
1 2 200 qwe Sea
2 3 300 asd Land
3 4 400 zxc Land
4 5 500 bnm Sea

The first sort call is redundant assuming your dataframe is already sorted on store, in which case you may remove it.



df.replace/df.map

s = df1.store.replace(df2.set_index('store_code')['warehouse'])
print(s)
0 Land
1 Sea
2 Land
3 Land
4 Sea

df1['warehouse'] = s
print(df1)

id store address warehouse
0 1 100 xyz Land
1 2 200 qwe Sea
2 3 300 asd Land
3 4 400 zxc Land
4 5 500 bnm Sea

Alternatively, create a mapping explicitly. This works if you want to use it later.

mapping = dict(df2[['store_code', 'warehouse']].values)
df1['warehouse'] = df1.store.map(mapping)
print(df1)

id store address warehouse
0 1 100 xyz Land
1 2 200 qwe Sea
2 3 300 asd Land
3 4 400 zxc Land
4 5 500 bnm Sea

Adding a new column in pandas dataframe from another dataframe with differing indices

Assuming the size of your dataframes are the same, you can assign the RESULT_df['RESULT'].values to your original dataframe. This way, you don't have to worry about indexing issues.

# pre 0.24
feature_file_df['RESULT'] = RESULT_df['RESULT'].values
# >= 0.24
feature_file_df['RESULT'] = RESULT_df['RESULT'].to_numpy()

Minimal Code Sample

df
A B
0 -1.202564 2.786483
1 0.180380 0.259736
2 -0.295206 1.175316
3 1.683482 0.927719
4 -0.199904 1.077655

df2

C
11 -0.140670
12 1.496007
13 0.263425
14 -0.557958
15 -0.018375

Let's try direct assignment first.

df['C'] = df2['C']
df

A B C
0 -1.202564 2.786483 NaN
1 0.180380 0.259736 NaN
2 -0.295206 1.175316 NaN
3 1.683482 0.927719 NaN
4 -0.199904 1.077655 NaN

Now, assign the array returned by .values (or .to_numpy() for pandas versions >0.24). .values returns a numpy array which does not have an index.

df2['C'].values 
array([-0.141, 1.496, 0.263, -0.558, -0.018])

df['C'] = df2['C'].values
df

A B C
0 -1.202564 2.786483 -0.140670
1 0.180380 0.259736 1.496007
2 -0.295206 1.175316 0.263425
3 1.683482 0.927719 -0.557958
4 -0.199904 1.077655 -0.018375

Adding a column with values from another dataframe based on column conditions

You can do this:

Consider my sample dataframes:

In [2327]: df_1                                                                                                                                                                                              
Out[2327]:
State Month Total Time
0 AL 2 1000
1 AB 4 500
2 BC 1 600

In [2328]: df_2
Out[2328]:
State Month
0 AL 2
1 AB 5

In [2329]: df_2 = pd.merge(df_2, df_1, on=['State', 'Month'], how='left')

In [2330]: df_2
Out[2330]:
State Month Total Time
0 AL 2 1000.0
1 AB 5 NaN


Related Topics



Leave a reply



Submit