Use Merge() to Update a Data Frame with Values from a Second Data Frame

How to update column value of a data frame from another data frame matching 2 columns?

Here's a way to do it:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')

Explanation:

  • modify df2 so it has Team ID, Group as its index and its only column is Result
  • use join to bring the new scores from df2 into a Result column in df1
  • use loc to update Score values for rows where Result is not null (i.e., rows for which an updated Score is available)
  • drop the Result column.

Full test code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'DEP ID':['001','001','002','002'],
'Team ID':['002','004','002','007'],
'Group':['A','A','A','A'],
'Score':[50,70,50,90]})
df2 = pd.DataFrame({
'DEP ID':['001','001','001'],
'Team ID':['002','003','004'],
'Group':['A','A','A'],
'Result':[80,60,70]})

print(df1)
print(df2)

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
print(df1)

Output:

  DEP ID Team ID Group  Score
0    001     002     A     80
1    001     004     A     70
2    002     002     A     80
3    002     007     A     90
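For comparison, the same update can be written with merge() instead of join(). This is only a sketch of an alternative, not the approach used above; it reuses the test data, and the name merged is just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'DEP ID': ['001', '001', '002', '002'],
    'Team ID': ['002', '004', '002', '007'],
    'Group': ['A', 'A', 'A', 'A'],
    'Score': [50, 70, 50, 90]})
df2 = pd.DataFrame({
    'DEP ID': ['001', '001', '001'],
    'Team ID': ['002', '003', '004'],
    'Group': ['A', 'A', 'A'],
    'Result': [80, 60, 70]})

# A left merge keeps every row of df1; rows without a match get NaN in Result
merged = df1.merge(df2[['Team ID', 'Group', 'Result']],
                   on=['Team ID', 'Group'], how='left')

# Take Result where it exists, otherwise keep the old Score
merged['Score'] = merged['Result'].fillna(merged['Score']).astype(int)
merged = merged.drop(columns='Result')
print(merged)
```

This assumes each (Team ID, Group) pair appears at most once in df2; duplicate keys in df2 would duplicate rows in the merged result.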

UPDATE:

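If the Result column in df2 is instead named Score, as the OP asked in a comment, the code can be adjusted slightly. Passing rsuffix='_NEW' makes join rename the incoming column to Score_NEW, so it does not clash with the existing Score column: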

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')
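A self-contained version of this variant, as a sketch: the data is the test data from above with df2's Result column renamed to Score, and the names updated and has_new are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'DEP ID': ['001', '001', '002', '002'],
    'Team ID': ['002', '004', '002', '007'],
    'Group': ['A', 'A', 'A', 'A'],
    'Score': [50, 70, 50, 90]})
df2 = pd.DataFrame({
    'DEP ID': ['001', '001', '001'],
    'Team ID': ['002', '003', '004'],
    'Group': ['A', 'A', 'A'],
    'Score': [80, 60, 70]})

# Both frames have a Score column, so the one coming from df2 becomes Score_NEW
updated = df1.join(
    df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']),
    on=['Team ID', 'Group'], rsuffix='_NEW')

# Overwrite Score only where a new value was found; cast back to int because
# the NaN in unmatched rows made Score_NEW a float column
has_new = updated['Score_NEW'].notna()
updated.loc[has_new, 'Score'] = updated.loc[has_new, 'Score_NEW'].astype('int64')
updated = updated.drop(columns='Score_NEW')
print(updated)
```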

How to merge 2 pandas data frames and update a column with latest value from 2 matched rows?

Something like the following can do the job.

Just make sure your updated_at column is stored as datetime, so that sort_values orders the rows chronologically:

>>> pd.concat([df1,df2]).sort_values('updated_at').drop_duplicates(subset=df1.columns[:-1],keep='last').sort_values('MRN')
MRN Encounter_ID First_Name Last_Name Birth_Date updated_at
1 1234 John Doe 01/02/1999 04/12/2002 2020-12-31 06:00:00
2 2345 Joanne Lee 04/19/2002 04/19/2002 2020-12-31 08:22:00
3 3456 Annabelle Jones 01/02/2001 04/21/2002 2020-12-31 05:00:00
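Here is a minimal runnable sketch of the same concat / sort / drop_duplicates idea. The data is hypothetical, and it assumes MRN alone identifies a record (the one-liner above instead uses every column except the last as the key).

```python
import pandas as pd

# Hypothetical data: two extracts of the same table
df1 = pd.DataFrame({
    'MRN': [1, 2, 3],
    'First_Name': ['John', 'Joanne', 'Annabelle'],
    'updated_at': ['2020-12-31 06:00:00', '2020-12-31 07:00:00',
                   '2020-12-31 05:00:00']})
df2 = pd.DataFrame({
    'MRN': [2, 3],
    'First_Name': ['Joanne', 'Annabelle'],
    'updated_at': ['2020-12-31 08:22:00', '2020-12-31 04:00:00']})

# Convert to datetime so sorting is chronological rather than string-based
for df in (df1, df2):
    df['updated_at'] = pd.to_datetime(df['updated_at'])

latest = (pd.concat([df1, df2], ignore_index=True)
            .sort_values('updated_at')                  # oldest first
            .drop_duplicates(subset='MRN', keep='last')  # keep newest row per MRN
            .sort_values('MRN')
            .reset_index(drop=True))
print(latest)
```

Each MRN ends up with whichever row has the latest updated_at, regardless of which frame it came from.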

update data frame based on data from another data frame using pandas python

Try this, using an outer merge, which returns both matching and non-matching records.

In [75]: df_m = df1.merge(df2, on="SKUCode", how='outer')

In [76]: mask = df_m['Status'].isnull()

In [77]: df_m.loc[~mask, 'SKUStatus'] = df_m.loc[~mask, 'Status']

In [78]: df_m[['SKUCode', "ListPrice", "SalePrice", "SKUStatus", "CostPrice"]].fillna(0.0)

output

  SKUCode  ListPrice  SalePrice  SKUStatus  CostPrice
0       A     1798.0     1798.0        1.0      500.0
1       B     2997.0     2997.0        0.0      773.0
2       C     1798.0     1798.0        1.0      525.0
3       D      999.0      999.0        0.0      300.0
4       X        0.0        0.0        0.0        0.0
5       Y        0.0        0.0        0.0        0.0
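The original frames are not shown in the answer, so here is a sketch with made-up inputs of the same shape (fewer columns, hypothetical values) that runs end to end. The names has_status and result are only illustrative.

```python
import pandas as pd

# Hypothetical inputs: df1 holds the SKUs, df2 holds the new statuses
df1 = pd.DataFrame({
    'SKUCode': ['A', 'B', 'C'],
    'ListPrice': [1798.0, 2997.0, 1798.0],
    'SKUStatus': [0.0, 0.0, 0.0]})
df2 = pd.DataFrame({
    'SKUCode': ['A', 'C', 'X'],
    'Status': [1.0, 1.0, 0.0]})

# Outer merge: keeps SKUs that appear in either frame
df_m = df1.merge(df2, on='SKUCode', how='outer')

# Copy Status into SKUStatus wherever df2 supplied a value
has_status = df_m['Status'].notna()
df_m.loc[has_status, 'SKUStatus'] = df_m.loc[has_status, 'Status']

# Rows that exist only in df2 have NaN for df1's columns; fill those with 0.0
result = df_m[['SKUCode', 'ListPrice', 'SKUStatus']].fillna(0.0)
print(result)
```

SKU X exists only in df2, so it appears in the result with ListPrice filled in as 0.0, just like X and Y in the output above.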

R: add value from another data frame by finding same values in two data frames

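Use merge() from the base package. Setting all.x = TRUE keeps every row of df1 (a left join), and all.y = FALSE drops rows of df2 that have no match:

merge(df1, df2, by = 'Code', all.x = TRUE, all.y = FALSE)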

How to merge two different-size DataFrames in Pandas to update one DataFrame, depending on matching partial values in one column with another DataFrame

You can use .update() after setting time as the index on both data_1 and data_2, as follows:

data_1a = data_1.set_index('time')
data_1a.update(data_2.set_index('time'))
data_out = data_1a.reset_index()

.update() modifies the DataFrame in place using non-NA values from another DataFrame, and it aligns on indices. Thus, when you set time as the index on both data_1 and data_2, .update() aligns on matching values in column time and overwrites the values of data_1 with the corresponding values of data_2.

Data Setup:

a = {
'time':[1,2,3,4,5,6],
'column_1':[2,2,2,2,2,2],
'column_2':[3,3,3,3,3,3]
}
b = {
'time':[3,4,5],
'column_1':[0,0,0],
'column_2':[0,0,0]
}
data_1 = pd.DataFrame(a)
data_2 = pd.DataFrame(b)

Result:

print(data_out)

   time  column_1  column_2
0     1       2.0       3.0
1     2       2.0       3.0
2     3       0.0       0.0
3     4       0.0       0.0
4     5       0.0       0.0
5     6       2.0       3.0
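Two details of .update() are worth seeing in action: NaN values in the second frame are skipped, and index values that do not exist in the first frame are ignored rather than appended. A small sketch with hypothetical data (floats throughout, to sidestep dtype changes):

```python
import numpy as np
import pandas as pd

data_1 = pd.DataFrame({
    'time': [1, 2, 3],
    'column_1': [2.0, 2.0, 2.0],
    'column_2': [3.0, 3.0, 3.0]})
# NaN means "no new value" for column_1 at time 3;
# time 9 does not exist in data_1
data_2 = pd.DataFrame({
    'time': [2, 3, 9],
    'column_1': [0.0, np.nan, 7.0],
    'column_2': [0.0, 0.0, 7.0]})

data_1a = data_1.set_index('time')
data_1a.update(data_2.set_index('time'))  # in place, returns None
data_out = data_1a.reset_index()
print(data_out)
```

At time 3, column_1 keeps its old value because data_2 has NaN there, and no row for time 9 is added.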

Python Pandas - Vlookup - Update Existing Column in First Data Frame From Second Data Frame

Use pandas merge on the ['key', 'info'] columns of df1 and df2, joining on key with how='left' so that only the keys from the left DataFrame are kept. Because both inputs contain an info column, the merged result has info_x (from df1) and info_y (from df2); assign info_y back to df1['info']. Keys with no match in df2 get NaN.

df1['info'] = pd.merge(df1[['key','info']], df2[['key','info']], on='key', how='left')['info_y']
print(df1)

Output from df1

  dataA  dataB  key   info dataC
0   ABC    123  a1b  infoA   aaa
1   DEF    456  b57    NaN   bbb
2   GHI    789  a22  infoC   ccc
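The input frames are not shown, so the sketch below uses hypothetical data chosen to reproduce the output above. It also shows Series.map as a possible alternative to the merge; that alternative is not part of the original answer and assumes the keys in df2 are unique.

```python
import pandas as pd

# Hypothetical inputs; 'old' stands in for whatever df1['info'] held before
df1 = pd.DataFrame({
    'dataA': ['ABC', 'DEF', 'GHI'],
    'dataB': [123, 456, 789],
    'key': ['a1b', 'b57', 'a22'],
    'info': ['old', 'old', 'old'],
    'dataC': ['aaa', 'bbb', 'ccc']})
df2 = pd.DataFrame({
    'key': ['a1b', 'a22'],
    'info': ['infoA', 'infoC']})

# Alternative lookup: map each key through a key -> info Series built from df2
alt = df1['key'].map(df2.set_index('key')['info'])

# Merge-based lookup from the answer; relies on df1 having a default 0..n-1
# index so that the merged column lines up row by row on assignment
df1['info'] = pd.merge(df1[['key', 'info']], df2[['key', 'info']],
                       on='key', how='left')['info_y']
print(df1)
```

Both lookups give infoA, NaN, infoC here. If df1 has a non-default index, the map version is the safer of the two, since its result carries df1's own index.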

