How to update column value of a data frame from another data frame matching 2 columns?
Here's a way to do it:
df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
Explanation:
- modify df2 so it has Team ID, Group as its index and its only column is Result
- use join to bring the new scores from df2 into a Result column in df1
- use loc to update Score values for rows where Result is not null (i.e., rows for which an updated Score is available)
- drop the Result column.
Full test code:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'DEP ID':['001','001','002','002'],
'Team ID':['002','004','002','007'],
'Group':['A','A','A','A'],
'Score':[50,70,50,90]})
df2 = pd.DataFrame({
'DEP ID':['001','001','001'],
'Team ID':['002','003','004'],
'Group':['A','A','A'],
'Result':[80,60,70]})
print(df1)
print(df2)
df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
print(df1)
Output:
  DEP ID Team ID Group  Score
0    001     002     A     80
1    001     004     A     70
2    002     002     A     80
3    002     007     A     90
UPDATE:
If the Result column in df2 is instead named Score, as asked by OP in a comment, then the code can be adjusted slightly as follows:
df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')
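A minimal runnable sketch of this adjusted version, reusing the sample frames from the test code above with df2's Result column renamed to Score (the data is only the earlier sample, not the OP's real data):

```python
import pandas as pd

df1 = pd.DataFrame({
    'DEP ID': ['001', '001', '002', '002'],
    'Team ID': ['002', '004', '002', '007'],
    'Group': ['A', 'A', 'A', 'A'],
    'Score': [50, 70, 50, 90]})
# Same sample data as before, but the column is called Score instead of Result.
df2 = pd.DataFrame({
    'DEP ID': ['001', '001', '001'],
    'Team ID': ['002', '003', '004'],
    'Group': ['A', 'A', 'A'],
    'Score': [80, 60, 70]})

# rsuffix renames the incoming Score column to Score_NEW to avoid a name clash.
df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']),
               on=['Team ID', 'Group'], rsuffix='_NEW')
# Overwrite Score only where a new value was found for that (Team ID, Group).
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')
print(df1)
```

Rows with no match in df2 (here Team ID 007) keep their original Score.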
How to merge 2 pandas data frames and update a column with latest value from 2 matched rows?
Something like this can do the job.
Just make sure your updated_at column is set as datetime:
>>> pd.concat([df1,df2]).sort_values('updated_at').drop_duplicates(subset=df1.columns[:-1],keep='last').sort_values('MRN')
MRN Encounter_ID First_Name Last_Name Birth_Date updated_at
1 1234 John Doe 01/02/1999 04/12/2002 2020-12-31 06:00:00
2 2345 Joanne Lee 04/19/2002 04/19/2002 2020-12-31 08:22:00
3 3456 Annabelle Jones 01/02/2001 04/21/2002 2020-12-31 05:00:00
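Since the original frames are not shown, here is a small sketch of the same concat / sort / drop_duplicates idea on made-up records. The column set and values are hypothetical; only the pattern comes from the one-liner above.

```python
import pandas as pd

# Hypothetical records: df2 holds a newer row for MRN 1 and an older one for MRN 2.
df1 = pd.DataFrame({
    'MRN': [1, 2, 3],
    'First_Name': ['John', 'Joanne', 'Annabelle'],
    'updated_at': ['2020-12-31 05:00:00', '2020-12-31 08:22:00', '2020-12-31 05:00:00']})
df2 = pd.DataFrame({
    'MRN': [1, 2],
    'First_Name': ['John', 'Joanne'],
    'updated_at': ['2020-12-31 06:00:00', '2020-12-31 07:00:00']})

# Make sure updated_at is a real datetime so sorting is chronological.
for frame in (df1, df2):
    frame['updated_at'] = pd.to_datetime(frame['updated_at'])

# Every column except the last (updated_at) identifies a record.
key_cols = list(df1.columns[:-1])

latest = (pd.concat([df1, df2])
            .sort_values('updated_at')                      # oldest first
            .drop_duplicates(subset=key_cols, keep='last')  # keep the newest row per record
            .sort_values('MRN')
            .reset_index(drop=True))
print(latest)
```

Note that this treats two rows as the same record only when all non-timestamp columns match; if other fields can change between versions, a narrower subset such as just the ID columns is probably what you want.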
update data frame based on data from another data frame using pandas python
Try this, using an outer merge, which gives both matching and non-matching records.
In [75]: df_m = df1.merge(df2, on="SKUCode", how='outer')
In [76]: mask = df_m['Status'].isnull()
In [77]: df_m.loc[~mask, 'SKUStatus'] = df_m.loc[~mask, 'Status']
In [78]: df_m[['SKUCode', "ListPrice", "SalePrice", "SKUStatus", "CostPrice"]].fillna(0.0)
output
SKUCode ListPrice SalePrice SKUStatus CostPrice
0 A 1798.0 1798.0 1.0 500.0
1 B 2997.0 2997.0 0.0 773.0
2 C 1798.0 1798.0 1.0 525.0
3 D 999.0 999.0 0.0 300.0
4 X 0.0 0.0 0.0 0.0
5 Y 0.0 0.0 0.0 0.0
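The input frames are not included in the answer, so the following is only a sketch of the same steps on hypothetical data (the values and the reduced column set are assumptions for illustration):

```python
import pandas as pd

# Hypothetical inputs: df1 is the catalogue, df2 carries the new Status per SKU.
df1 = pd.DataFrame({
    'SKUCode': ['A', 'B'],
    'ListPrice': [1798.0, 2997.0],
    'SKUStatus': [0, 0]})
df2 = pd.DataFrame({
    'SKUCode': ['A', 'X'],
    'Status': [1, 1]})

# Outer merge keeps SKUs that appear in only one of the frames.
df_m = df1.merge(df2, on='SKUCode', how='outer')

# Copy Status into SKUStatus wherever df2 supplied a value.
mask = df_m['Status'].isnull()
df_m.loc[~mask, 'SKUStatus'] = df_m.loc[~mask, 'Status']

# Drop the helper column and fill the gaps left by non-matching rows.
result = (df_m[['SKUCode', 'ListPrice', 'SKUStatus']]
          .fillna(0.0)
          .sort_values('SKUCode')
          .reset_index(drop=True))
print(result)
```

Here B keeps its old SKUStatus because df2 has no row for it, while X is new and gets 0.0 for the columns that only exist in df1.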
R: add value from another data frame by finding same values in two data frames
Use merge() from the base package:
merge(df1, df2, by = 'Code', all.x=T, all.y=F)
How to merge two different size DataFrames in Pandas to update one dataframe depends on matching partial values in one column with another dataframe
You can use .update() after setting time as the index on both data_1 and data_2, as follows:
data_1a = data_1.set_index('time')
data_1a.update(data_2.set_index('time'))
data_out = data_1a.reset_index()
.update() modifies in place using non-NA values from another DataFrame, and it aligns on indices. Thus, when you set time as the index on both data_1 and data_2, .update() aligns on matching values in column time to update data_1 with the corresponding values of data_2.
Data Setup:
a = {
'time':[1,2,3,4,5,6],
'column_1':[2,2,2,2,2,2],
'column_2':[3,3,3,3,3,3]
}
b = {
'time':[3,4,5],
'column_1':[0,0,0],
'column_2':[0,0,0]
}
data_1 = pd.DataFrame(a)
data_2 = pd.DataFrame(b)
Result:
print(data_out)
time column_1 column_2
0 1 2.0 3.0
1 2 2.0 3.0
2 3 0.0 0.0
3 4 0.0 0.0
4 5 0.0 0.0
5 6 2.0 3.0
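To illustrate the "non-NA values" part of the explanation, here is a small sketch with made-up numbers (not the data from the question) where data_2 contains a NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: data_2 has a real value for time 2 and a NaN for time 3.
data_1 = pd.DataFrame({'time': [1, 2, 3], 'column_1': [2.0, 2.0, 2.0]})
data_2 = pd.DataFrame({'time': [2, 3], 'column_1': [0.0, np.nan]})

data_1a = data_1.set_index('time')
# In-place update, aligned on the time index; NaN in data_2 is skipped.
data_1a.update(data_2.set_index('time'))
data_out = data_1a.reset_index()
print(data_out)
```

Only the row for time 2 changes; the NaN for time 3 leaves the original value in place.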
Python Pandas - Vlookup - Update Existing Column in First Data Frame From Second Data Frame
Use pandas merge on the ['key','info'] columns of df1 and df2, joining on the key column and keeping only the keys from the left dataframe (how='left'). Then assign the resulting column (info_y) back to the first dataframe.
df1['info'] = pd.merge(df1[['key','info']], df2[['key','info']], on='key', how='left')['info_y']
print(df1)
Output from df1
dataA dataB key info dataC
0 ABC 123 a1b infoA aaa
1 DEF 456 b57 NaN bbb
2 GHI 789 a22 infoC ccc
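A self-contained sketch of the same lookup follows. The contents of df2 and the initial info values in df1 are hypothetical, chosen so the result matches the output above. It assumes df1 has the default 0..n-1 index and that key is unique in df2; otherwise the merged column will not line up row for row with df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'dataA': ['ABC', 'DEF', 'GHI'],
    'dataB': [123, 456, 789],
    'key': ['a1b', 'b57', 'a22'],
    'info': ['old', 'old', 'old'],   # hypothetical starting values
    'dataC': ['aaa', 'bbb', 'ccc']})
# Hypothetical lookup table: no entry for key b57.
df2 = pd.DataFrame({
    'key': ['a1b', 'a22'],
    'info': ['infoA', 'infoC']})

# Both frames have an info column, so merge names them info_x (left) and info_y (right).
merged = pd.merge(df1[['key', 'info']], df2[['key', 'info']], on='key', how='left')
df1['info'] = merged['info_y']
print(df1)
```

Keys missing from df2 end up as NaN, as for b57 here. If the old value should be kept in that case, something like merged['info_y'].fillna(merged['info_x']) could be assigned instead.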