Python Pandas update a dataframe value from another dataframe
You can use concat + drop_duplicates, which updates the common rows and adds the new rows from df2:
pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]:
Code Name Value
0 1 Company1 200
0 2 Company2 1000
2 3 Company3 400
Update, per the comments below:
df1.set_index(['Code', 'Name'], inplace=True)
df1.update(df2.set_index(['Code', 'Name']))
df1.reset_index(inplace=True)  # keep Code and Name as columns; drop=True would discard them
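Both approaches can be sketched end-to-end. The sample data below is assumed (the question doesn't show df1/df2); values were chosen to reproduce the output above:

```python
import pandas as pd

# Assumed sample data (not given in the original question)
df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [100, 200, 400]})
df2 = pd.DataFrame({'Code': [1, 2],
                    'Name': ['Company1', 'Company2'],
                    'Value': [200, 1000]})

# concat + drop_duplicates: rows from df2 win on duplicate (Code, Name) keys,
# and keys that exist only in df2 would be appended as new rows
merged = (pd.concat([df1, df2])
            .drop_duplicates(['Code', 'Name'], keep='last')
            .sort_values('Code'))

# update: modifies the indexed frame in place and only touches
# (Code, Name) keys that already exist in df1 -- no new rows are added
upd = df1.set_index(['Code', 'Name'])
upd.update(df2.set_index(['Code', 'Name']))
upd = upd.reset_index()  # bring Code and Name back as columns
```

Note that `update` may upcast the updated int column to float, which is why the sketch compares values rather than dtypes.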
How to update column value of a data frame from another data frame matching 2 columns?
Here's a way to do it:
df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
Explanation:
- Modify df2 so it has Team ID, Group as its index and its only column is Result.
- Use join to bring the new scores from df2 into a Result column in df1.
- Use loc to update Score values for rows where Result is not null (i.e., rows for which an updated Score is available).
- Drop the Result column.
Full test code:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'DEP ID':['001','001','002','002'],
'Team ID':['002','004','002','007'],
'Group':['A','A','A','A'],
'Score':[50,70,50,90]})
df2 = pd.DataFrame({
'DEP ID':['001','001','001'],
'Team ID':['002','003','004'],
'Group':['A','A','A'],
'Result':[80,60,70]})
print(df1)
print(df2)
df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
print(df1)
Output:
  DEP ID Team ID Group  Score
0    001     002     A     80
1    001     004     A     70
2    002     002     A     80
3    002     007     A     90
UPDATE:
If the Result column in df2 is instead named Score, as asked by OP in a comment, then the code can be adjusted slightly as follows:
df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')
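A minimal runnable sketch of this rsuffix variant, with assumed two-row data (Scores kept as floats so the .loc assignment doesn't fight the column dtype):

```python
import pandas as pd

# Assumed minimal data: one team gets an updated score, one does not
df1 = pd.DataFrame({'Team ID': ['002', '004'],
                    'Group': ['A', 'A'],
                    'Score': [50.0, 70.0]})
df2 = pd.DataFrame({'Team ID': ['002'],
                    'Group': ['A'],
                    'Score': [80.0]})

# rsuffix renames the overlapping right-hand Score column to Score_NEW
df1 = df1.join(df2.set_index(['Team ID', 'Group']),
               on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW  # overwrite matched rows
df1 = df1.drop(columns='Score_NEW')
```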
Replace column values based on another dataframe python pandas - better way?
Use the boolean mask from isin to filter the df and assign the desired row values from the right-hand df:
In [27]:
df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
df
Out[27]:
Name Nonprofit Business Education
0 X 1 1 0
1 Y 1 1 1
2 Z 1 0 1
3 Y 1 1 1
[4 rows x 4 columns]
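A self-contained sketch with assumed data matching the output above. Note the assignment aligns on the index, so this pattern assumes df and df1 share row labels:

```python
import pandas as pd

# Assumed data; both frames use the same default RangeIndex,
# which is what makes the .loc assignment line up row for row
df = pd.DataFrame({'Name': ['X', 'Y', 'Z', 'Y'],
                   'Nonprofit': [1, 0, 1, 0],
                   'Business': [1, 1, 0, 1],
                   'Education': [0, 0, 1, 0]})
df1 = pd.DataFrame({'Name': ['X', 'Y', 'Z', 'Y'],
                    'Nonprofit': [1, 1, 1, 1],
                    'Education': [0, 1, 1, 1]})

# Rows whose Name appears in df1 take df1's Nonprofit/Education values
df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
```

If the two frames had different indexes, the right-hand side would align to df's labels and unmatched rows would become NaN, so this shortcut is only safe when the indexes correspond.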
Replace column value of Dataframe based on a condition on another Dataframe
You can also try map:
df_student['student_Id'] = (
df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
.fillna(df_student['student_Id'])
)
print(df_student)
# Output
Name gender math score student_Id
0 John male 50 1234
1 Jay male 100 6788
2 sachin male 70 xyz
3 Geetha female 80 abcd
4 Amutha female 75 83ko
5 ganesh male 40 v432
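A small self-contained sketch of the map + fillna pattern (IDs invented for illustration): mapped IDs are replaced, unmapped ones fall back to the original value:

```python
import pandas as pd

# Assumed data: two students have updated IDs, one does not
df_student = pd.DataFrame({'Name': ['John', 'Jay', 'sachin'],
                           'student_Id': ['A1', 'B2', 'xyz']})
df_updated_id = pd.DataFrame({'old_id': ['A1', 'B2'],
                              'new_id': ['1234', '6788']})

# map returns NaN for IDs absent from the lookup; fillna restores them
df_student['student_Id'] = (
    df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
    .fillna(df_student['student_Id'])
)
```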
Update
The OP commented that updated_id isn't unique, so the data needs further pre-processing. In this case, you could drop duplicates first, treating the last value (keep='last') as the most recent one for a given old_id:
sr = df_updated_id.drop_duplicates('old_id', keep='last').set_index('old_id')['new_id']
df_student['student_Id'] = df_student['student_Id'].map(sr).fillna(df_student['student_Id'])
Note: this is exactly what @BENY's answer does. Since he creates a dict, only the last occurrence of an old_id is kept. However, if you want to keep the first value that appears, his code doesn't work; with drop_duplicates, you can adjust the keep parameter.
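A tiny sketch of how the keep parameter picks which duplicate survives (data invented):

```python
import pandas as pd

# One old_id mapped twice; keep decides which new_id wins
df_updated_id = pd.DataFrame({'old_id': ['xyz', 'xyz'],
                              'new_id': ['111', '222']})

sr_last = df_updated_id.drop_duplicates('old_id', keep='last').set_index('old_id')['new_id']
sr_first = df_updated_id.drop_duplicates('old_id', keep='first').set_index('old_id')['new_id']
```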
Updating a column in a dataframe with values from the same column another dataframe
It looks like you want to replace Value from df with the corresponding Value in df2, if the value exists. I.e., assuming you had a CAT F with a corresponding value of 36 in df, you would want that to be replaced by 99 (from df2).
Using merge:
df = df.merge(df2, on='CAT', how='left')
df['Value'] = np.where(df['Value_y'].isna(), df['Value_x'], df['Value_y'])
df = df.drop(columns=['Value_x', 'Value_y'])
Output:
CAT Value
0 A 12.0
1 B 34.0
2 C 22.0
3 D 43.0
4 E 21.0
5 F 99.0
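The merge approach can be tested end-to-end with data assumed from the output shown (only CAT F gets a replacement in df2):

```python
import numpy as np
import pandas as pd

# Assumed data reconstructed from the output: F is 36 in df, 99 in df2
df = pd.DataFrame({'CAT': list('ABCDEF'),
                   'Value': [12, 34, 22, 43, 21, 36]})
df2 = pd.DataFrame({'CAT': ['F'], 'Value': [99]})

# Left merge creates Value_x (original) and Value_y (replacement or NaN)
df = df.merge(df2, on='CAT', how='left')
# Prefer the replacement where it exists, else keep the original
df['Value'] = np.where(df['Value_y'].isna(), df['Value_x'], df['Value_y'])
df = df.drop(columns=['Value_x', 'Value_y'])
```

An equivalent one-liner for the middle step would be `df['Value'] = df['Value_y'].fillna(df['Value_x'])`.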
Update a pandas dataframe with data from another dataframe
You can use combine_first
with reindex
:
df3 = df2.combine_first(df1).reindex(df1.index)
print (df3)
1 2 3 4
3234 Lorum Ipsum Foo Bar
8839 Lorum NaN Ipsum Foo
9911 Lorum Ipsum Bar Foo
2256 Lorum NaN Ipsum Bar
Or use your solution; note that update works in place, so assigning its result to a variable returns None:
df1.update(df2)
print (df1)
1 2 3 4
3234 Lorum Ipsum Foo Bar
8839 Lorum NaN Ipsum Foo
9911 Lorum Ipsum Bar Foo
2256 Lorum NaN Ipsum Bar
print (df1.update(df2))
None
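A minimal sketch contrasting the two (data invented): combine_first returns a new frame, while update modifies df1 in place, and neither copies NaN from df2:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})
df2 = pd.DataFrame({'A': [10.0, np.nan], 'B': [np.nan, 40.0]})

# combine_first: df2 values win where present, NaN gaps filled from df1
df3 = df2.combine_first(df1)

# update: overwrite df1 in place with non-NA values from df2
df1.update(df2)
```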
Updating a dataframe rows based on another dataframe rows
You can use update, like so:
df1.update(df2)
Updating value of one column in dataframe if ID match found in column of another dataframe
Try an outer merge, then drop the unrequired columns after applying your filters:
result=pd.merge(dataFrameOfLots, dataFrameFiltered, how='outer', on=['Customer', 'Stage', 'ProdType', 'Brand', 'ProdName', 'Size',
'Strength', 'Lot', 'PackedOn', 'Qty', 'Available'],suffixes=('_x', '')).fillna(0)
result=result.loc[:,~result.columns.str.endswith('_x')]#drop unwanted columns
or
result.drop(columns=['QtyInTransaction_x','IndexCol_x'], inplace=True)#drop unwanted columns
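A reduced runnable sketch of the suffixes=('_x', '') trick with invented two-column frames (only Lot and QtyInTransaction kept for brevity): the left frame's overlapping column gets the '_x' suffix, the right frame's keeps its plain name, and the '_x' columns are then discarded:

```python
import pandas as pd

# Invented minimal frames; L2 has no match in the filtered frame
lots = pd.DataFrame({'Lot': ['L1', 'L2'], 'QtyInTransaction': [0, 0]})
filtered = pd.DataFrame({'Lot': ['L1'], 'QtyInTransaction': [5]})

# suffixes=('_x', ''): left copy becomes QtyInTransaction_x, right keeps its name
result = pd.merge(lots, filtered, how='outer', on=['Lot'],
                  suffixes=('_x', '')).fillna(0)
result = result.loc[:, ~result.columns.str.endswith('_x')]  # drop unwanted columns
```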
updating column in one dataframe with value from another dataframe based on matching values
You don't need z$color in the first place if it's just a placeholder; you can replace NA with 0 later:
z$color <- y[match(z$letter, y$letter), 2]
update a Dataframe from a different Dataframe based on a matching value
Your finaldf after merge has shape (33, 109) because it has columns with similar names but with _x and _y appended to them: the _x ones are from DF1 and the _y ones are from DF2.
You need to run the below code after merge to remove the extra "_x" and "_y" columns for those 18 overlapping columns and copy the values from DF2 to DF1 where they matched on "ID":
remove_cols = []
for col in DF2.columns:
if col == 'ID':
continue
finaldf[col] = finaldf[col+'_y'].fillna(finaldf[col+'_x'])
remove_cols += [col+'_x', col+'_y']
finaldf.drop(remove_cols, axis=1, inplace=True)
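The coalescing loop can be exercised on a reduced example (frames invented; one overlapping Val column instead of 18):

```python
import pandas as pd

# Invented minimal frames sharing the ID key and one overlapping column
DF1 = pd.DataFrame({'ID': [1, 2, 3], 'Val': ['a', 'b', 'c']})
DF2 = pd.DataFrame({'ID': [2, 3], 'Val': ['y', 'z']})

# Left merge produces Val_x (from DF1) and Val_y (from DF2)
finaldf = DF1.merge(DF2, on='ID', how='left')

remove_cols = []
for col in DF2.columns:
    if col == 'ID':
        continue
    # Take DF2's value where the IDs matched, else keep DF1's
    finaldf[col] = finaldf[col + '_y'].fillna(finaldf[col + '_x'])
    remove_cols += [col + '_x', col + '_y']
finaldf.drop(remove_cols, axis=1, inplace=True)
```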
For more information on why "_x" and "_y" columns appear in your merged dataframe, check the official documentation of the pd.DataFrame.merge method: "_x" and "_y" are the suffixes that the merge operation adds by default to distinguish between columns with similar names.
Alternatively:
pd.DataFrame.update is a pandas method that achieves what you are trying to do. One caveat: if you have NaN values in DF2 that you would like to copy to DF1, it won't do that, since it updates only non-NA values:
Modify in place using non-NA values from another DataFrame.