Updating Column in One Dataframe with Value from Another Dataframe Based on Matching Values

Python Pandas update a dataframe value from another dataframe

You can use concat + drop_duplicates, which updates the common rows and adds the new rows from df2:

pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]:
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400
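As a self-contained sketch of this approach (the df1 values here are hypothetical; only df2's rows are implied by the output above):

```python
import pandas as pd

# Hypothetical frames: df1 holds stale values, df2 holds the updates plus a new row
df1 = pd.DataFrame({'Code': [1, 2],
                    'Name': ['Company1', 'Company2'],
                    'Value': [100, 200]})
df2 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 1000, 400]})

# keep='last' keeps the later duplicate, i.e. df2's row wins on common keys
out = (pd.concat([df1, df2])
         .drop_duplicates(['Code', 'Name'], keep='last')
         .sort_values('Code'))
print(out)
```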

Update, per the comments below:

df1.set_index(['Code', 'Name'], inplace=True)
df1.update(df2.set_index(['Code', 'Name']))
df1.reset_index(drop=True, inplace=True)
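The key point is that update() aligns rows by index, which is why both frames are keyed on ['Code', 'Name'] first. A minimal sketch with hypothetical frame contents (here reset_index() keeps the keys as columns rather than dropping them):

```python
import pandas as pd

# Hypothetical frames; df2 carries updated values for two of df1's keys
df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [100, 200, 300]})
df2 = pd.DataFrame({'Code': [1, 2],
                    'Name': ['Company1', 'Company2'],
                    'Value': [200, 1000]})

df1.set_index(['Code', 'Name'], inplace=True)
df1.update(df2.set_index(['Code', 'Name']))  # in place; matches on the shared index
df1.reset_index(inplace=True)                # restore Code/Name as columns
print(df1)
```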

How to update column value of a data frame from another data frame matching 2 columns?

Here's a way to do it:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')

Explanation:

  • modify df2 so it has Team ID, Group as its index and its only column is Result
  • use join to bring the new scores from df2 into a Result column in df1
  • use loc to update Score values for rows where Result is not null (i.e., rows for which an updated Score is available)
  • drop the Result column.

Full test code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'DEP ID': ['001', '001', '002', '002'],
    'Team ID': ['002', '004', '002', '007'],
    'Group': ['A', 'A', 'A', 'A'],
    'Score': [50, 70, 50, 90]})
df2 = pd.DataFrame({
    'DEP ID': ['001', '001', '001'],
    'Team ID': ['002', '003', '004'],
    'Group': ['A', 'A', 'A'],
    'Result': [80, 60, 70]})

print(df1)
print(df2)

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
print(df1)

Output:

  DEP ID Team ID Group  Score
0    001     002     A     80
1    001     004     A     70
2    002     002     A     80
3    002     007     A     90

UPDATE:

If Result column in df2 is instead named Score, as asked by OP in a comment, then the code can be adjusted slightly as follows:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')

Replace column values based on another dataframe python pandas - better way?

Use the boolean mask from isin to filter the df and assign the desired row values from the rhs df:

In [27]:

df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
df
Out[27]:
  Name  Nonprofit  Business  Education
0    X          1         1          0
1    Y          1         1          1
2    Z          1         0          1
3    Y          1         1          1

[4 rows x 4 columns]
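A runnable sketch of this answer, with hypothetical frame contents consistent with the output above. One thing worth noting: the right-hand-side assignment aligns on the row index, so this assumes df and df1 share the same index labels:

```python
import pandas as pd

# Hypothetical frames sharing the default 0..3 index
df = pd.DataFrame({'Name': ['X', 'Y', 'Z', 'Y'],
                   'Nonprofit': [0, 0, 0, 0],
                   'Business': [1, 1, 0, 1],
                   'Education': [0, 1, 1, 1]})
df1 = pd.DataFrame({'Name': ['X', 'Y', 'Z', 'Y'],
                    'Nonprofit': [1, 1, 1, 1],
                    'Education': [0, 1, 1, 1]})

# isin builds the row filter; the assignment aligns rhs rows by index label
df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
print(df)
```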

Replace column value of Dataframe based on a condition on another Dataframe

You can also try with map:

df_student['student_Id'] = (
    df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
    .fillna(df_student['student_Id'])
)
print(df_student)

# Output
     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       83ko
5  ganesh    male          40       v432

Update

I believe the updated_id isn't unique, so I need to further pre-process the data.

In this case, you could drop the duplicates first, keeping the last value (keep='last') on the assumption that it is the most recent one for a given old_id:

sr = df_updated_id.drop_duplicates('old_id', keep='last') \
                  .set_index('old_id')['new_id']

df_student['student_Id'] = df_student['student_Id'].map(sr) \
                                                   .fillna(df_student['student_Id'])
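To make the keep parameter concrete, here is a small sketch with a hypothetical mapping table in which one old_id appears twice:

```python
import pandas as pd

# Hypothetical mapping table: old_id 'xyz' has two entries, the later one wins with keep='last'
df_updated_id = pd.DataFrame({'old_id': ['1234', 'xyz', 'xyz'],
                              'new_id': ['6788', 'abcd', '83ko']})

# keep='last' takes the most recent mapping, keep='first' the earliest
last = df_updated_id.drop_duplicates('old_id', keep='last').set_index('old_id')['new_id']
first = df_updated_id.drop_duplicates('old_id', keep='first').set_index('old_id')['new_id']
print(last['xyz'], first['xyz'])  # 83ko abcd
```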

Note: this is exactly what @BENY's answer does. Since he creates a dict, only the last occurrence of each old_id is kept. However, if you want to keep the first value that appears, his code doesn't work; with drop_duplicates you can adjust the keep parameter.

Updating a column in a dataframe with values from the same column of another dataframe

It looks like you want to replace Value from df with the corresponding Value in df2, if the value exists. I.e., assuming you had a CAT F that had a corresponding value of 36 in df, you would want that to be replaced by 99 (from df2).

Using merge:

df = df.merge(df2, on='CAT', how='left')
df['Value'] = np.where(df['Value_y'].isna(), df['Value_x'], df['Value_y'])
df = df.drop(columns=['Value_x', 'Value_y'])

Output:

  CAT  Value
0   A   12.0
1   B   34.0
2   C   22.0
3   D   43.0
4   E   21.0
5   F   99.0
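A shorter equivalent skips the merge entirely by mapping CAT to df2's values and filling the gaps with the originals. This is a sketch: the frame contents below are assumed from the example above (df has CAT F with value 36, and df2 overrides it with 99):

```python
import pandas as pd

# Hypothetical frames matching the example above
df = pd.DataFrame({'CAT': list('ABCDEF'),
                   'Value': [12, 34, 22, 43, 21, 36]})
df2 = pd.DataFrame({'CAT': ['F'], 'Value': [99]})

# map looks up each CAT in df2; CATs absent from df2 become NaN,
# which fillna then replaces with the original Value
df['Value'] = df['CAT'].map(df2.set_index('CAT')['Value']).fillna(df['Value'])
print(df)
```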

Update a pandas dataframe with data from another dataframe

You can use combine_first with reindex:

df3 = df2.combine_first(df1).reindex(df1.index)
print (df3)
          1      2      3    4
3234  Lorum  Ipsum    Foo  Bar
8839  Lorum    NaN  Ipsum  Foo
9911  Lorum  Ipsum    Bar  Foo
2256  Lorum    NaN  Ipsum  Bar

Or use your solution with update, which works in place; if you assign the result to a variable, it returns None:

df1.update(df2)
print (df1)
          1      2      3    4
3234  Lorum  Ipsum    Foo  Bar
8839  Lorum    NaN  Ipsum  Foo
9911  Lorum  Ipsum    Bar  Foo
2256  Lorum    NaN  Ipsum  Bar

print (df1.update(df2))
None
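To see how combine_first differs from update, here is a minimal sketch with hypothetical frames that share index labels: df2's non-NA values take priority, df1 fills the rest, and reindex restricts the result to df1's rows:

```python
import pandas as pd
import numpy as np

# Hypothetical frames keyed on the same index labels
df1 = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                    'B': [np.nan, 5.0, np.nan]},
                   index=[10, 20, 30])
df2 = pd.DataFrame({'B': [4.0, np.nan]}, index=[20, 40])

# df2's non-NA values win (B at index 20 becomes 4.0), df1 fills the gaps;
# reindex drops the extra row 40 that only df2 had
df3 = df2.combine_first(df1).reindex(df1.index)
print(df3)
```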

Updating a dataframe rows based on another dataframe rows

You can use update, like so:

df1.update(df2)
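Since update matches rows and columns by label and modifies df1 in place, a minimal sketch (with assumed frame contents) looks like this:

```python
import pandas as pd

# Hypothetical frames; df2 supplies new values for rows 'b' and 'c'
df1 = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'x': [20, 30]}, index=['b', 'c'])

df1.update(df2)  # in place: rows 'b' and 'c' take df2's values, 'a' is untouched
print(df1)
```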

Updating value of one column in dataframe if ID match found in column of another dataframe

Try an outer merge, then drop the unwanted columns after you apply your filters. Code below:

result = pd.merge(dataFrameOfLots, dataFrameFiltered, how='outer',
                  on=['Customer', 'Stage', 'ProdType', 'Brand', 'ProdName', 'Size',
                      'Strength', 'Lot', 'PackedOn', 'Qty', 'Available'],
                  suffixes=('_x', '')).fillna(0)
result = result.loc[:, ~result.columns.str.endswith('_x')]  # drop unwanted columns

or

result.drop(columns=['QtyInTransaction_x', 'IndexCol_x'], inplace=True)  # drop unwanted columns

updating column in one dataframe with value from another dataframe based on matching values

You don't need z$color in the first place if it's just a placeholder; you can replace the NAs with 0 later:

z$color<-y[match(z$letter, y$letter),2]
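For readers working in pandas rather than R, a sketch of the equivalent idiom (with hypothetical y/z frames): Series.map plays the role of R's match-based lookup, and fillna(0) replaces the NAs afterwards:

```python
import pandas as pd

# Hypothetical frames mirroring the R example: y maps letters to colors
y = pd.DataFrame({'letter': ['a', 'b', 'c'],
                  'color': ['red', 'blue', 'green']})
z = pd.DataFrame({'letter': ['a', 'c', 'd']})

# Letters absent from y become NaN, then get replaced with 0
z['color'] = z['letter'].map(y.set_index('letter')['color']).fillna(0)
print(z)
```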

update a Dataframe from a different Dataframe based on a matching value

Your finaldf after the merge has shape (33, 109) because it contains columns with similar names, with _x and _y appended to them: the _x ones come from DF1 and the _y ones from DF2.

You need to run the code below after the merge to remove the extra "_x" and "_y" columns for those 18 and copy the values from DF2 into DF1 where they matched on "ID":

remove_cols = []

for col in DF2.columns:
    if col == 'ID':
        continue
    finaldf[col] = finaldf[col + '_y'].fillna(finaldf[col + '_x'])
    remove_cols += [col + '_x', col + '_y']

finaldf.drop(remove_cols, axis=1, inplace=True)

For more information on why "_x" and "_y" columns appear in your merged dataframe, check the official documentation of the pd.DataFrame.merge method.
"_x" and "_y" are the suffixes the merge operation adds by default to distinguish columns with the same name.

Alternatively:

pd.DataFrame.update is a method in pandas to achieve what you are trying to do.

Check it out here. But there is one caveat: if you have NaN values in DF2 that you would like to copy into DF1, update won't do that. It updates only with non-NA values:

Modify in place using non-NA values from another DataFrame.
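That caveat is easy to demonstrate with a minimal sketch (hypothetical frames): the NaN in DF2 is skipped, so DF1's original value survives in that slot:

```python
import pandas as pd
import numpy as np

DF1 = pd.DataFrame({'x': [1.0, 2.0]}, index=['a', 'b'])
DF2 = pd.DataFrame({'x': [np.nan, 20.0]}, index=['a', 'b'])

DF1.update(DF2)  # NaN at 'a' is ignored; only the 20.0 at 'b' is copied over
print(DF1)
```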


