Replace Specific Values Based on Another Dataframe

Replace column values based on another dataframe python pandas - better way?

Use the boolean mask from isin to filter df and assign the desired row values from the right-hand-side df:

In [27]:

df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
df
Out[27]:
  Name  Nonprofit  Business  Education
0    X          1         1          0
1    Y          1         1          1
2    Z          1         0          1
3    Y          1         1          1

[4 rows x 4 columns]
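
For reference, a minimal, self-contained sketch with assumed data that reproduces the output above. Note that this assignment aligns on the index, so df1's index labels must line up with the masked rows of df:

import pandas as pd

# assumed data, chosen only to match the output shown above
df = pd.DataFrame({'Name': ['X', 'Y', 'Z', 'Y'],
                   'Nonprofit': [0, 1, 0, 1],
                   'Business': [1, 1, 0, 1],
                   'Education': [1, 1, 1, 1]})
# df1's index (0, 2) matches the rows of df it should overwrite
df1 = pd.DataFrame({'Name': ['X', 'Z'],
                    'Nonprofit': [1, 1],
                    'Education': [0, 1]}, index=[0, 2])

df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]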

replace column values in one dataframe by values of another dataframe

If you set the index of the other df to the 'Group' column, then you can replace values using map on your original df's 'Group' column:

In [36]:
df['Group'] = df['Group'].map(df1.set_index('Group')['Hotel'])
df

Out[36]:
         Date  Group  Family  Bonus
0  2011-06-09  Jamel  Laavin    456
1  2011-07-09  Frank  Grendy    679
2  2011-09-10   Luxy  Fantol    431
3  2011-11-02  Frank  Gondow    569
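
A self-contained sketch with assumed data matching the output above; set_index('Group')['Hotel'] builds the Group-to-Hotel lookup Series that map consumes (the codes 'A'/'B'/'C' are invented here):

import pandas as pd

# assumed data: df holds group codes, df1 maps each code to a hotel name
df = pd.DataFrame({'Date': ['2011-06-09', '2011-07-09', '2011-09-10', '2011-11-02'],
                   'Group': ['A', 'B', 'C', 'B'],
                   'Family': ['Laavin', 'Grendy', 'Fantol', 'Gondow'],
                   'Bonus': [456, 679, 431, 569]})
df1 = pd.DataFrame({'Group': ['A', 'B', 'C'],
                    'Hotel': ['Jamel', 'Frank', 'Luxy']})

# codes without a match in df1 would become NaN after map
df['Group'] = df['Group'].map(df1.set_index('Group')['Hotel'])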

Replace column value of Dataframe based on a condition on another Dataframe

You can also try with map:

df_student['student_Id'] = (
    df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
    .fillna(df_student['student_Id'])
)
print(df_student)

# Output
     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       83ko
5  ganesh    male          40       v432
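
To see why the fillna is needed, here is a self-contained sketch on hypothetical data ('ab22' is an invented old ID): mapped IDs get their new value, while IDs absent from df_updated_id come back as NaN from map and fall through fillna to their original value:

import pandas as pd

df_student = pd.DataFrame({'Name': ['John', 'Jay', 'sachin', 'Geetha', 'Amutha', 'ganesh'],
                           'gender': ['male', 'male', 'male', 'female', 'female', 'male'],
                           'math score': [50, 100, 70, 80, 75, 40],
                           'student_Id': ['1234', 'ab22', 'xyz', 'abcd', '83ko', 'v432']})
# hypothetical update table: only Jay's ID changes
df_updated_id = pd.DataFrame({'old_id': ['ab22'], 'new_id': ['6788']})

df_student['student_Id'] = (
    df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
    .fillna(df_student['student_Id'])
)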

Update

The asker followed up: "I believe the updated_id isn't unique, so I need to further pre-process the data."

In this case, you could drop duplicates first, treating the last value (keep='last') as the most recent one for a given old_id:

sr = (df_updated_id.drop_duplicates('old_id', keep='last')
                   .set_index('old_id')['new_id'])

df_student['student_Id'] = (df_student['student_Id'].map(sr)
                            .fillna(df_student['student_Id']))

Note: this is exactly what @BENY's answer does. Since he builds a dict, only the last occurrence of each old_id is kept. However, if you want to keep the first value that appears, his code doesn't work; with drop_duplicates, you can adjust the keep parameter.
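
A quick sketch on hypothetical duplicated IDs showing how keep controls which mapping survives:

import pandas as pd

# hypothetical: 'ab22' appears twice, so only one mapping can win
df_updated_id = pd.DataFrame({'old_id': ['ab22', 'ab22'],
                              'new_id': ['6788', '9999']})

last = df_updated_id.drop_duplicates('old_id', keep='last').set_index('old_id')['new_id']
first = df_updated_id.drop_duplicates('old_id', keep='first').set_index('old_id')['new_id']

print(last['ab22'])   # 9999 -- same result as a dict, where later rows overwrite earlier ones
print(first['ab22'])  # 6788 -- not reachable with the plain dict approach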

Replace values in one dataframe with values from another dataframe

You can use update after replacing 0 with np.nan and setting a common index between the two dataframes.

Be wary of two things:

  1. Use overwrite=False to only fill the null values
  2. update modifies the dataframe in place

import numpy as np

common_index = ['Region', 'Product']
df_indexed = df.replace(0, np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)

df_indexed.update(df2_indexed, overwrite=False)

print(df_indexed.reset_index())

    Region Product       Country  Quantity   Price
0   Africa     ABC  South Africa     500.0  1200.0
1   Africa     DEF  South Africa     200.0   400.0
2   Africa     XYZ  South Africa     110.0   300.0
3   Africa     DEF       Nigeria     150.0   450.0
4   Africa     XYZ       Nigeria     200.0   750.0
5     Asia     XYZ         Japan     100.0   500.0
6     Asia     ABC         Japan     200.0   500.0
7     Asia     DEF         Japan     120.0   300.0
8     Asia     XYZ         India     250.0   600.0
9     Asia     ABC         India     100.0   400.0
10    Asia     DEF         India      40.0   220.0
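
To see the mechanics on a tiny, hypothetical example: zeros become NaN, and update(overwrite=False) fills only those holes from df2, leaving existing values untouched:

import numpy as np
import pandas as pd

# hypothetical frames: df has 0 placeholders, df2 holds the real values
df = pd.DataFrame({'Region': ['Africa', 'Asia'], 'Product': ['ABC', 'XYZ'],
                   'Quantity': [0, 100], 'Price': [1200, 0]})
df2 = pd.DataFrame({'Region': ['Africa', 'Asia'], 'Product': ['ABC', 'XYZ'],
                    'Quantity': [500, 999], 'Price': [999, 500]})

common_index = ['Region', 'Product']
df_indexed = df.replace(0, np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)

df_indexed.update(df2_indexed, overwrite=False)  # fills NaN only; 100 and 1200 survive
print(df_indexed.reset_index())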

replacing values in a pandas dataframe with values from another dataframe based on common columns

First, separate the rows that have NaN values out into a new dataframe called df3, and drop those rows from df1.

Then do a left join based on the new dataframe.

df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])

After that is done, append the rows from df4 back into df1.
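
A runnable sketch of that whole sequence on hypothetical frames (the column names 'types', 'o_period', 's_months' and 'incidents' are taken from the question):

import numpy as np
import pandas as pd

# df1 has gaps; df2 is the reference table keyed by 'types' and 'o_period'
df1 = pd.DataFrame({'types': ['A', 'B'], 'o_period': [1, 2],
                    's_months': [12.0, np.nan], 'incidents': [3.0, np.nan]})
df2 = pd.DataFrame({'types': ['B'], 'o_period': [2],
                    's_months': [24.0], 'incidents': [5.0]})

mask = df1['s_months'].isnull()
df3 = df1.loc[mask, ['types', 'o_period']]   # rows to repair, keys only
df4 = pd.merge(df3, df2, how='left', on=['types', 'o_period'])
df1 = pd.concat([df1.loc[~mask], df4], ignore_index=True)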

Another way is to combine the two columns you want to look up on into a single key column:

df1["types_o"] = df1["types_o"].astype(str) + df1["o_period"].astype(str)

df2["types_o"] = df2["types_o"].astype(str) + df2["o_period"].astype(str)

Then you can do a lookup on the missing values:

import numpy as np

# reset string 'nan' keys (produced by astype(str) on NaN) to real NaN
df1['types_o'] = df1['types_o'].replace('nan', np.nan)

df1.loc[df1['s_months'].isnull(), 's_months'] = df1['types_o'].map(df2.set_index('types_o')['s_months'])

df1.loc[df1['incidents'].isnull(), 'incidents'] = df1['types_o'].map(df2.set_index('types_o')['incidents'])

You didn't paste any code or an easily reproducible example of your data, so this is the best I can do.

Replace values in one column based on part of text in another dataframe in R

This seems to be a case for the fuzzyjoin package's regex_left_join. After the regex_left_join, coalesce the columns together so that it returns the first non-NA element in each row:

library(fuzzyjoin)
library(dplyr)

regex_left_join(df1, df2, by = 'Supplier') %>%
    transmute(Supplier = coalesce(New_Supplier, Supplier.x), Value)

Output:

   Supplier Value
1       AAA   100
2       Red   200
3       Red   300
4       DDD   400
5      Blue   200
6      Blue   100
7     Green   200
8       HHH    40
9       III   150
10      JJJ    70

Replace specific values based on another dataframe

You could use the join functionality of the data.table package for this:

library(data.table)
setDT(DF1)
setDT(DF2)

DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]

which gives:

> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC

When you have many columns in both datasets, it is easier to use mget instead of typing all the column names. For the data used in the question, it would look like:

DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]

When you want to construct a vector of the column names to be added beforehand, you could do it as follows:

cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]

Pandas - Replacing Values by Looking Up in an Another Dataframe

Initialise a replacement dictionary and use df.replace to map those IDs to Names:

# build a Model ID -> Name lookup
m = df2.set_index('Model ID')['Name'].to_dict()
# grab just the "Linked Model" columns
v = df.filter(like='Linked Model')
df[v.columns] = v.replace(m)

df

    ID Name Linked Model 1 Linked Model 2 Linked Model 3
0  100    A              A            A,B            NaN
1  101    B            A,B              C              Q
2  102    C            NaN            NaN            NaN
3  103    D              D            NaN            NaN
4  104    E              D              A            A,B
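
A self-contained sketch with hypothetical single-ID cells; for comma-joined cells like the 'A,B' entries above, the same dict can be applied with string keys and regex=True so each ID inside the string is substituted:

import pandas as pd

# hypothetical data: each linked-model cell holds a single Model ID
df2 = pd.DataFrame({'Model ID': [100, 101, 102], 'Name': ['A', 'B', 'C']})
df = pd.DataFrame({'ID': [100, 101], 'Name': ['A', 'B'],
                   'Linked Model 1': [100, 102], 'Linked Model 2': [101, None]})

m = df2.set_index('Model ID')['Name'].to_dict()
v = df.filter(like='Linked Model')
df[v.columns] = v.replace(m)

# for string cells like '100,101', substitute inside the string instead:
# v.replace({str(k): name for k, name in m.items()}, regex=True)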

