Replace column values based on another dataframe python pandas - better way?
Use the boolean mask from isin to filter the df and assign the desired row values from the rhs df:
In [27]:
df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
df
Out[27]:
Name Nonprofit Business Education
0 X 1 1 0
1 Y 1 1 1
2 Z 1 0 1
3 Y 1 1 1
[4 rows x 4 columns]
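Note that the assignment above aligns the right-hand side on index labels, not on Name. If the two frames do not share an index, a lookup keyed on Name may be safer. The following is a minimal sketch with made-up frames (the question's real data is not shown here), and it assumes Name is unique in df1:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "Name": ["X", "Y", "Z", "Y"],
    "Nonprofit": [0, 0, 0, 0],
    "Business": [1, 1, 0, 1],
    "Education": [0, 0, 0, 0],
})
df1 = pd.DataFrame({
    "Name": ["Y", "Z"],
    "Nonprofit": [1, 1],
    "Education": [1, 1],
})

cols = ["Nonprofit", "Education"]
lookup = df1.set_index("Name")        # assumes Name is unique in df1
mask = df["Name"].isin(lookup.index)  # rows that have a replacement

for col in cols:
    # Look each Name up in the lookup column and assign only the masked rows.
    df.loc[mask, col] = df.loc[mask, "Name"].map(lookup[col])

print(df)
```

Rows whose Name does not appear in df1 (here, X) are left untouched, and columns outside cols are never written.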
replace column values in one dataframe by values of another dataframe
If you set the index to the 'Group' column on the other df then you can replace using map on your original df 'Group' column:
In [36]:
df['Group'] = df['Group'].map(df1.set_index('Group')['Hotel'])
df
Out[36]:
Date Group Family Bonus
0 2011-06-09 Jamel Laavin 456
1 2011-07-09 Frank Grendy 679
2 2011-09-10 Luxy Fantol 431
3 2011-11-02 Frank Gondow 569
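One behaviour of map worth keeping in mind: a key that is missing from the lookup becomes NaN rather than keeping its old value. A small sketch with hypothetical data (the group codes and the unmatched "D" row are made up for illustration):

```python
import pandas as pd

# Hypothetical data: "D" has no entry in the lookup frame.
df = pd.DataFrame({"Group": ["A", "B", "D"], "Bonus": [456, 679, 431]})
df1 = pd.DataFrame({"Group": ["A", "B"], "Hotel": ["Jamel", "Frank"]})

# Series indexed by the old value, holding the new value.
mapping = df1.set_index("Group")["Hotel"]

mapped = df["Group"].map(mapping)
print(mapped)  # the last entry is NaN because "D" is not in the lookup
```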
Replace column value of Dataframe based on a condition on another Dataframe
You can also try with map:
df_student['student_Id'] = (
df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
.fillna(df_student['student_Id'])
)
print(df_student)
# Output
Name gender math score student_Id
0 John male 50 1234
1 Jay male 100 6788
2 sachin male 70 xyz
3 Geetha female 80 abcd
4 Amutha female 75 83ko
5 ganesh male 40 v432
Update
I believe the updated_id isn't unique, so I need to further pre-process the data.
In this case, you could drop the duplicates first, assuming the last value (keep='last') is the most recent one for a given old_id:
sr = (
    df_updated_id.drop_duplicates('old_id', keep='last')
    .set_index('old_id')['new_id']
)
df_student['student_Id'] = df_student['student_Id'].map(sr).fillna(df_student['student_Id'])
Note: this is exactly what @BENY's answer does. Because it builds a dict, only the last occurrence of each old_id is kept. However, if you want to keep the first occurrence instead, that code doesn't work. With drop_duplicates, you can adjust the keep parameter.
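To make the effect of keep concrete, here is a small sketch with made-up IDs in which old_id "a1" has two candidate replacements. The helper name remap_ids is hypothetical and exists only for this illustration:

```python
import pandas as pd

# Hypothetical data: "a1" appears twice in the update table.
df_student = pd.DataFrame({
    "Name": ["John", "Jay", "sachin"],
    "student_Id": ["a1", "b2", "xyz"],
})
df_updated_id = pd.DataFrame({
    "old_id": ["a1", "a1", "b2"],
    "new_id": ["1111", "1234", "6788"],
})

def remap_ids(ids, updates, keep):
    """Map old IDs to new ones, keeping the first or last duplicate."""
    sr = updates.drop_duplicates("old_id", keep=keep).set_index("old_id")["new_id"]
    # IDs with no replacement fall back to their original value.
    return ids.map(sr).fillna(ids)

last = remap_ids(df_student["student_Id"], df_updated_id, keep="last")
first = remap_ids(df_student["student_Id"], df_updated_id, keep="first")
print(last.tolist())
print(first.tolist())
```

Only the entry for "a1" differs between the two results; "xyz" has no replacement and is kept as it is in both.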
Replace values in one dataframe with values from another dataframe
You can use update after replacing 0 with np.nan and setting a common index between the two dataframes.
Be wary of two things:
- Use overwrite=False to only fill the null values
- update modifies in place
common_index = ['Region','Product']
df_indexed = df.replace(0,np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)
df_indexed.update(df2_indexed,overwrite=False)
print(df_indexed.reset_index())
Region Product Country Quantity Price
0 Africa ABC South Africa 500.0 1200.0
1 Africa DEF South Africa 200.0 400.0
2 Africa XYZ South Africa 110.0 300.0
3 Africa DEF Nigeria 150.0 450.0
4 Africa XYZ Nigeria 200.0 750.0
5 Asia XYZ Japan 100.0 500.0
6 Asia ABC Japan 200.0 500.0
7 Asia DEF Japan 120.0 300.0
8 Asia XYZ India 250.0 600.0
9 Asia ABC India 100.0 400.0
10 Asia DEF India 40.0 220.0
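Both caveats can be seen in a reduced sketch. The frames below are hypothetical and much smaller than the ones in the question; df2 deliberately carries values (777 and 999) for cells that are already filled in df, to show that they are not overwritten:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0 marks a missing value in df.
df = pd.DataFrame({
    "Region": ["Africa", "Africa", "Asia"],
    "Product": ["ABC", "DEF", "XYZ"],
    "Quantity": [500, 0, 100],
    "Price": [1200, 400, 0],
})
df2 = pd.DataFrame({
    "Region": ["Africa", "Asia"],
    "Product": ["DEF", "XYZ"],
    "Quantity": [200, 999],
    "Price": [777, 500],
})

common_index = ["Region", "Product"]
df_indexed = df.replace(0, np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)

# update() works in place and returns None, so do not reassign its result.
returned = df_indexed.update(df2_indexed, overwrite=False)

out = df_indexed.reset_index()
print(out)
```

Only the two cells that were NaN are filled from df2; the existing 400 and 100 survive because overwrite=False.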
replacing values in a pandas dataframe with values from another dataframe based common columns
First separate the rows where you have NaN values out into a new dataframe called df3 and drop the rows where there are NaN values from df1.
Then do a left join based on the new dataframe.
df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])
After that is done, append the rows from df4 back into df1.
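The split, merge and append steps could look roughly like this. The data is made up, and the sketch makes two choices that are not in the answer: it keeps only the key columns in df3 so that the merge does not create _x/_y suffixed duplicates, and it uses pd.concat because DataFrame.append was removed in pandas 2.0:

```python
import numpy as np
import pandas as pd

# Hypothetical data; df2 is assumed to hold the values to look up.
df1 = pd.DataFrame({
    "types": ["a", "a", "b"],
    "o_period": [1, 2, 1],
    "s_months": [3.0, np.nan, np.nan],
    "incidents": [10.0, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "types": ["a", "b"],
    "o_period": [2, 1],
    "s_months": [6.0, 9.0],
    "incidents": [20.0, 30.0],
})

keys = ["types", "o_period"]
missing = df1["s_months"].isna()

df3 = df1.loc[missing, keys]   # rows that need a lookup, key columns only
complete = df1.loc[~missing]   # rows that already have values

# Left join keeps every row of df3, matched or not.
df4 = pd.merge(df3, df2, how="left", on=keys)

# Put the filled rows back with the complete ones.
result = pd.concat([complete, df4], ignore_index=True)
print(result)
```

The original row order is not preserved in general, since the filled rows are appended at the end.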
Another way is to combine the 2 columns you want to lookup into a single column
df1["types_o"] = df1["types"].astype(str) + df1["o_period"].astype(str)
df2["types_o"] = df2["types"].astype(str) + df2["o_period"].astype(str)
Then you can do a look up on the missing values.
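df1['types_o'] = df1['types_o'].replace('Nan', np.nan)
# Assumes df2 holds s_months and incidents, and that types_o is unique in df2
df1.loc[df1['s_months'].isnull(), 's_months'] = df1['types_o'].map(df2.set_index('types_o')['s_months'])
df1.loc[df1['incidents'].isnull(), 'incidents'] = df1['types_o'].map(df2.set_index('types_o')['incidents'])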
You didn't post any code or easily reproducible sample data, so this is the best I can do.
Replace values in one column based on part of text in another dataframe in R
This seems to be a case for the fuzzyjoin package with regex_left_join. After the regex_left_join, coalesce the columns together so that the first non-NA element is returned for each row:
library(fuzzyjoin)
library(dplyr)
regex_left_join(df1, df2, by = 'Supplier') %>%
transmute(Supplier = coalesce(New_Supplier, Supplier.x), Value)
-output
Supplier Value
1 AAA 100
2 Red 200
3 Red 300
4 DDD 400
5 Blue 200
6 Blue 100
7 Green 200
8 HHH 40
9 III 150
10 JJJ 70
Replace specific values based on another dataframe
You could use the join functionality of the data.table-package for this:
library(data.table)
setDT(DF1)
setDT(DF2)
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
which gives:
> DF1
date id sales cost city
1: 06/19/2016 1 9999 101 LON
2: 06/20/2016 1 150 102 MTL
3: 06/21/2016 1 151 104 MTL
4: 06/22/2016 1 152 107 MTL
5: 06/23/2016 1 155 99 MTL
6: 06/19/2016 2 84 55 NY
7: 06/20/2016 2 83 55 NY
8: 06/21/2016 2 80 56 NY
9: 06/22/2016 2 777 57 QC
10: 06/23/2016 2 555 58 QC
When you have many columns in both datasets, it is easier to use mget instead of typing all the column names. For the data used in the question it would look like:
DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
When you want to construct the vector of column names to be added beforehand, you could do this as follows:
cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
Pandas - Replacing Values by Looking Up in an Another Dataframe
Initialise a replacement dictionary and use df.replace to map those IDs to Names.
m = df2.set_index('Model ID')['Name'].to_dict()
v = df.filter(like='Linked Model')
df[v.columns] = v.replace(m)
df
ID Name Linked Model 1 Linked Model 2 Linked Model 3
0 100 A A A,B NaN
1 101 B A,B C Q
2 102 C NaN NaN NaN
3 103 D D NaN NaN
4 104 E D A A,B
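A runnable sketch of the same idea with made-up model IDs follows. Without regex=True, replace with a dict matches whole cell values only, so the cell holding two comma-separated IDs is left unchanged here:

```python
import numpy as np
import pandas as pd

# Hypothetical data: IDs "m1"/"m2" stand in for the real model IDs.
df2 = pd.DataFrame({"Model ID": ["m1", "m2"], "Name": ["A", "B"]})
df = pd.DataFrame({
    "ID": [100, 101, 102],
    "Linked Model 1": ["m1", "m2", "m1,m2"],
    "Linked Model 2": ["m2", np.nan, "m1"],
})

m = df2.set_index("Model ID")["Name"].to_dict()  # {"m1": "A", "m2": "B"}
v = df.filter(like="Linked Model")               # only the linked-model columns
df[v.columns] = v.replace(m)                     # exact, whole-cell matches

print(df)
```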