Replace column values based on another dataframe python pandas - better way?
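Use the boolean mask from isin to filter the df and assign the desired row values from the rhs df. Note that the assignment aligns on the index, not on Name, so it relies on the matching rows carrying the same index labels in both dataframes: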
In [27]:
df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
df
Out[27]:
   Name  Nonprofit  Business  Education
0     X          1         1          0
1     Y          1         1          1
2     Z          1         0          1
3     Y          1         1          1

[4 rows x 4 columns]
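If the two dataframes do not share index labels, a key-based lookup may be safer. The following is a minimal sketch under that assumption; the sample data is invented for illustration (the question's frames are not reproduced here), and the name lookup is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the original frames are not shown in this answer.
df = pd.DataFrame({
    "Name": ["X", "Y", "Z", "Y"],
    "Nonprofit": [1, 0, 0, 0],
    "Business": [1, 1, 0, 1],
    "Education": [0, 1, 0, 1],
})
df1 = pd.DataFrame({"Name": ["Y", "Z"], "Nonprofit": [1, 1], "Education": [1, 1]})

# Key the replacement values by Name so the match no longer depends on the index.
lookup = df1.set_index("Name")
mask = df["Name"].isin(lookup.index)

for col in ["Nonprofit", "Education"]:
    # Only rows whose Name exists in df1 are overwritten.
    df.loc[mask, col] = df.loc[mask, "Name"].map(lookup[col])

print(df)
```

This assumes Name is unique in df1; duplicated names would make the lookup ambiguous.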
replace column values in one dataframe by values of another dataframe
If you set the index to the 'Group' column on the other df then you can replace using map on your original df 'Group' column:
In [36]:
df['Group'] = df['Group'].map(df1.set_index('Group')['Hotel'])
df
Out[36]:
         Date  Group  Family  Bonus
0  2011-06-09  Jamel  Laavin    456
1  2011-07-09  Frank  Grendy    679
2  2011-09-10   Luxy  Fantol    431
3  2011-11-02  Frank  Gondow    569
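One thing worth knowing about map is that keys missing from the lookup come back as NaN. The sketch below illustrates this with made-up data (the group codes and the names mapping and mapped are assumptions, not taken from the question) and keeps the original value for unmatched rows.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"Group": ["G1", "G2", "G3", "G2"], "Bonus": [456, 679, 431, 569]})
df1 = pd.DataFrame({"Group": ["G1", "G2"], "Hotel": ["Jamel", "Frank"]})

# Series indexed by Group, holding the replacement values.
mapping = df1.set_index("Group")["Hotel"]

# "G3" has no entry in the mapping, so it maps to NaN.
mapped = df["Group"].map(mapping)

# Fall back to the original value wherever no replacement was found.
df["Group"] = mapped.fillna(df["Group"])
print(df)
```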
replacing values in a pandas dataframe with values from another dataframe based common columns
First separate the rows where you have NaN values out into a new dataframe called df3 and drop the rows where there are NaN values from df1.
Then do a left join based on the new dataframe.
df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])
After that is done, append the rows from df4 back into df1.
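The split, merge, and recombine steps above could look roughly like this. It is a sketch on invented data, assuming the missing values sit in s_months and incidents and that df2 has one row per (types, o_period) pair; pd.concat stands in for appending, since DataFrame.append was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the answer above.
df1 = pd.DataFrame({
    "types": ["a", "a", "b"],
    "o_period": [1, 2, 1],
    "s_months": [3.0, np.nan, np.nan],
    "incidents": [5.0, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "types": ["a", "b"],
    "o_period": [2, 1],
    "s_months": [7.0, 9.0],
    "incidents": [1.0, 2.0],
})

# Rows with missing values go to df3, keeping only the join keys.
missing = df1["s_months"].isna()
df3 = df1.loc[missing, ["types", "o_period"]]

# A left join pulls the replacement values in from df2.
df4 = pd.merge(df3, df2, how="left", on=["types", "o_period"])

# Recombine the complete rows with the filled-in rows.
result = pd.concat([df1[~missing], df4], ignore_index=True)
print(result)
```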
Another way is to combine the two columns you want to look up into a single key column:
df1["types_o"] = df1["types"].astype(str) + df1["o_period"].astype(str)
df2["types_o"] = df2["types"].astype(str) + df2["o_period"].astype(str)
Then you can look up the missing values by that key:
lookup = df2.drop_duplicates('types_o').set_index('types_o')  # one row per key
df1['s_months'] = df1['s_months'].fillna(df1['types_o'].map(lookup['s_months']))
df1['incidents'] = df1['incidents'].fillna(df1['types_o'].map(lookup['incidents']))
You didn't paste any code or an easily reproducible sample of your data, so this is the best I can do.
Replace values in one dataframe with values from another dataframe
You can use update after replacing 0 with np.nan and setting a common index between the two dataframes.
Be wary of two things:
- Use overwrite=False to only fill the null values
- update modifies in place
common_index = ['Region','Product']
df_indexed = df.replace(0,np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)
df_indexed.update(df2_indexed,overwrite=False)
print(df_indexed.reset_index())
    Region  Product       Country  Quantity   Price
 0  Africa      ABC  South Africa     500.0  1200.0
 1  Africa      DEF  South Africa     200.0   400.0
 2  Africa      XYZ  South Africa     110.0   300.0
 3  Africa      DEF       Nigeria     150.0   450.0
 4  Africa      XYZ       Nigeria     200.0   750.0
 5    Asia      XYZ         Japan     100.0   500.0
 6    Asia      ABC         Japan     200.0   500.0
 7    Asia      DEF         Japan     120.0   300.0
 8    Asia      XYZ         India     250.0   600.0
 9    Asia      ABC         India     100.0   400.0
10    Asia      DEF         India      40.0   220.0
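To see what overwrite=False does, here is a small self-contained sketch. The two-row frames are invented for illustration and the names keys, left and right are assumptions; only cells that were 0 (now NaN) get filled, while existing values are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 0 marks a missing value in df.
df = pd.DataFrame({
    "Region": ["Africa", "Asia"],
    "Product": ["ABC", "XYZ"],
    "Quantity": [500, 0],
    "Price": [0, 500],
})
df2 = pd.DataFrame({
    "Region": ["Africa", "Asia"],
    "Product": ["ABC", "XYZ"],
    "Quantity": [999, 100],
    "Price": [1200, 777],
})

keys = ["Region", "Product"]
left = df.replace(0, np.nan).set_index(keys)
right = df2.set_index(keys)

# update works in place and returns None, so do not assign its result.
left.update(right, overwrite=False)

result = left.reset_index()
print(result)
```

Quantity 500 and Price 500 survive even though df2 holds different values for them, because overwrite=False only touches NaN cells.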
Python Pandas update a dataframe value from another dataframe
You can use concat + drop_duplicates, which updates the common rows and adds the new rows in df2:
pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]:
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400
Update due to the comments below:
df1.set_index(['Code', 'Name'], inplace=True)
df1.update(df2.set_index(['Code', 'Name']))
df1.reset_index(inplace=True)  # drop=True here would discard the Code and Name columns
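The two approaches are not interchangeable, which a small comparison makes visible. This is a sketch on made-up data (the starting values in df1 are assumed): concat + drop_duplicates also adds rows that exist only in df2, whereas update only changes rows already present in df1.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df1 = pd.DataFrame({"Code": [1, 2], "Name": ["Company1", "Company2"], "Value": [200, 300]})
df2 = pd.DataFrame({"Code": [2, 3], "Name": ["Company2", "Company3"], "Value": [1000, 400]})

# Approach 1: stack both frames and keep the last occurrence of each key.
merged = (
    pd.concat([df1, df2])
    .drop_duplicates(subset=["Code", "Name"], keep="last")
    .sort_values("Code")
    .reset_index(drop=True)
)

# Approach 2: update a keyed copy of df1 in place; rows unique to df2 are ignored.
updated = df1.set_index(["Code", "Name"])
updated.update(df2.set_index(["Code", "Name"]))
updated = updated.reset_index()

print(merged)
print(updated)
```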
Efficiently replace values from a column to another column Pandas DataFrame
Using np.where is faster. Using a similar pattern as you used with replace:
df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
However, using a nested np.where is slightly faster:
df['col1'] = np.where(df['col1'] == 0,
                      np.where(df['col2'] == 0, df['col3'], df['col2']),
                      df['col1'])
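As a quick check of what the nested form computes, here is a sketch on a four-row frame invented for illustration: col1 is kept when non-zero, otherwise col2 is used, and col3 is the fallback when col2 is zero as well.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering each branch.
df = pd.DataFrame({
    "col1": [1, 0, 0, 0],
    "col2": [9, 2, 0, 0],
    "col3": [9, 9, 3, 0],
})

# Inner where picks between col2 and col3; outer where keeps non-zero col1 values.
df["col1"] = np.where(
    df["col1"] == 0,
    np.where(df["col2"] == 0, df["col3"], df["col2"]),
    df["col1"],
)
print(df["col1"].tolist())
```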
Timings
Using the following setup to produce a larger sample DataFrame and timing functions:
df = pd.concat([df]*10**4, ignore_index=True)

def root_nested(df):
    df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
    return df

def root_split(df):
    df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
    df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
    return df

def pir2(df):
    df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
    return df

def pir2_2(df):
    slc = (df.values != 0).argmax(axis=1)
    return df.values[np.arange(slc.shape[0]), slc]

def andrew(df):
    df.col1[df.col1 == 0] = df.col2
    df.col1[df.col1 == 0] = df.col3
    return df

def pablo(df):
    df['col1'] = df['col1'].replace(0,df['col2'])
    df['col1'] = df['col1'].replace(0,df['col3'])
    return df
I get the following timings:
%timeit root_nested(df.copy())
100 loops, best of 3: 2.25 ms per loop
%timeit root_split(df.copy())
100 loops, best of 3: 2.62 ms per loop
%timeit pir2(df.copy())
100 loops, best of 3: 6.25 ms per loop
%timeit pir2_2(df.copy())
1 loop, best of 3: 2.4 ms per loop
%timeit andrew(df.copy())
100 loops, best of 3: 8.55 ms per loop
I tried timing your method, but it's been running for multiple minutes without completing. As a comparison, timing your method on just the 6 row example DataFrame (not the much larger one tested above) took 12.8 ms.
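Outside IPython, a comparison like this can be approximated with the standard timeit module. The sketch below covers only the two np.where variants and uses an invented stand-in for the question's DataFrame; absolute numbers will differ by machine and library version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame, enlarged as in the setup above.
small = pd.DataFrame({"col1": [1, 0, 0, 0], "col2": [9, 2, 0, 0], "col3": [9, 9, 3, 0]})
big = pd.concat([small] * 10**4, ignore_index=True)


def root_nested(df):
    df["col1"] = np.where(df["col1"] == 0,
                          np.where(df["col2"] == 0, df["col3"], df["col2"]),
                          df["col1"])
    return df


def root_split(df):
    df["col1"] = np.where(df["col1"] == 0, df["col2"], df["col1"])
    df["col1"] = np.where(df["col1"] == 0, df["col3"], df["col1"])
    return df


# Confirm both variants agree before comparing their speed.
same = root_nested(big.copy())["col1"].equals(root_split(big.copy())["col1"])

# Each call works on a fresh copy, so the copy cost is included in every timing.
timings = {
    func.__name__: timeit.timeit(lambda func=func: func(big.copy()), number=20)
    for func in (root_nested, root_split)
}
print(same, timings)
```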
Pandas replace columns by merging another dataframe
Here is one way to do it, using DataFrame.update. It requires setting the index to id on both dataframes so that update can match their rows:
df.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df.update(df2)
df['A'] = df['A'].astype(int) # value by default was of type float
df.reset_index()
   id  A  B
0   1  3  5
1   1  3  6
2   2  4  7
3   2  4  8
Replace values in dataframe where updated versions are in another dataframe
Use:
df1['Value'] = df1['Identity'].map(df2.set_index('Identity')['Value']).fillna(df1['Value'])
Or try set_index with reindex, then reset_index and fillna:
df1['Value'] = (df2.set_index('Identity').reindex(df1['Identity'])
                   .reset_index(drop=True)['Value'].fillna(df1['Value']))
>>> df1
   Identity  Value  Notes
0         3    1.0      a
1         4    0.0      b
2         5    3.0      c
3         6    4.0      d
4         7    5.0      e
5         8  128.0      f
6         9    7.0      g
>>>
This fills missing rows in df2 with NaN and fills the NaNs with df1 values.
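Both variants should give the same result when df1 has a default integer index and Identity is unique in df2. The sketch below checks that on made-up data; the names lookup, via_map and via_reindex are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical sample data: df2 holds updated values for some identities.
df1 = pd.DataFrame({"Identity": [3, 4, 5], "Value": [1.0, 2.0, 3.0], "Notes": list("abc")})
df2 = pd.DataFrame({"Identity": [4, 8], "Value": [0.0, 128.0]})

lookup = df2.set_index("Identity")

# Variant 1: map each Identity to its updated value, keep the old one if absent.
via_map = df1["Identity"].map(lookup["Value"]).fillna(df1["Value"])

# Variant 2: reindex df2 by df1's identities, then realign to df1's positional index.
via_reindex = (
    lookup.reindex(df1["Identity"])
    .reset_index(drop=True)["Value"]
    .fillna(df1["Value"])
)

df1["Value"] = via_map
print(df1)
```

Identity 8 exists only in df2, so neither variant adds it to df1.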