Replace Column Values Based on Another Dataframe Python Pandas - Better Way

Replace column values based on another dataframe python pandas - better way?

Use the boolean mask from isin to filter the df and assign the desired row values from the rhs df:

In [27]:

df.loc[df.Name.isin(df1.Name), ['Nonprofit', 'Education']] = df1[['Nonprofit', 'Education']]
df
Out[27]:
  Name  Nonprofit  Business  Education
0    X          1         1          0
1    Y          1         1          1
2    Z          1         0          1
3    Y          1         1          1

[4 rows x 4 columns]

replace column values in one dataframe by values of another dataframe

If you set the index to the 'Group' column on the other df then you can replace using map on your original df 'Group' column:

In [36]:
df['Group'] = df['Group'].map(df1.set_index('Group')['Hotel'])
df

Out[36]:
         Date  Group  Family  Bonus
0  2011-06-09  Jamel  Laavin    456
1  2011-07-09  Frank  Grendy    679
2  2011-09-10   Luxy  Fantol    431
3  2011-11-02  Frank  Gondow    569

replacing values in a pandas dataframe with values from another dataframe based common columns

First separate the rows where you have NaN values out into a new dataframe called df3 and drop the rows where there are NaN values from df1.

Then do a left join based on the new dataframe.

df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])

After that is done, append the rows from df4 back into df1.

Another way is to combine the 2 columns you want to lookup into a single column

df1["types_o"] = df1["types_o"].astype(str) + df1["o_period"].astype(str)

df2["types_o"] = df2["types_o"].astype(str) + df2["o_period"].astype(str)

Then you can do a look up on the missing values.

df1.types_o.replace('Nan', np.NaN, inplace=True)

df1.loc[df1['s_months'].isnull(),'s_months'] = df2['types_o'].map(df1.types_o)

df1.loc[df1['incidents'].isnull(),'incidents'] = df2['types_o'].map(df1.types_o)

You didn't paste any code or examples of your data which is easily reproducible so this is the best I can do.

Replace values in one dataframe with values from another dataframe

You can use update after replacing 0 with np.nan and setting a common index between the two dataframes.

Be wary of two things:

Use overwrite=False to only fill the null values
update modifies inplace

common_index = ['Region','Product']
df_indexed = df.replace(0,np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)

df_indexed.update(df2_indexed,overwrite=False)

print(df_indexed.reset_index())

    Region Product       Country  Quantity   Price
0   Africa     ABC  South Africa     500.0  1200.0
1   Africa     DEF  South Africa     200.0   400.0
2   Africa     XYZ  South Africa     110.0   300.0
3   Africa     DEF       Nigeria     150.0   450.0
4   Africa     XYZ       Nigeria     200.0   750.0
5     Asia     XYZ         Japan     100.0   500.0
6     Asia     ABC         Japan     200.0   500.0
7     Asia     DEF         Japan     120.0   300.0
8     Asia     XYZ         India     250.0   600.0
9     Asia     ABC         India     100.0   400.0
10    Asia     DEF         India      40.0   220.0

Python Pandas update a dataframe value from another dataframe

You can using concat + drop_duplicates which updates the common rows and adds the new rows in df2

pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]: 
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400

Update due to below comments

df1.set_index(['Code', 'Name'], inplace=True)

df1.update(df2.set_index(['Code', 'Name']))

df1.reset_index(drop=True, inplace=True)

Efficiently replace values from a column to another column Pandas DataFrame

Using np.where is faster. Using a similar pattern as you used with replace:

df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])

However, using a nested np.where is slightly faster:

df['col1'] = np.where(df['col1'] == 0, 
                      np.where(df['col2'] == 0, df['col3'], df['col2']),
                      df['col1'])

Timings

Using the following setup to produce a larger sample DataFrame and timing functions:

df = pd.concat([df]*10**4, ignore_index=True)

def root_nested(df):
    df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
    return df

def root_split(df):
    df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
    df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
    return df

def pir2(df):
    df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
    return df

def pir2_2(df):
    slc = (df.values != 0).argmax(axis=1)
    return df.values[np.arange(slc.shape[0]), slc]

def andrew(df):
    df.col1[df.col1 == 0] = df.col2
    df.col1[df.col1 == 0] = df.col3
    return df

def pablo(df):
    df['col1'] = df['col1'].replace(0,df['col2'])
    df['col1'] = df['col1'].replace(0,df['col3'])
    return df

I get the following timings:

%timeit root_nested(df.copy())
100 loops, best of 3: 2.25 ms per loop

%timeit root_split(df.copy())
100 loops, best of 3: 2.62 ms per loop

%timeit pir2(df.copy())
100 loops, best of 3: 6.25 ms per loop

%timeit pir2_2(df.copy())
1 loop, best of 3: 2.4 ms per loop

%timeit andrew(df.copy())
100 loops, best of 3: 8.55 ms per loop

I tried timing your method, but it's been running for multiple minutes without completing. As a comparison, timing your method on just the 6 row example DataFrame (not the much larger one tested above) took 12.8 ms.

Pandas replace columns by merging another dataframe

here is one way to do it, by making use of pd.update. However, it requires to set the index on the id, so it can match the two df

df.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df.update(df2)
df['A'] = df['A'].astype(int)  # value by default was of type float
df.reset_index()

    id  A   B
0   1   3   5
1   1   3   6
2   2   4   7
3   2   4   8

Replace values in dataframe where updated versions are in another dataframe

Use:

df1['Value'] = df1['Identity'].map(df2.set_index('Identity')['Value']).fillna(df1['Value'])

Or try reset_index with reindex and set_index with fillna:

df1['Value'] = df2.set_index('Identity').reindex(df1['Identity'])
                  .reset_index(drop=True)['Value'].fillna(df1['Value'])

>>> df1
   Identity  Value Notes
0         3    1.0     a
1         4    0.0     b
2         5    3.0     c
3         6    4.0     d
4         7    5.0     e
5         8  128.0     f
6         9    7.0     g
>>>

This fills missing rows in df2 with NaN and fills the NaNs with df1 values.

Replace Column Values Based on Another Dataframe Python Pandas - Better Way