Pandas Merge - How to Avoid Duplicating Columns

Pandas Merge - How to avoid duplicating columns

You can work out the columns that are only in one DataFrame and use this to select a subset of columns in the merge.

cols_to_use = df2.columns.difference(df.columns)

Then perform the merge (note that cols_to_use is an Index object, but it has a handy tolist() method if you need a plain list).

dfNew = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')

This will avoid any columns clashing in the merge.
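As a minimal sketch of the steps above, assuming two small made-up frames that share an index and have one overlapping column (the data and the name df_new are illustrative, not from the original question):

```python
import pandas as pd

# Hypothetical frames: same index, column "b" appears in both.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"b": [4, 5, 6], "c": [7, 8, 9]}, index=["x", "y", "z"])

# Columns that exist only in df2.
cols_to_use = df2.columns.difference(df.columns)

# Merge on the index, bringing in only the columns df does not already have.
df_new = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how="outer")
print(df_new.columns.tolist())  # ['a', 'b', 'c']
```

Note that Index.difference returns the names in sorted order, so the extra columns may not keep the order they had in df2.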

Pandas merge without duplicating columns

Set postcode as the index in both DataFrames and use DataFrame.combine_first; if necessary, add DataFrame.reindex to restore the column order of the original df1:

print(df1)
  postcode  lat  lon  plus 32 more columns
0      M20  2.3  0.2  NaN  NaN  NaN  NaN
1      LS1  NaN  NaN  NaN  NaN  NaN  NaN
2      LS1  NaN  NaN  NaN  NaN  NaN  NaN
3      LS2  NaN  NaN  NaN  NaN  NaN  NaN
4      M21  2.4  0.3  NaN  NaN  NaN  NaN

df1 = df1.set_index('postcode')
df2 = df2.set_index('postcode')

df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print(df3)
          lat  lon  plus 32 more columns
postcode
LS1       1.4  0.1  NaN  NaN  NaN  NaN
LS1       1.4  0.1  NaN  NaN  NaN  NaN
LS2       1.5  0.2  NaN  NaN  NaN  NaN
M20       2.3  0.2  NaN  NaN  NaN  NaN
M21       2.4  0.3  NaN  NaN  NaN  NaN
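The original df2 is not shown, so here is a self-contained sketch of the same idea with made-up data. The frame contents and the extra column are assumptions, and the postcodes are kept unique for simplicity:

```python
import numpy as np
import pandas as pd

# Hypothetical data: df1 has gaps, df2 holds the missing coordinates.
df1 = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS2"],
    "lat": [2.3, np.nan, np.nan],
    "lon": [0.2, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
    "extra": [1, 2],  # a column that df1 does not have
})

df1 = df1.set_index("postcode")
df2 = df2.set_index("postcode")

# Fill NaNs in df1 from df2, then keep only df1's columns in df1's order.
df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print(df3)
```

Values already present in df1 win; df2 is only consulted where df1 has NaN. The reindex step also drops columns that exist only in df2.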

How to merge Pandas dataframes without duplicating columns

Your problem is that you don't really want to just merge everything. You need to concat your first set of frames, then merge.

import pandas as pd
import numpy as np

base_frame.merge(pd.concat([frame1, frame2]), how='left')

#    id supplier1_match0
# 0   1                x
# 1   2               2x
# 2   3              NaN
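The frames themselves are not defined in the answer. A runnable sketch, assuming contents for base_frame, frame1 and frame2 chosen to match the output shown (the names stacked and result, and the explicit on="id" and ignore_index=True arguments, are additions for illustration):

```python
import pandas as pd

# Assumed inputs; the original question's data is not shown.
base_frame = pd.DataFrame({"id": [1, 2, 3]})
frame1 = pd.DataFrame({"id": [1], "supplier1_match0": ["x"]})
frame2 = pd.DataFrame({"id": [2], "supplier1_match0": ["2x"]})

# Stack the frames that share a column layout, then merge once.
stacked = pd.concat([frame1, frame2], ignore_index=True)
result = base_frame.merge(stacked, on="id", how="left")
print(result)
```

Because the similar frames are stacked before merging, supplier1_match0 appears once rather than as separate _x and _y columns.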

Alternatively, you could define base_frame so that it already has all of the relevant columns from the other frames, set id as the index, and use .update. This keeps base_frame the same size, which the merge above does not guarantee. Note, however, that data will be overwritten if more than one frame has a non-null value for a given cell.

base_frame = pd.DataFrame({'id':[1,2,3]}).assign(supplier1_match0 = np.nan).set_index('id')

# update modifies base_frame in place, aligning on the id index
for df in [frame1, frame2]:
    base_frame.update(df.set_index('id'))

print(base_frame)

   supplier1_match0
id
1                 x
2                2x
3               NaN

Avoid duplicate columns while merging with pandas

I would use pd.concat to stack the similarly structured dataframes first, then merge the others like this:

df.merge(pd.concat([df1, df3]), on='date_time', how='left')\
.merge(df2, on='date_time', how='left')

Output:

             date_time  potato  carrot
0  2018-06-01 00:00:00     NaN     NaN
1  2018-06-01 00:30:00    13.0     NaN
2  2018-06-01 01:00:00    21.0     NaN
3  2018-06-01 01:30:00    27.0    14.0

Per the comments, here is a complete example with sample data:

df = pd.DataFrame({'date_time':['2018-06-01 00:00:00','2018-06-01 00:30:00','2018-06-01 01:00:00','2018-06-01 01:30:00']})

# Dataframes to merge to reference dataframe
df1 = pd.DataFrame({'date_time':['2018-06-01 00:30:00','2018-06-01 01:00:00'],
                    'potato':[13,21]})

df2 = pd.DataFrame({'date_time':['2018-06-01 01:30:00','2018-06-01 02:00:00','2018-06-01 02:30:00'],
                    'carrot':[14,8,32]})

df3 = pd.DataFrame({'date_time':['2018-06-01 01:30:00', '2018-06-01 02:00:00'],
                    'potato':[27,31],
                    'zucchini':[11,1]})

df.merge(pd.concat([df1, df3]), on='date_time', how='left').merge(df2, on='date_time', how='left')

Output:

             date_time  potato  zucchini  carrot
0  2018-06-01 00:00:00     NaN       NaN     NaN
1  2018-06-01 00:30:00    13.0       NaN     NaN
2  2018-06-01 01:00:00    21.0       NaN     NaN
3  2018-06-01 01:30:00    27.0      11.0    14.0

Python Pandas merge dataframes without duplicating columns

FYI, you do not need the reduce function; you can simply use:

df_all = df1.merge(df2)

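The columns are duplicated because you are merging only on 'Name', so every other column the two frames share comes back twice, with _x and _y suffixes. If all of the shared columns hold the same values, you can drop the on='Name' argument and the merge will use all common columns as keys instead of duplicating them.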

Alternatively, you can merge only the non-duplicate columns from df2:

df_all = df1.merge(df2[['Name','Age']])
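To make the difference concrete, here is a small sketch with made-up data (the City column and the variable names with_on, no_on and subset are illustrative assumptions):

```python
import pandas as pd

# Hypothetical frames that share "Name" and "City"; only df2 has "Age".
df1 = pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Leeds", "York"]})
df2 = pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Leeds", "York"], "Age": [30, 41]})

# Merging on "Name" alone duplicates the other shared column.
with_on = df1.merge(df2, on="Name")
print(with_on.columns.tolist())  # ['Name', 'City_x', 'City_y', 'Age']

# With no "on", every common column is used as a key.
no_on = df1.merge(df2)
print(no_on.columns.tolist())    # ['Name', 'City', 'Age']

# Or bring in only the key plus the columns df1 lacks.
subset = df1.merge(df2[["Name", "Age"]], on="Name")
print(subset.columns.tolist())   # ['Name', 'City', 'Age']
```

Merging on all common columns only matches rows where every shared column agrees, so the subset approach is the safer choice when the shared columns might differ between the frames.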

Merging multiple data frames causing duplicate column names

You can concatenate the frames along the columns with keys, then flatten the resulting column names:

s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]:
   key    value_0    value_1    value_2    value_3
0    A  -1.957968        NaN  -0.852135  -0.976960
1    B   1.545932  -0.276838        NaN   0.197615
2    C  -2.149727        NaN  -0.364382   0.349993
3    D   0.524990  -0.476655        NaN        NaN
4    E        NaN  -2.135870   0.798782        NaN
5    F        NaN   1.456544  -0.255705   0.447279
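df_list is not defined in the answer. A self-contained sketch with two made-up frames (the data is an assumption) shows how the keys become column suffixes:

```python
import pandas as pd

# Hypothetical input: each frame has a "key" column and a "value" column.
df_list = [
    pd.DataFrame({"key": ["A", "B"], "value": [1.0, 2.0]}),
    pd.DataFrame({"key": ["B", "C"], "value": [3.0, 4.0]}),
]

# keys=... adds an outer column level (0, 1, ...) identifying the source frame.
s = pd.concat([x.set_index("key") for x in df_list], axis=1, keys=range(len(df_list)))

# Each column is a tuple like (0, "value"); flatten it to "value_0".
s.columns = s.columns.map("{0[1]}_{0[0]}".format)
s = s.reset_index()
print(s.columns.tolist())  # ['key', 'value_0', 'value_1']
```

Keys missing from a given frame come back as NaN in that frame's column, as in the output above.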

