How to Combine Two Dataframes

How do I combine two dataframes?

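DEPRECATED: DataFrame.append and Series.append were deprecated in v1.4.0 and removed in v2.0. On current pandas, use pd.concat instead:

df_merged = pd.concat([df1, df2], ignore_index=True)

On pandas versions older than 2.0, append still works:

df_merged = df1.append(df2, ignore_index=True)

To keep the original indexes, set ignore_index=False.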
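
As a minimal sketch of what ignore_index changes, the snippet below runs pd.concat (which accepts the same flag and works on current pandas) on two tiny frames. The frames and the column name x are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frames, not taken from the original question
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

# ignore_index=True renumbers the combined rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

# ignore_index=False (the default) keeps each frame's own index labels
kept = pd.concat([df1, df2], ignore_index=False)

print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(kept.index.tolist())        # [0, 1, 0, 1]
```

Note that keeping the indexes can leave duplicate labels, as in the second result.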

How to merge multiple dataframes

Below is a clean, readable way to merge multiple dataframes when no complex queries are involved.

Merge on the DATE column and use an OUTER join, so that every date present in any of the dataframes is kept.

import pandas as pd
from functools import reduce

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df3 = pd.read_csv('file3.csv')

Next, put all the dataframes you loaded into a list, then merge them by applying pd.merge across the list with reduce.

# compile the list of dataframes you want to merge
data_frames = [df1, df2, df3]

Note: you can add as many dataframes to this list as you like. That is the advantage of this method: no complex queries are involved.

To keep the values that belong to the same date on the same row, merge on DATE:

df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['DATE'],
how='outer'), data_frames)
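
To see the reduce and merge step run without any files on disk, here is a small sketch that uses in-memory frames in place of file1.csv, file2.csv and file3.csv. The dates and the VALUE1 to VALUE3 contents are invented for illustration only.

```python
from functools import reduce

import pandas as pd

# Hypothetical in-memory stand-ins for the three CSV files
df1 = pd.DataFrame({"DATE": ["2020-01-01", "2020-01-02"], "VALUE1": [1, 2]})
df2 = pd.DataFrame({"DATE": ["2020-01-02", "2020-01-03"], "VALUE2": [10, 20]})
df3 = pd.DataFrame({"DATE": ["2020-01-01", "2020-01-03"], "VALUE3": [100, 200]})

data_frames = [df1, df2, df3]

# reduce merges df1 with df2, then merges that result with df3
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on="DATE", how="outer"),
    data_frames,
)

# Sort explicitly rather than rely on the row order the merge returns
df_merged = df_merged.sort_values("DATE").reset_index(drop=True)
print(df_merged)
```

Each date appears once, and any frame that had no row for that date contributes NaN in its column.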

# If you want to fill the values that are missing from rows of the merged dataframe, chain fillna() with the string you need:

df_merged = reduce(lambda left,right: pd.merge(left,right,on=['DATE'],
how='outer'), data_frames).fillna('void')
  • The output now has the values for the same date on the same row.
  • You can fill the data missing from different frames in different columns using fillna().
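
Filling every gap with a single string such as 'void' puts text into numeric columns. If that is not wanted, fillna() also accepts a dict that maps column names to fill values. The sketch below assumes a small made-up frame standing in for df_merged; the column names VALUE1 and LABEL are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical merged result with gaps, standing in for df_merged
df_merged = pd.DataFrame({
    "DATE": ["2020-01-01", "2020-01-02"],
    "VALUE1": [1.0, np.nan],
    "LABEL": [None, "b"],
})

# A dict maps column name -> fill value; columns not listed are left alone
filled = df_merged.fillna({"VALUE1": 0.0, "LABEL": "void"})
print(filled)
```

This keeps VALUE1 numeric while still marking missing labels with a placeholder string.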

Then write the merged data to the csv file if desired.

df_merged.to_csv('merged.txt', sep=',', na_rep='.', index=False)

This should give you

DATE VALUE1 VALUE2 VALUE3 ....

Merge two dataframes by index

Use merge, which is an inner join by default:

pd.merge(df1, df2, left_index=True, right_index=True)

Or join, which is a left join by default:

df1.join(df2)

Or concat, which is an outer join by default:

pd.concat([df1, df2], axis=1)

Samples:

df1 = pd.DataFrame({'a': range(6),
                    'b': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))

print(df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c': range(4),
                    'd': [10, 20, 30, 40]}, index=list('abhi'))

print(df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40


# Default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print(df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

# Default left join
df4 = df1.join(df2)
print(df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

# Default outer join
df5 = pd.concat([df1, df2], axis=1)
print(df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0
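
The join types above are only defaults. As a short sketch using the same sample frames, each function can be switched to a different join: merge and join take a how argument, and concat takes a join argument. The result variable names below are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(6),
                    'b': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))
df2 = pd.DataFrame({'c': range(4),
                    'd': [10, 20, 30, 40]}, index=list('abhi'))

# merge: switch from the default inner join to an outer join
outer_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')

# join: switch from the default left join to an inner join
inner_join = df1.join(df2, how='inner')

# concat: switch from the default outer join to an inner join
inner_concat = pd.concat([df1, df2], axis=1, join='inner')

print(sorted(outer_merge.index))    # all eight labels from both frames
print(inner_join.index.tolist())    # ['a', 'b']
print(inner_concat.index.tolist())  # ['a', 'b']
```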

How can I combine two dataframes based on a column of lists in Pandas

You can explode "B" into separate rows, then merge on "B" and drop duplicates.

Big thanks to Asish M. in the comments for pointing out a potential bug with the ordering.

(df.explode('B')
   .merge(df2, on='B', how='left')
   .dropna(subset=['C'])
   .drop_duplicates('A'))

    A  B   C
0  xy  1  pq
2  yx  4  pr
5  zy  3  rs
7  zz  6  sp

Ideally, the following should have worked:

df.explode('B').merge(df2).drop_duplicates('A')

However, pandas (as of writing, version 1.2dev) does not preserve the ordering of the left keys on a merge. This is a bug; see GH18776.

In the meantime, we can use the workaround of a left merge as shown above.
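
Since the input frames are not shown here, the following is a self-contained sketch of the left-merge workaround on made-up data. The contents of df and df2 are assumptions chosen so that each row of A has at least one match, and the row labels in the result will not necessarily match the output above. The astype call is an added precaution, not part of the original answer: explode leaves B as object dtype, so it is cast to match the integer key in df2.

```python
import pandas as pd

# Hypothetical inputs: column B of df holds lists, df2 maps single B values to C
df = pd.DataFrame({
    "A": ["xy", "yx", "zy", "zz"],
    "B": [[1, 3, 2], [4, 3, 5], [3], [2, 6]],
})
df2 = pd.DataFrame({"B": [1, 3, 4, 6], "C": ["pq", "rs", "pr", "sp"]})

result = (
    df.explode("B")                      # one row per list element
      .astype({"B": "int64"})            # align key dtype with df2["B"]
      .merge(df2, on="B", how="left")    # left merge keeps the left row order
      .dropna(subset=["C"])              # discard elements with no match in df2
      .drop_duplicates("A")              # keep the first match for each A
)
print(result)
```

With this data, each value of A ends up paired with the first element of its list that has a match in df2.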


