Python: Pandas Merge Multiple Dataframes

How to merge multiple dataframes

Below is the cleanest, most comprehensible way to merge multiple dataframes when complex queries aren't involved.

Simply merge on the DATE column using the outer method (to keep all the data).

import pandas as pd
from functools import reduce

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df3 = pd.read_csv('file3.csv')

Now load all the files you have as dataframes into a list, then merge them using pd.merge together with reduce.

# compile the list of dataframes you want to merge
data_frames = [df1, df2, df3]

Note: you can add as many dataframes as you like to the list above. That is the good part about this method: no complex queries are involved.

To keep the values that belong to the same date on one row, merge on DATE:

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames)

# if you want to fill the values that are missing in the merged dataframe, fill them with the required string, e.g.

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames).fillna('void')

  • Now the output will have the values from the same date on the same line.
  • You can fill the data that is missing from some of the frames using fillna().

Then write the merged data to a file if desired.

df_merged.to_csv('merged.txt', sep=',', na_rep='.', index=False)

This should give you

DATE VALUE1 VALUE2 VALUE3 ....
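To see the whole recipe run end to end without any files on disk, here is a minimal sketch. The three small frames and their values are made up for illustration and stand in for file1.csv, file2.csv and file3.csv.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data standing in for file1.csv, file2.csv, file3.csv
df1 = pd.DataFrame({"DATE": ["2020-01-01", "2020-01-02"], "VALUE1": [1.0, 2.0]})
df2 = pd.DataFrame({"DATE": ["2020-01-02", "2020-01-03"], "VALUE2": [20.0, 30.0]})
df3 = pd.DataFrame({"DATE": ["2020-01-01", "2020-01-03"], "VALUE3": [100.0, 300.0]})

data_frames = [df1, df2, df3]

# Outer merge keeps every DATE that appears in any of the frames
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on="DATE", how="outer"),
    data_frames,
)

# Sort explicitly so the row order does not depend on the pandas version
df_merged = df_merged.sort_values("DATE").reset_index(drop=True)
print(df_merged)
```

Each date ends up on one row, and a VALUE column is NaN wherever the corresponding frame had no entry for that date.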

pandas three-way joining multiple dataframes on columns

Zero's answer is basically a reduce operation. If I had more than a handful of dataframes, I'd put them in a list like this (generated via list comprehensions or loops or whatnot):

dfs = [df0, df1, df2, ..., dfN]

Assuming they have a common column, like name in your example, I'd do the following:

import functools as ft
df_final = ft.reduce(lambda left, right: pd.merge(left, right, on='name'), dfs)

That way, your code should work with whatever number of dataframes you want to merge.
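One thing worth knowing here: pd.merge defaults to how='inner', so a name that is missing from any one frame drops out of the result. The sketch below wraps the reduce in a small helper to compare inner and outer; merge_all and the sample frames are hypothetical names used only for illustration.

```python
import functools as ft

import pandas as pd

# Hypothetical frames sharing a 'name' column
dfs = [
    pd.DataFrame({"name": ["a", "b", "c"], "x": [1, 2, 3]}),
    pd.DataFrame({"name": ["a", "b"], "y": [10, 20]}),
    pd.DataFrame({"name": ["b", "c"], "z": [200, 300]}),
]


def merge_all(frames, on, how="inner"):
    """Fold a list of frames into one by merging pairwise, left to right."""
    return ft.reduce(
        lambda left, right: pd.merge(left, right, on=on, how=how), frames
    )


inner = merge_all(dfs, on="name")               # only names present in every frame
outer = merge_all(dfs, on="name", how="outer")  # every name seen in any frame

print(inner)
print(outer)
```

With this data only 'b' appears in all three frames, so the inner result has a single row while the outer result keeps 'a', 'b' and 'c'.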

Merge multiple DataFrames Pandas

Consider setting the index on each dataframe and then running the horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

print(pd.concat(dfs, axis=1).reset_index())
#      profile  depth       VAR1     VAR2    VAR3
# 0  profile_1    0.5  38.198002      NaN     NaN
# 1  profile_1    0.6  38.198002  0.20440     NaN
# 2  profile_1    1.1        NaN  0.20442     NaN
# 3  profile_1    1.2        NaN  0.20446  15.188
# 4  profile_1    1.3  38.200001      NaN  15.182
# 5  profile_1    1.4        NaN      NaN  15.182
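A self-contained version of the concat approach might look like the sketch below. The two frames are invented sample data. The uniqueness check is an extra precaution I'd add rather than part of the original answer, since duplicate (profile, depth) pairs can make the column-wise alignment fail.

```python
import pandas as pd

# Hypothetical frames keyed by (profile, depth), each with one value column
df1 = pd.DataFrame({"profile": ["p1", "p1"], "depth": [0.5, 0.6], "VAR1": [1.0, 2.0]})
df2 = pd.DataFrame({"profile": ["p1", "p1"], "depth": [0.6, 1.1], "VAR2": [0.2, 0.3]})

keys = ["profile", "depth"]
dfs = [df.set_index(keys) for df in [df1, df2]]

# Duplicate (profile, depth) pairs can break the alignment, so check first
for df in dfs:
    if not df.index.is_unique:
        raise ValueError("duplicate (profile, depth) keys; de-duplicate before concat")

# axis=1 aligns rows on the shared index; missing combinations become NaN
wide = pd.concat(dfs, axis=1).sort_index().reset_index()
print(wide)
```

The result has one row per (profile, depth) pair found in either frame, which is the same shape an outer merge on those two columns would give.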

pd.merge multiple dataframes with same column name on one specific column

You can use functools:

import functools
dfs = [df1[['user', 'books']], df2[['user', 'animal']], df3[['user', 'place']]]

df_final = functools.reduce(lambda left, right: pd.merge(left,right,on='user'), dfs)
print(df_final)

Print out:

     user  books  animal  place
0    alex      2       0      3
1  andrew     10       0      4
2  kelvin     15       2      5
3    mary      5       1      3

This has the advantage that you can easily extend it to many more dataframes if you want to.
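The reason for selecting columns before merging is easier to see side by side. In this sketch (the frames and the extra 'id' column are made up), merging the full frames makes pandas add its default _x/_y suffixes to the overlapping column, while merging the trimmed frames does not.

```python
import pandas as pd

# Hypothetical frames that both carry an 'id' column besides the key 'user'
df1 = pd.DataFrame({"user": ["alex", "mary"], "id": [1, 2], "books": [2, 5]})
df2 = pd.DataFrame({"user": ["alex", "mary"], "id": [7, 8], "animal": [0, 1]})

# Merging the full frames: the overlapping 'id' column gets suffixed
full = pd.merge(df1, df2, on="user")
print(list(full.columns))  # ['user', 'id_x', 'books', 'id_y', 'animal']

# Selecting only the needed columns first avoids the suffixed duplicates
slim = pd.merge(df1[["user", "books"]], df2[["user", "animal"]], on="user")
print(list(slim.columns))  # ['user', 'books', 'animal']
```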

Pandas merge multiple dataframes on one temporal index, with latest value from all others

You're in luck: pandas.merge_asof does exactly what you need!

We use the default direction='backward' argument:

A “backward” search selects the last row in the right DataFrame whose
‘on’ key is less than or equal to the left’s key.

Using your three example DataFrames:

import pandas as pd
from functools import reduce

# Convert all indexes to datetime
for df in [df1, df2, df3]:
    df.index = pd.to_datetime(df.index)

# Perform as-of merges
res = reduce(lambda left, right:
             pd.merge_asof(left, right, left_index=True, right_index=True),
             [df1, df2, df3])

print(res)

target feature2 feature3
key keys keys
2022-04-15 20:20:20 a NaN c3
2022-04-15 20:20:21 b d2 d3
2022-04-15 20:20:22 c e2 d3
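For anyone who wants to try this without the question's data, here is a runnable sketch with made-up frames. The sort_index call is my addition: merge_asof expects sorted keys, so sorting up front is a cheap safeguard.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames indexed by timestamp strings
df1 = pd.DataFrame(
    {"target": ["a", "b", "c"]},
    index=["2022-04-15 20:20:20", "2022-04-15 20:20:21", "2022-04-15 20:20:22"],
)
df2 = pd.DataFrame(
    {"feature2": ["d2", "e2"]},
    index=["2022-04-15 20:20:21", "2022-04-15 20:20:22"],
)
df3 = pd.DataFrame(
    {"feature3": ["c3", "d3"]},
    index=["2022-04-15 20:20:19", "2022-04-15 20:20:21"],
)

frames = [df1, df2, df3]
for df in frames:
    df.index = pd.to_datetime(df.index)
    df.sort_index(inplace=True)  # merge_asof requires sorted keys

# For each row of the left frame, take the latest right row at or before it
res = reduce(
    lambda left, right: pd.merge_asof(
        left, right, left_index=True, right_index=True
    ),
    frames,
)
print(res)
```

The first row gets NaN for feature2 because df2 has no entry at or before 20:20:20, and the last row carries 'd3' forward because df3 has nothing newer than 20:20:21.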

Are there any methods to merge multiple dataframes of different templates

One option is to concatenate the dataframes and then drop the unneeded columns from the combined dataframe. Note that drop returns a new dataframe, so assign the result:

df_total = pd.concat([df1, df2, df3, df4], axis=0)
df_total = df_total.drop(['Value2', 'Value3'], axis=1)
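As a quick illustration of what happens when the frames have different columns, here is a sketch with invented data. The column names Name and Value1 are assumptions, and ignore_index=True is an optional extra that gives the stacked rows a fresh 0..n-1 index.

```python
import pandas as pd

# Hypothetical frames with partly different columns ("templates")
df1 = pd.DataFrame({"Name": ["a"], "Value1": [1]})
df2 = pd.DataFrame({"Name": ["b"], "Value1": [2], "Value2": [20]})
df3 = pd.DataFrame({"Name": ["c"], "Value3": [300]})

# Rows are stacked; a column missing from a frame is filled with NaN
df_total = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

# drop returns a new frame, so the result has to be assigned
df_total = df_total.drop(columns=["Value2", "Value3"])
print(df_total)
```

Note that Value1 becomes a float column here, because the row coming from df3 has no Value1 and is filled with NaN.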

Quickest way to merge two very large pandas dataframes using python

IIUC, you can use:

out = (df1.rename(columns={'Value': 'Demand'})
          .assign(Time=df2['Value'], Demand_Time=df2['Value'] * df1['Value'])
          .reset_index(drop=True))
print(out)

# Output
   Origin  Destination    Demand        Time  Demand_Time
0      70          478  0.002779  135.974365     0.377873
1      70          479  0.001673  130.936752     0.219057
2      70          480  0.000427  111.191734     0.047479
3      70          481  0.001503   98.170746     0.147551
4      70          482  0.012150   88.257645     1.072330
5      70          483  0.004507  102.095566     0.460145
6      70          484  0.001871  103.585373     0.193808
7      70          485  0.006522  114.298431     0.745454
8      70          486  0.004786   97.331055     0.465826
9      70          487  0.026566   85.754776     2.278161
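The assign approach above pairs rows by index, so it relies on df1 and df2 listing the same Origin/Destination pairs under the same index labels. If that isn't guaranteed, a key-based merge is a possible alternative. This is a sketch with made-up values, not a benchmark, and it may well be slower on very large frames.

```python
import pandas as pd

# Hypothetical data: the same pairs, deliberately in a different row order
df1 = pd.DataFrame({"Origin": [70, 70], "Destination": [478, 479], "Value": [0.5, 0.25]})
df2 = pd.DataFrame({"Origin": [70, 70], "Destination": [479, 478], "Value": [100.0, 200.0]})

keys = ["Origin", "Destination"]

# Match rows on the key columns instead of relying on positional alignment
out = (
    df1.rename(columns={"Value": "Demand"})
    .merge(df2.rename(columns={"Value": "Time"}), on=keys, how="inner")
    .assign(Demand_Time=lambda d: d["Demand"] * d["Time"])
)
print(out)
```

Here the pair (70, 478) is matched with Time 200.0 even though it sits in the second row of df2.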

