Concatenate a List of Pandas Dataframes Together

Concatenate a list of pandas dataframes together

Given that all the dataframes have the same columns, you can simply concat them:

import pandas as pd
df = pd.concat(list_of_dataframes)

Concatenate list of dataframes

You do not need a for loop or list comprehension for this task. Simply do:

pd.concat(df)

where df is the list of dataframes.

Here is an example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randint(0,100,size=(1,5)), columns=list('ABCDE'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(7,5)), columns=list('ABCDE'))
df3 = pd.DataFrame(np.random.randint(0,100,size=(5,5)), columns=list('ABCDE'))
df = [df1, df2, df3]

concatenated = pd.concat(df)

Yields (for example):

    A   B   C   D   E
0  10  48  49  84  86
0  29   5  44  20  80
1  80   7   5   9  81
2  35  32  15  42  33
3  59  79  74  80  66
4  48  91  44  33  73
5  52  98  94  44  86
6  70  16  73  25  71
0  52  20  75  34  90
1  92  88  26  35  26
2  54   3  49  70  46
3  24  12  71  69  57
4   3  71  93  58  74

And you can use .reset_index(drop=True) to reset the index if you desire.
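As a small sketch of the index reset mentioned above (the two frames here are made-up illustration data, not from the original answer), the repeated per-frame index can be dropped after concatenating, or avoided entirely by passing ignore_index=True to pd.concat:

```python
import pandas as pd

# Illustrative frames (hypothetical data) with the same columns
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6, 7], "B": [8, 9, 10]})

# Option 1: concatenate, then discard the repeated per-frame index
reset = pd.concat([df1, df2]).reset_index(drop=True)

# Option 2: let concat build a fresh 0..n-1 index directly
ignored = pd.concat([df1, df2], ignore_index=True)

print(ignored)
```

Both options should give the same frame; ignore_index=True simply skips the intermediate step.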

Concatenate List of Dataframes and Include Original Dataframe Names as Keys

Let's assign an indicator column to each DataFrame in the list. (Names can be zipped together with the list of DataFrames or created by something like enumerate):

With enumerate

pd.concat(d.assign(df_name=f'{i:02d}') for i, d in enumerate(list_of_rosters))

   0  1 df_name
0  4  7      00
1  7  1      00
2  9  5      00
0  8  1      01
1  1  8      01
2  2  6      01

Or with zip:

pd.concat(d.assign(df_name=name)
          for name, d in zip(['name1', 'name2'], list_of_rosters))

   0  1 df_name
0  4  7   name1
1  7  1   name1
2  9  5   name1
0  8  1   name2
1  1  8   name2
2  2  6   name2

Setup:

import numpy as np
import pandas as pd

np.random.seed(5)
list_of_rosters = [
    pd.DataFrame(np.random.randint(1, 10, (3, 2))),
    pd.DataFrame(np.random.randint(1, 10, (3, 2)))
]

list_of_rosters:

[   0  1
 0  4  7
 1  7  1
 2  9  5,
    0  1
 0  8  1
 1  1  8
 2  2  6]
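Another option, not part of the answer above, is the keys argument of pd.concat, which stores the names in an outer index level instead of a column. The sketch below reuses the setup and the placeholder labels 'name1' and 'name2'; the level names 'df_name' and 'row' are chosen here only for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(5)
list_of_rosters = [
    pd.DataFrame(np.random.randint(1, 10, (3, 2))),
    pd.DataFrame(np.random.randint(1, 10, (3, 2))),
]

# keys= labels each input frame in an outer index level;
# names= gives the two index levels readable names (illustrative choices)
combined = pd.concat(
    list_of_rosters,
    keys=["name1", "name2"],
    names=["df_name", "row"],
)

# Rows that came from the first frame can be selected by its key
first = combined.loc["name1"]

# The key level can be turned back into an ordinary column if preferred
flat = combined.reset_index(level="df_name")

print(combined)
```

Keeping the names in the index makes it easy to pull one original frame back out with .loc, while the assign approach keeps a flat index.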

How can I concat multiple dataframes in Python?

I think you can just put the dataframes into a list and then concat the list. Reading a file in chunks with pandas already works along these lines, and I personally use this pattern when reading data in chunks.

pdList = [df1, df2, ...]  # List of your dataframes
new_df = pd.concat(pdList)
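Assuming the chunked reading mentioned above refers to the chunksize parameter of pd.read_csv, a minimal sketch of that pattern might look like this. The CSV text is made-up data held in memory so the example is self-contained:

```python
import io

import pandas as pd

# In-memory CSV standing in for a large file on disk (illustrative data)
csv_text = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

# With chunksize set, read_csv returns an iterator of DataFrames
chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunks.append(chunk)  # any per-chunk filtering would go here

# Concatenate the collected pieces into one frame with a fresh index
new_df = pd.concat(chunks, ignore_index=True)

print(new_df)
```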

To create pdList automatically, assuming the names of your dataframe variables always start with "cluster_":

pdList = []
pdList.extend(value for name, value in locals().items() if name.startswith('cluster_'))
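If scanning locals() feels fragile, one possible alternative (an assumption on my part, not something the answer proposes) is to keep the dataframes in a dict from the start and filter on its keys. The dict name clusters and its contents are hypothetical:

```python
import pandas as pd

# Hypothetical frames stored by name in a dict instead of loose variables
clusters = {
    "cluster_a": pd.DataFrame({"x": [1, 2]}),
    "cluster_b": pd.DataFrame({"x": [3]}),
    "other": pd.DataFrame({"x": [99]}),  # skipped by the name filter below
}

# Same prefix filter as above, applied to dict keys
pdList = [frame for name, frame in clusters.items() if name.startswith("cluster_")]
new_df = pd.concat(pdList, ignore_index=True)

print(new_df)
```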

Merge a list of pandas dataframes

You can use the reduce function from functools, where dfList is your list of dataframes:

import pandas as pd
from functools import reduce
reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)

As a demo:

df = pd.DataFrame({'Date': [1,2,3,4], 'Value': [2,3,3,4]})
dfList = [df, df, df]
dfList

# [   Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4,
#     Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4,
#     Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4]

reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)
#    Date  Value_x  Value_y  Value
# 0     1        2        2      2
# 1     2        3        3      3
# 2     3        3        3      3
# 3     4        4        4      4
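The Value_x and Value_y names in the demo come from merge's default suffixes for overlapping column names. If predictable names are preferred, one approach (a sketch; the Value_0, Value_1, ... naming scheme is my own choice) is to rename the value column in each frame before reducing:

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'Date': [1, 2, 3, 4], 'Value': [2, 3, 3, 4]})
dfList = [df, df, df]

# Give each frame's Value column a distinct name: Value_0, Value_1, ...
renamed = [
    d.rename(columns={'Value': f'Value_{i}'})
    for i, d in enumerate(dfList)
]

# No overlapping non-key columns remain, so merge adds no suffixes
merged = reduce(lambda x, y: pd.merge(x, y, on='Date'), renamed)

print(merged)
```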

How to concatenate two lists into pandas DataFrame?

Use a list comprehension with zip to build a list of tuples, and pass it to the DataFrame constructor:

a = ['A', 'B', 'C', 'D', 'E']
b = [(-0.07154222477384509, 0.03681057318023705),
     (-0.23678194754416643, 3.408617573881597e-12),
     (-0.24277881018771763, 6.991906304566735e-13),
     (-0.16858465905189185, 7.569580517034595e-07),
     (-0.21850787663602167, 1.1718560531238815e-10)]

df = pd.DataFrame([(a, *b) for a, b in zip(a, b)])
print(df)
   0         1             2
0  A -0.071542  3.681057e-02
1  B -0.236782  3.408618e-12
2  C -0.242779  6.991906e-13
3  D -0.168585  7.569581e-07
4  E -0.218508  1.171856e-10

With column names set:

df = pd.DataFrame([(a, *b) for a, b in zip(a, b)],
                  columns=['var_name', 'val1', 'val2'])
print(df)
  var_name      val1          val2
0        A -0.071542  3.681057e-02
1        B -0.236782  3.408618e-12
2        C -0.242779  6.991906e-13
3        D -0.168585  7.569581e-07
4        E -0.218508  1.171856e-10
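An alternative that avoids the comprehension, offered only as a sketch and not as the answer's method, is to build the frame from the list of tuples first and then insert the names as the leading column. The values are shortened here for readability:

```python
import pandas as pd

a = ['A', 'B', 'C']
b = [(-0.0715, 0.0368),
     (-0.2368, 3.41e-12),
     (-0.2428, 6.99e-13)]  # shortened illustrative values

# Each tuple in b becomes one row with two value columns
df = pd.DataFrame(b, columns=['val1', 'val2'])

# Insert the names as the first column (position 0); this modifies df in place
df.insert(0, 'var_name', a)

print(df)
```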

Merge a list of dataframes to create one dataframe

I think you need concat, but first set the index of each DataFrame to the common column:

dfs = [df.set_index('id') for df in dfList]
print(pd.concat(dfs, axis=1))

If you need to join with merge instead:

from functools import reduce
df = reduce(lambda df1,df2: pd.merge(df1,df2,on='id'), dfList)
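The two approaches differ when the frames do not share all ids: pd.concat with axis=1 keeps the union of the index values by default, while pd.merge defaults to an inner join. A small sketch with made-up frames (the column names a and b are illustrative):

```python
from functools import reduce

import pandas as pd

# Hypothetical frames sharing an 'id' column but only partly overlapping
dfList = [
    pd.DataFrame({'id': [1, 2, 3], 'a': [10, 20, 30]}),
    pd.DataFrame({'id': [2, 3, 4], 'b': [0.2, 0.3, 0.4]}),
]

# concat on the index: default join='outer' keeps every id, filling gaps with NaN
dfs = [df.set_index('id') for df in dfList]
wide = pd.concat(dfs, axis=1)

# merge on the column: default how='inner' keeps only ids present in both frames
merged = reduce(lambda df1, df2: pd.merge(df1, df2, on='id'), dfList)

print(wide)
print(merged)
```

Passing join='inner' to pd.concat, or how='outer' to pd.merge, makes either approach behave like the other in this respect.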

