Concatenate a list of pandas dataframes together
Given that all the dataframes have the same columns, you can simply concat
them:
import pandas as pd
df = pd.concat(list_of_dataframes)
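If the columns do not match exactly, concat still works but behaves differently depending on the join argument. The sketch below uses two made-up frames (left and right are illustrative names, not from the question) to show the default outer behaviour next to join="inner":

```python
import pandas as pd

# Two hypothetical frames that share only column "A"
left = pd.DataFrame({"A": [1], "B": [2]})
right = pd.DataFrame({"A": [3], "C": [4]})

# Default join="outer": union of the columns, gaps filled with NaN
outer = pd.concat([left, right], ignore_index=True)

# join="inner": keep only the columns present in every frame
inner = pd.concat([left, right], join="inner", ignore_index=True)

print(outer)
print(inner)
```

With matching columns, as in the question, both calls give the same result.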
Concatenate list of dataframes
You do not need a for loop or list comprehension for this task. Simply do:
pd.concat(df)
where df is the list of dataframes.
Here is an example:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randint(0,100,size=(1,5)), columns=list('ABCDE'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(7,5)), columns=list('ABCDE'))
df3 = pd.DataFrame(np.random.randint(0,100,size=(5,5)), columns=list('ABCDE'))
df = [df1, df2, df3]
concatenated = pd.concat(df)
Yields (for example):
A B C D E
0 10 48 49 84 86
0 29 5 44 20 80
1 80 7 5 9 81
2 35 32 15 42 33
3 59 79 74 80 66
4 48 91 44 33 73
5 52 98 94 44 86
6 70 16 73 25 71
0 52 20 75 34 90
1 92 88 26 35 26
2 54 3 49 70 46
3 24 12 71 69 57
4 3 71 93 58 74
You can use .reset_index(drop=True) to reset the index if you desire.
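As a small sketch of the index reset (df_a and df_b are made-up frames for illustration), the two options below should give the same result: renumbering after the fact, or passing ignore_index=True to concat directly.

```python
import pandas as pd

# Two small hypothetical frames with the same columns
df_a = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_b = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# Default: the original row labels are kept, so 0 and 1 appear twice
stacked = pd.concat([df_a, df_b])

# Option 1: renumber the rows after concatenating
renumbered = stacked.reset_index(drop=True)

# Option 2: renumber the rows while concatenating
renumbered_direct = pd.concat([df_a, df_b], ignore_index=True)

print(renumbered_direct)
```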
Concatenate List of Dataframes and Include Original Dataframe Names as Keys
Let's assign an indicator column to each DataFrame in the list. (Names can be zipped together with the list of DataFrames or generated with something like enumerate.)
With enumerate:
pd.concat(d.assign(df_name=f'{i:02d}') for i, d in enumerate(list_of_rosters))
0 1 df_name
0 4 7 00
1 7 1 00
2 9 5 00
0 8 1 01
1 1 8 01
2 2 6 01
Or with zip:
pd.concat(d.assign(df_name=name)
for name, d in zip(['name1', 'name2'], list_of_rosters))
0 1 df_name
0 4 7 name1
1 7 1 name1
2 9 5 name1
0 8 1 name2
1 1 8 name2
2 2 6 name2
Setup:
import numpy as np
import pandas as pd
np.random.seed(5)
list_of_rosters = [
pd.DataFrame(np.random.randint(1, 10, (3, 2))),
pd.DataFrame(np.random.randint(1, 10, (3, 2)))
]
list_of_rosters:
[ 0 1
0 4 7
1 7 1
2 9 5,
0 1
0 8 1
1 1 8
2 2 6]
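Another option, if you would rather have the names in the index than in a column, is the keys argument of pd.concat. This is only a sketch: the rosters data and the level names "df_name" and "row" are chosen for illustration.

```python
import pandas as pd

# Hypothetical rosters and the names to attach to them
rosters = [
    pd.DataFrame({"x": [4, 7, 9], "y": [7, 1, 5]}),
    pd.DataFrame({"x": [8, 1, 2], "y": [1, 8, 6]}),
]
names = ["name1", "name2"]

# keys= adds an outer index level holding the name of each source frame
combined = pd.concat(rosters, keys=names, names=["df_name", "row"])

# Select all rows that came from one frame
first = combined.loc["name1"]

# Turn the outer level back into a regular column
flat = combined.reset_index(level="df_name")

print(combined)
```

The MultiIndex form makes it easy to pull one source frame back out with .loc, while the assign approach keeps a flat index.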
How can I concat multiple dataframes in Python?
I think you can just put them into a list and then concat the list. Reading data in chunks with pandas (for example, with the chunksize argument of read_csv) already hands you the pieces this way, and I personally use this approach when reading in chunks.
pdList = [df1, df2, ...] # List of your dataframes
new_df = pd.concat(pdList)
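Assuming the chunked reading mentioned above means the chunksize argument of pd.read_csv, a minimal sketch looks like this. The CSV text is made up and held in memory so the example runs on its own:

```python
import io
import pandas as pd

# In-memory CSV standing in for a large file (hypothetical data)
csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# chunksize=2 makes read_csv return an iterator of DataFrames
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Collect the pieces in a list, then concatenate once at the end
pieces = [chunk for chunk in chunks]
full = pd.concat(pieces)

print(full)
```

Concatenating once at the end is generally cheaper than growing a DataFrame inside the loop.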
To create pdList automatically, assuming the names of your DataFrame variables always start with "cluster_":
pdList = []
pdList.extend(value for name, value in locals().items() if name.startswith('cluster_'))
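If you control how the DataFrames are created, a possibly more robust variant is to keep them in a dict rather than scanning locals(). This is a sketch of that alternative, not the approach from the answer above; the frames dict and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical frames stored under explicit names instead of loose variables
frames = {
    "cluster_1": pd.DataFrame({"v": [1, 2]}),
    "cluster_2": pd.DataFrame({"v": [3]}),
    "summary": pd.DataFrame({"v": [99]}),
}

# Pick the frames whose key starts with the prefix, in insertion order
pdList = [df for name, df in frames.items() if name.startswith("cluster_")]
new_df = pd.concat(pdList, ignore_index=True)

print(new_df)
```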
Merge a list of pandas dataframes
You can use the reduce function, where dfList is your list of data frames:
import pandas as pd
from functools import reduce
reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)
As a demo:
df = pd.DataFrame({'Date': [1,2,3,4], 'Value': [2,3,3,4]})
dfList = [df, df, df]
dfList
# [ Date Value
# 0 1 2
# 1 2 3
# 2 3 3
# 3 4 4, Date Value
# 0 1 2
# 1 2 3
# 2 3 3
# 3 4 4, Date Value
# 0 1 2
# 1 2 3
# 2 3 3
# 3 4 4]
reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)
# Date Value_x Value_y Value
# 0 1 2 2 2
# 1 2 3 3 3
# 2 3 3 3 3
# 3 4 4 4 4
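The automatic _x and _y suffixes in the demo get hard to read, and with more frames they can collide. One way around that, sketched here under the assumption that every frame has a single Value column, is to rename that column per frame before merging (the Value_0, Value_1, ... names are made up):

```python
from functools import reduce
import pandas as pd

base = pd.DataFrame({"Date": [1, 2, 3, 4], "Value": [2, 3, 3, 4]})
dfList = [base, base, base, base]

# Give each frame's Value column a unique name before merging
renamed = [
    d.rename(columns={"Value": f"Value_{i}"}) for i, d in enumerate(dfList)
]

# Merge the frames pairwise on the shared Date column
merged = reduce(lambda x, y: pd.merge(x, y, on="Date"), renamed)

print(merged)
```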
How to concatenate two lists into pandas DataFrame?
Use a list comprehension with zip to build a list of tuples and pass it to the DataFrame constructor:
a = ['A',
'B',
'C',
'D',
'E']
b = [(-0.07154222477384509, 0.03681057318023705),
(-0.23678194754416643, 3.408617573881597e-12),
(-0.24277881018771763, 6.991906304566735e-13),
(-0.16858465905189185, 7.569580517034595e-07),
(-0.21850787663602167, 1.1718560531238815e-10)]
df = pd.DataFrame([(name, *vals) for name, vals in zip(a, b)])
print(df)
0 1 2
0 A -0.071542 3.681057e-02
1 B -0.236782 3.408618e-12
2 C -0.242779 6.991906e-13
3 D -0.168585 7.569581e-07
4 E -0.218508 1.171856e-10
With column names set:
df = pd.DataFrame([(name, *vals) for name, vals in zip(a, b)],
                  columns=['var_name', 'val1', 'val2'])
print(df)
var_name val1 val2
0 A -0.071542 3.681057e-02
1 B -0.236782 3.408618e-12
2 C -0.242779 6.991906e-13
3 D -0.168585 7.569581e-07
4 E -0.218508 1.171856e-10
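An alternative that stays closer to the "concatenate" wording of the question is to build a Series and a DataFrame from the two lists and join them side by side with pd.concat(axis=1). This is a sketch with shortened, made-up values rather than the data above:

```python
import pandas as pd

# Shortened hypothetical inputs in the same shape as the lists above
a = ["A", "B", "C"]
b = [(-0.5, 0.1), (-0.25, 0.2), (-0.125, 0.3)]

names = pd.Series(a, name="var_name")
values = pd.DataFrame(b, columns=["val1", "val2"])

# axis=1 places the pieces next to each other, aligned on the row index
df = pd.concat([names, values], axis=1)

print(df)
```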
Merge a list of dataframes to create one dataframe
I think you need concat, but first set the index of each DataFrame to the common column:
dfs = [df.set_index('id') for df in dfList]
print(pd.concat(dfs, axis=1))
If you need to join with merge:
from functools import reduce
df = reduce(lambda df1,df2: pd.merge(df1,df2,on='id'), dfList)
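The two approaches are not interchangeable when the ids differ between frames. A small sketch with made-up data (the columns a and b are hypothetical) shows the difference: concat on the index keeps every id by default, while merge keeps only ids found in all frames.

```python
from functools import reduce
import pandas as pd

# Hypothetical frames whose id values only partly overlap
dfList = [
    pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]}),
    pd.DataFrame({"id": [2, 3, 4], "b": [0.2, 0.3, 0.4]}),
]

# concat on the index keeps every id (outer join by default)
wide = pd.concat([d.set_index("id") for d in dfList], axis=1)

# merge keeps only ids present in every frame (inner join by default)
inner = reduce(lambda left, right: pd.merge(left, right, on="id"), dfList)

print(wide)
print(inner)
```

Passing join="inner" to concat, or how="outer" to merge, should bring the two results in line with each other.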