Append Multiple Pandas Data Frames at Once

Append multiple pandas data frames at once

Have you tried simply passing a list as the argument to append? Or am I missing something?

import numpy as np
import pandas as pd

dates = np.asarray(pd.date_range('1/1/2000', periods=8))
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df2 = df1.copy()
df3 = df1.copy()
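# Note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat([df1, df2, df3]).
df = df1.append([df2, df3])

print(df)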
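
For readers on pandas 2.0 or later, where DataFrame.append is no longer available, here is a minimal sketch of the same idea using pd.concat. It reuses the frames from the snippet above; the shape printed at the end is only a sanity check.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])
df2 = df1.copy()
df3 = df1.copy()

# One call stacks all three frames row-wise, as df1.append([df2, df3]) did.
df = pd.concat([df1, df2, df3])

print(df.shape)  # (24, 4)
```

The index values repeat three times here because each copy keeps its original dates; pass ignore_index=True if you would rather get a fresh 0..n-1 index.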

Appending multiple data frames into one - Pandas

Instead of adding each of the dataframes to a dict, iteratively union them together.

tickerdf = pd.DataFrame()
tickerlist = ['AAPL','GOOG', 'MU']
for x in tickerlist:
    tickerdf = pd.concat([tickerdf, financefetch(x)])

PS

This line:

df = pd.DataFrame(dataframe_entries).set_index('date','ticker')

should be:

df = pd.DataFrame(dataframe_entries).set_index(['date','ticker'])
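
To illustrate the PS: set_index only treats both names as keys when they are passed together in a list; a second positional argument is not read as another key. A small sketch, assuming dataframe_entries is a list of dicts (the rows below are made up for illustration):

```python
import pandas as pd

# Hypothetical rows standing in for dataframe_entries.
dataframe_entries = [
    {"date": "2020-01-02", "ticker": "AAPL", "close": 1.0},
    {"date": "2020-01-02", "ticker": "GOOG", "close": 2.0},
    {"date": "2020-01-03", "ticker": "AAPL", "close": 3.0},
]

# Passing a list of column names builds a two-level (date, ticker) index.
df = pd.DataFrame(dataframe_entries).set_index(["date", "ticker"])

print(list(df.index.names))  # ['date', 'ticker']
print(df.loc[("2020-01-03", "AAPL"), "close"])  # 3.0
```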

How can I concat multiple dataframes in Python?

I think you can just put them into a list and then concat the list. Reading a file in chunks with pandas follows much the same pattern, and I personally do this whenever I read in chunks: collect the pieces in a list, then concatenate once.

pdList = [df1, df2, ...]  # List of your dataframes
new_df = pd.concat(pdList)
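
The chunked-reading pattern mentioned above might look like the sketch below. It assumes the "chunk function" refers to the chunksize option of pd.read_csv; the CSV text is made up and read from memory so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV text; in practice this would be a large file on disk.
csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# With chunksize set, read_csv yields DataFrames of up to 2 rows each.
chunks = [chunk for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]

# Concatenate the pieces once, after they have all been collected.
new_df = pd.concat(chunks)
print(len(chunks), new_df.shape)  # 3 (5, 2)
```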

To build pdList automatically, assuming the names of your DataFrame variables always start with "cluster_":

pdList = []
pdList.extend(value for name, value in locals().items() if name.startswith('cluster_'))

Append more than 2 data frames in pandas

See http://pandas.pydata.org/pandas-docs/stable/merging.html; the diagrams there are also very illustrative. The code from that page, copied here:

frames = [df1, df2, df3]
result = pd.concat(frames)

Fastest way to merge and append multiple CSVs / data frames using pandas

If every DataFrame holds the same value for a given index and column, you can use the approach below.

That means, for example, that for index=A and column=apple each DataFrame containing that cell has the same value (here 3).

dfs = [df1, df2, df3, df4]
# if Person is a column, not the index
dfs = [x.set_index('Person') for x in dfs]

df = pd.concat(dfs).groupby(level=0).first()
print(df)
        apple  ball  cat  dog
Person
A         3.0   4.0  6.0  NaN
B         5.0   1.0  2.0  NaN
C         6.0   NaN  2.0  1.0
D         2.0   NaN  2.0  1.0
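
Since df1 to df4 are not shown here, below is a self-contained sketch of the same concat plus groupby(level=0).first() technique on two made-up frames. The data is hypothetical; the only property that matters is that overlapping cells (A/apple) agree.

```python
import pandas as pd

# Hypothetical frames: where cells overlap (A/apple), the values agree.
df1 = pd.DataFrame({"Person": ["A", "B"], "apple": [3, 5], "ball": [4, 1]})
df2 = pd.DataFrame({"Person": ["A", "C"], "apple": [3, 6], "cat": [6, 2]})

dfs = [x.set_index("Person") for x in (df1, df2)]

# Stack the frames, then keep the first non-null value per Person and column.
df = pd.concat(dfs).groupby(level=0).first()
print(df)
```

Person A ends up as a single row holding apple, ball and cat, while B and C keep NaN in the columns their source frame did not have.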

appending multiple pandas DataFrames read in from files

Unlike list.append, DataFrame.append does not modify the object in place; it returns a new one. So topics.append(df) returns an object that you never store anywhere, and topics remains the empty DataFrame you declared on the 6th line. You can fix this with

topics = topics.append(df)

However, appending to a DataFrame within a loop is a very costly exercise, because every call copies all the rows accumulated so far. Instead you should append each DataFrame to a list within the loop and call pd.concat() on the list of DataFrames after the loop.

import os

import pandas as pd

topics_list = []
for filename in os.listdir('./topics'):
    # All of your code that builds df from filename
    topics_list.append(df)  # Lists are modified in place by append

# After the loop, one call to concat
topics = pd.concat(topics_list)
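
A runnable sketch of the same pattern, under some assumptions: the files are CSVs readable with pd.read_csv, and a temporary directory with two made-up files stands in for ./topics.

```python
import os
import tempfile

import pandas as pd

# Hypothetical setup: write two small CSV files into a temporary directory.
topics_dir = tempfile.mkdtemp()
pd.DataFrame({"topic": ["a", "b"]}).to_csv(os.path.join(topics_dir, "one.csv"), index=False)
pd.DataFrame({"topic": ["c"]}).to_csv(os.path.join(topics_dir, "two.csv"), index=False)

topics_list = []
for filename in sorted(os.listdir(topics_dir)):
    df = pd.read_csv(os.path.join(topics_dir, filename))
    topics_list.append(df)  # list.append modifies the list in place

# A single concat after the loop; ignore_index renumbers the rows 0..n-1.
topics = pd.concat(topics_list, ignore_index=True)
print(topics["topic"].tolist())  # ['a', 'b', 'c']
```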

How to merge multiple dataframes

Below is the cleanest, most comprehensible way to merge multiple dataframes if complex queries aren't involved.

Simply merge on the DATE column, using the outer method (to keep all the data).

import pandas as pd
from functools import reduce

df1 = pd.read_table('file1.csv', sep=',')
df2 = pd.read_table('file2.csv', sep=',')
df3 = pd.read_table('file3.csv', sep=',')

Now put all the data frames you have loaded into a list, then merge them using merge together with the reduce function.

# compile the list of dataframes you want to merge
data_frames = [df1, df2, df3]

Note: you can add as many data frames as you like to the list above. That is the good part about this method: no complex queries are involved.

To keep the values that belong to the same date you need to merge it on the DATE

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames)

If you want to fill the values that are missing from a row of the merged dataframe, simply fill them with the required string:

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames).fillna('void')

  • Now the output will have the values from the same date on the same line.
  • You can fill the data that is missing from different frames for different columns using fillna().

Then write the merged data to the csv file if desired.

df_merged.to_csv('merged.txt', sep=',', na_rep='.', index=False)

This should give you

DATE VALUE1 VALUE2 VALUE3 ....
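
To see the reduce-and-merge step without the CSV files, here is a small sketch on made-up frames. The column names follow the layout above; the dates and values are hypothetical.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames sharing a DATE column.
df1 = pd.DataFrame({"DATE": ["2020-01-01", "2020-01-02"], "VALUE1": [1, 2]})
df2 = pd.DataFrame({"DATE": ["2020-01-02", "2020-01-03"], "VALUE2": [3, 4]})
df3 = pd.DataFrame({"DATE": ["2020-01-01", "2020-01-03"], "VALUE3": [5, 6]})

data_frames = [df1, df2, df3]

# reduce merges the first two frames, then merges that result with the third.
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=["DATE"], how="outer"),
    data_frames,
)
print(df_merged)
```

Each date appears once in the result, and a frame that has no row for a date contributes NaN in its column.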

How to append multiple dataframes with same prefix in python

You were heading in the right direction; just use eval:

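tempdf = df1
for i in range(2, 4):
    # DataFrame.append was removed in pandas 2.0; there, use pd.concat([tempdf, eval("df" + str(i))])
    tempdf = tempdf.append(eval("df" + str(i)))
print(tempdf)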

Note: eval can run arbitrary code, so using it is considered bad practice. Please use another approach if possible.
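
One possible eval-free alternative, sketched under the assumption that you control how the frames are created: keep them in a dict keyed by name instead of in separate variables. The dict name frames and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical frames kept in a dict instead of variables df1, df2, df3.
frames = {
    "df1": pd.DataFrame({"x": [1]}),
    "df2": pd.DataFrame({"x": [2]}),
    "df3": pd.DataFrame({"x": [3]}),
}

# Select by name prefix and concatenate once; no eval needed.
selected = [frames[name] for name in sorted(frames) if name.startswith("df")]
tempdf = pd.concat(selected, ignore_index=True)
print(tempdf["x"].tolist())  # [1, 2, 3]
```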

Appending multiple dataframes in pandas with for loops

Consider building a list of data frames, then concatenating them once outside the loop. Specifically, the code below uses a list comprehension that also assigns the column names in each iteration, followed by a single pd.concat call.

import pandas as pd

url = 'https://www.treasury.gov/resource-center/data-chart-center/interest-rates/' + \
      'pages/TextView.aspx?data=yieldYear&year=({yr})'

DateList = ['Date', '1 mo', '2 mo', '3 mo', '6 mo', '1 yr', '2 yr',
            '3 yr', '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']

# set_axis returns a new frame (its inplace argument was removed in pandas 2.0)
dfs = [(pd.read_html(url.format(yr=x), skiprows=1)[1]
          .set_axis(DateList, axis='columns')) for x in range(2017, 2019)]

final_df = pd.concat(dfs, ignore_index=True)

print(final_df.head())
#        Date  1 mo  2 mo  3 mo  6 mo  ...  5 yr  7 yr  10 yr  20 yr  30 yr
# 0  01/03/17  0.52   NaN  0.53  0.65  ...  1.94  2.26   2.45   2.78   3.04
# 1  01/04/17  0.49   NaN  0.53  0.63  ...  1.94  2.26   2.46   2.78   3.05
# 2  01/05/17  0.51   NaN  0.52  0.62  ...  1.86  2.18   2.37   2.69   2.96
# 3  01/06/17  0.50   NaN  0.53  0.61  ...  1.92  2.23   2.42   2.73   3.00
# 4  01/09/17  0.50   NaN  0.50  0.60  ...  1.89  2.18   2.38   2.69   2.97

