Append multiple pandas data frames at once
Have you simply tried using a list as argument of append? Or am I missing anything?
import numpy as np
import pandas as pd
dates = np.asarray(pd.date_range('1/1/2000', periods=8))
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df2 = df1.copy()
df3 = df1.copy()
df = df1.append([df2, df3])
print(df)
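On recent pandas releases the same result comes from pd.concat, since DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch that reuses the frames from the answer above; the seeded generator is an assumption added only so the run is reproducible:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=8)
rng = np.random.default_rng(0)  # assumption: seed added only for reproducibility
df1 = pd.DataFrame(rng.standard_normal((8, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df2 = df1.copy()
df3 = df1.copy()

# One call stacks all three frames row-wise; the original index labels are kept
df = pd.concat([df1, df2, df3])
print(df.shape)
```

Passing the whole list in one call avoids building intermediate frames.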
Appending multiple data frames into one - Pandas
Instead of adding each of the dataframes to a dict, iteratively union them together.
tickerdf = pd.DataFrame()
tickerlist = ['AAPL','GOOG', 'MU']
for x in tickerlist:
    tickerdf = pd.concat([tickerdf, financefetch(x)])
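A possible variation, sketched under assumptions: fetch every frame first and concatenate once, using the keys argument of pd.concat to record which ticker each row came from. The financefetch below is a hypothetical stand-in that returns dummy rows, not the function from the question.

```python
import pandas as pd

def financefetch(ticker):
    # Hypothetical stand-in for the question's financefetch(); returns dummy rows
    return pd.DataFrame({'date': ['2020-01-01', '2020-01-02'], 'close': [1.0, 2.0]})

tickerlist = ['AAPL', 'GOOG', 'MU']

# Build all frames, then concatenate once; keys= labels each block with its ticker
frames = [financefetch(x) for x in tickerlist]
tickerdf = pd.concat(frames, keys=tickerlist, names=['ticker', 'row'])
print(tickerdf)
```

With the ticker as the outer index level, tickerdf.loc['GOOG'] returns just that ticker's rows.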
PS
This line:
df = pd.DataFrame(dataframe_entries).set_index('date','ticker')
should be:
df = pd.DataFrame(dataframe_entries).set_index(['date','ticker'])
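The list matters because set_index treats only its first argument as the index keys; a second string passed separately is not used as a key (in older pandas releases it is taken as the drop argument). A small sketch with made-up rows:

```python
import pandas as pd

dataframe_entries = [  # made-up rows, for illustration only
    {'date': '2020-01-01', 'ticker': 'AAPL', 'close': 1.0},
    {'date': '2020-01-01', 'ticker': 'GOOG', 'close': 2.0},
]

# Passing a list makes both columns levels of a MultiIndex
df = pd.DataFrame(dataframe_entries).set_index(['date', 'ticker'])
print(df.index.names)
```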
How can I concat multiple dataframes in Python?
I think you can just put the dataframes into a list and then concat the list. Reading a file in chunks in pandas already hands you the pieces one at a time, and I personally combine them this way when I read in chunks.
pdList = [df1, df2, ...] # List of your dataframes
new_df = pd.concat(pdList)
To create pdList automatically, assuming the names of your dataframes always start with "cluster_":
pdList = []
pdList.extend(value for name, value in locals().items() if name.startswith('cluster_'))
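If the chunked reading mentioned above means read_csv with the chunksize parameter (an assumption about what the answer refers to), the pattern might look like this. The CSV text is made up so the sketch runs without a file:

```python
import io
import pandas as pd

csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"  # made-up data

# chunksize=2 makes read_csv yield DataFrames of up to two rows each
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
pdList = list(chunks)

# Concatenate all chunks in a single call
new_df = pd.concat(pdList)
print(len(pdList), len(new_df))
```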
Append more than 2 data frames in pandas
See http://pandas.pydata.org/pandas-docs/stable/merging.html; the picture there is also very illustrative.
The code, copied here:
frames = [df1, df2, df3]
result = pd.concat(frames)
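One detail worth knowing when concatenating this way: the row labels of each frame are kept, so they can repeat. A short sketch with made-up frames, showing the ignore_index option of pd.concat as one way to renumber:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']})
df2 = pd.DataFrame({'A': ['A2', 'A3']})
df3 = pd.DataFrame({'A': ['A4', 'A5']})
frames = [df1, df2, df3]

result = pd.concat(frames)                         # index repeats: 0, 1, 0, 1, 0, 1
renumbered = pd.concat(frames, ignore_index=True)  # index runs 0 through 5
print(renumbered)
```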
Fastest way to merge and append multiple CSVs / data frames using pandas
If the values for the same index and column are the same in every DataFrame (e.g. for index=A, column=apple, each DataFrame that has the value holds the same one, here 3), it is possible to use:
dfs = [df1, df2, df3, df4]
#if Person is column, not index
dfs = [x.set_index('Person') for x in dfs]
df = pd.concat(dfs).groupby(level=0).first()
print (df)
        apple  ball  cat  dog
Person
A         3.0   4.0  6.0  NaN
B         5.0   1.0  2.0  NaN
C         6.0   NaN  2.0  1.0
D         2.0   NaN  2.0  1.0
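To see why this works, here is a small sketch with two made-up frames (not the data from the question). After concatenation a person can appear several times; groupby(level=0).first() then keeps the first non-null value per person and column.

```python
import pandas as pd

# Made-up frames for illustration; 'A' appears in both with the same apple value
df1 = pd.DataFrame({'Person': ['A', 'B'], 'apple': [3, 5]})
df2 = pd.DataFrame({'Person': ['A', 'C'], 'apple': [3, 6], 'ball': [4, 7]})

dfs = [x.set_index('Person') for x in (df1, df2)]
stacked = pd.concat(dfs)               # 'A' now has two rows, one per source frame
df = stacked.groupby(level=0).first()  # first non-null value per Person and column
print(df)
```

If the frames disagreed on a value, first() would silently keep the one from the earlier frame, which is why the answer requires the values to match.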
appending multiple pandas DataFrames read in from files
Unlike lists, when you append to a DataFrame you return a new object. So topics.append(df) returns an object that you are never storing anywhere, and topics remains the empty DataFrame you declare on the 6th line. You can fix this by
topics = topics.append(df)
However, appending to a DataFrame within a loop is a very costly exercise. Instead you should append each DataFrame to a list within the loop and call pd.concat() on the list of DataFrames after the loop.
import os
import pandas as pd
topics_list = []
for filename in os.listdir('./topics'):
    # All of your code
    topics_list.append(df)  # Lists are modified with append
# After the loop one call to concat
topics = pd.concat(topics_list)
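A runnable sketch of the same pattern, under stated assumptions: a temporary directory stands in for './topics', the CSV files are made up, and pd.read_csv replaces the "All of your code" step, which the answer does not show.

```python
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as topics_dir:  # stands in for './topics'
    # Write three small made-up CSV files so the sketch runs anywhere
    for i in range(3):
        pd.DataFrame({'topic': [f't{i}'], 'score': [i]}).to_csv(
            os.path.join(topics_dir, f'file{i}.csv'), index=False)

    topics_list = []
    for filename in sorted(os.listdir(topics_dir)):
        df = pd.read_csv(os.path.join(topics_dir, filename))
        topics_list.append(df)  # list.append mutates the list in place

    # A single concat after the loop
    topics = pd.concat(topics_list, ignore_index=True)

print(topics)
```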
How to merge multiple dataframes
Below is the cleanest, most comprehensible way of merging multiple dataframes when complex queries aren't involved.
Simply merge on DATE as the key, using the OUTER method (to keep all the data).
import pandas as pd
from functools import reduce
df1 = pd.read_table('file1.csv', sep=',')
df2 = pd.read_table('file2.csv', sep=',')
df3 = pd.read_table('file3.csv', sep=',')
Now load all the files you have as data frames into a list, then merge them using merge together with the reduce function.
# compile the list of dataframes you want to merge
data_frames = [df1, df2, df3]
Note: you can add as many data frames as you like to the list above. That is the good part of this method: no complex queries are involved.
To keep the values that belong to the same date on the same row, merge on DATE:
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames)
# if you want to fill the values that are missing from rows of the merged dataframe, fill them with the required string, e.g.
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames).fillna('void')
- Now the output will have the values from the same date on the same row.
- You can fill the values that are missing from some frames, for any column, using fillna().
Then write the merged data to the csv file if desired.
df_merged.to_csv('merged.txt', sep=',', na_rep='.', index=False)
This should give you
DATE VALUE1 VALUE2 VALUE3 ....
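A self-contained sketch of the reduce-and-merge step, using three small made-up frames in place of the CSV files so it runs without any input data:

```python
from functools import reduce
import pandas as pd

# Made-up frames standing in for file1.csv, file2.csv and file3.csv
df1 = pd.DataFrame({'DATE': ['2020-01-01', '2020-01-02'], 'VALUE1': [1, 2]})
df2 = pd.DataFrame({'DATE': ['2020-01-02', '2020-01-03'], 'VALUE2': [20, 30]})
df3 = pd.DataFrame({'DATE': ['2020-01-01', '2020-01-03'], 'VALUE3': [100, 300]})
data_frames = [df1, df2, df3]

# reduce merges the running result with each next frame, left to right
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=['DATE'], how='outer'),
    data_frames,
)
print(df_merged.sort_values('DATE'))
```

Each date ends up on one row, with NaN where a frame had no entry for that date.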
How to append multiple dataframes with same prefix in python
You were heading in the right direction; just use eval:
tempdf = df1
for i in range(2,4):
    tempdf = tempdf.append(eval("df"+str(i)))
print(tempdf)
Note: eval can run arbitrary code, so using it is considered bad practice. Please use another approach if possible.
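One such alternative, sketched under the assumption that you control how the frames are created: keep them in a dict keyed by name instead of in separate df1, df2, df3 variables, and look them up by key. It also uses pd.concat, since DataFrame.append was removed in pandas 2.0. The frames here are made up.

```python
import pandas as pd

# Keep the frames in a dict instead of separate df1, df2, df3 variables
frames = {
    'df1': pd.DataFrame({'x': [1]}),
    'df2': pd.DataFrame({'x': [2]}),
    'df3': pd.DataFrame({'x': [3]}),
}

# Look each frame up by its key; no eval needed
tempdf = pd.concat([frames[f'df{i}'] for i in range(1, 4)], ignore_index=True)
print(tempdf)
```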
Appending multiple dataframes in pandas with for loops
Consider building a list of data frames, then concatenating them once outside the loop. Specifically, the code below uses a list comprehension that also assigns column names in each iteration, followed by a single pd.concat call.
url = 'https://www.treasury.gov/resource-center/data-chart-center/interest-rates/' + \
'pages/TextView.aspx?data=yieldYear&year=({yr})'
DateList = ['Date', '1 mo', '2 mo', '3 mo', '6 mo', '1 yr', '2 yr',
'3 yr', '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']
# Note: the inplace argument of set_axis was removed in pandas 2.0; drop inplace=False there
dfs = [(pd.read_html(url.format(yr=x), skiprows=1)[1]
          .set_axis(DateList, axis='columns', inplace=False)) for x in range(2017, 2019)]
final_df = pd.concat(dfs, ignore_index=True)
print(final_df.head())
# Date 1 mo 2 mo 3 mo 6 mo ... 5 yr 7 yr 10 yr 20 yr 30 yr
# 0 01/03/17 0.52 NaN 0.53 0.65 ... 1.94 2.26 2.45 2.78 3.04
# 1 01/04/17 0.49 NaN 0.53 0.63 ... 1.94 2.26 2.46 2.78 3.05
# 2 01/05/17 0.51 NaN 0.52 0.62 ... 1.86 2.18 2.37 2.69 2.96
# 3 01/06/17 0.50 NaN 0.53 0.61 ... 1.92 2.23 2.42 2.73 3.00
# 4 01/09/17 0.50 NaN 0.50 0.60 ... 1.89 2.18 2.38 2.69 2.97
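The same pattern can be tried offline. In this sketch the raw_tables dict is a made-up stand-in for what pd.read_html would return for each year, and only two columns are kept for brevity:

```python
import pandas as pd

# Made-up stand-ins for the per-year tables that pd.read_html would return
raw_tables = {
    2017: pd.DataFrame([['01/03/17', 0.52], ['01/04/17', 0.49]]),
    2018: pd.DataFrame([['01/02/18', 1.29]]),
}
column_names = ['Date', '1 mo']

# Rename the columns of each table inside the comprehension, then concat once
dfs = [raw_tables[yr].set_axis(column_names, axis='columns') for yr in range(2017, 2019)]
final_df = pd.concat(dfs, ignore_index=True)
print(final_df)
```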