Append Dataframes together in for loop
Do not use pd.DataFrame.append in a loop
This is inefficient because every call copies all the data accumulated so far. The method was also deprecated in pandas 1.4 and removed in pandas 2.0. A much better idea is to create a list of dataframes and then concatenate them in a single final step outside your loop. Here's some pseudo-code:
import pandas as pd

symbols = ['WYNN', 'FL', 'TTWO']
cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
dfs = []  # empty list which will hold your dataframes
for symbol in symbols:
    # some code that produces stock_data for this symbol
    df = pd.DataFrame(stock_data, columns=cols)
    # convert columns while 'Date' is still a regular column
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')  # prices are numbers, not dates
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
    df = df.set_index('Date')  # set_index returns a new frame, so assign it back
    dfs.append(df)  # append dataframe to list
res = pd.concat(dfs)  # concatenate list of dataframes, keeping the Date index
res.to_excel('stock data.xlsx')  # the index holds the dates, so write it out
Note you are performing many operations, e.g. set_index, as if they are in place by default. That's not the case. You should assign the result back to a variable, e.g. df = df.set_index('Date').
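Because stock_data is left undefined in the pseudo-code, here is a minimal self-contained sketch of the same pattern. The rows in raw_data are made-up placeholder values (not real quotes), and the names raw_data and build_frame are hypothetical, introduced only for illustration.

```python
import pandas as pd

cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

# Made-up placeholder rows (not real market data), keyed by symbol.
raw_data = {
    'WYNN': [['2020-01-02', '1.0', '2.0', '0.5', '1.5', '1,000']],
    'FL':   [['2020-01-02', '3.0', '4.0', '2.5', '3.5', '2,500']],
    'TTWO': [['2020-01-02', '5.0', '6.0', '4.5', '5.5', '12,345']],
}


def build_frame(symbol):
    """Hypothetical helper: clean one symbol's rows into a typed frame."""
    df = pd.DataFrame(raw_data[symbol], columns=cols)
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
    # set_index is not in place: it returns a new frame, which we return.
    return df.set_index('Date')


dfs = []
for symbol in raw_data:
    dfs.append(build_frame(symbol))  # cheap: only a reference is stored

res = pd.concat(dfs)  # one copy of the data, made once after the loop
print(res)
```

The loop only collects references to the per-symbol frames, so the data is copied once, by the final pd.concat call.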
Appending multiple dataframes in pandas with for loops
Consider building a list of data frames, then concatenating them once outside the loop. Specifically, the code below uses a list comprehension that also assigns column names in each iteration, followed by a single pd.concat call.
import pandas as pd

url = 'https://www.treasury.gov/resource-center/data-chart-center/interest-rates/' + \
      'pages/TextView.aspx?data=yieldYear&year=({yr})'
DateList = ['Date', '1 mo', '2 mo', '3 mo', '6 mo', '1 yr', '2 yr',
            '3 yr', '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']
# set_axis returns a new frame; its inplace argument was removed in pandas 2.0
dfs = [(pd.read_html(url.format(yr=x), skiprows=1)[1]
          .set_axis(DateList, axis='columns')) for x in range(2017, 2019)]
final_df = pd.concat(dfs, ignore_index=True)
print(final_df.head())
#        Date  1 mo  2 mo  3 mo  6 mo  ...  5 yr  7 yr  10 yr  20 yr  30 yr
# 0  01/03/17  0.52   NaN  0.53  0.65  ...  1.94  2.26   2.45   2.78   3.04
# 1  01/04/17  0.49   NaN  0.53  0.63  ...  1.94  2.26   2.46   2.78   3.05
# 2  01/05/17  0.51   NaN  0.52  0.62  ...  1.86  2.18   2.37   2.69   2.96
# 3  01/06/17  0.50   NaN  0.53  0.61  ...  1.92  2.23   2.42   2.73   3.00
# 4  01/09/17  0.50   NaN  0.50  0.60  ...  1.89  2.18   2.38   2.69   2.97
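The same list-comprehension pattern can be tried without network access. This is a minimal sketch, assuming small in-memory frames stand in for the pd.read_html results; make_table, new_cols, and every value in the frames are hypothetical placeholders.

```python
import pandas as pd

new_cols = ['Date', '1 mo', '3 mo']


def make_table(year):
    """Hypothetical stand-in for pd.read_html(...)[1].

    Returns a frame with default integer column labels and made-up values.
    """
    yy = f'{year % 100:02d}'
    return pd.DataFrame([[f'01/03/{yy}', 0.1, 0.2],
                         [f'01/04/{yy}', 0.3, 0.4]])


# Rename the columns of each frame as it is created, then combine once.
dfs = [make_table(yr).set_axis(new_cols, axis='columns')
       for yr in range(2017, 2019)]
final_df = pd.concat(dfs, ignore_index=True)
print(final_df)
```

With ignore_index=True the combined frame gets a fresh 0..n-1 index instead of repeating each yearly frame's own row labels.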
Append data frames together in a for loop
Don't do it inside the loop. Make a list, then combine them outside the loop.
datalist = list()
for (i in 1:5) {
    # ... make some data
    dat <- data.frame(x = rnorm(10), y = runif(10))
    dat$i <- i  # maybe you want to keep track of which iteration produced it?
    datalist[[i]] <- dat  # add it to your list
}
big_data = do.call(rbind, datalist)
# or big_data <- dplyr::bind_rows(datalist)
# or big_data <- data.table::rbindlist(datalist)
This is a much more R-like way to do things. It can also be substantially faster, especially if you use dplyr::bind_rows or data.table::rbindlist for the final combining of data frames.
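To see why combining once tends to be faster, here is a rough cost model rather than a benchmark, written in Python to match the pandas examples earlier on this page. It assumes that every in-loop combine copies all rows accumulated so far; the function names are hypothetical and real timings will depend on the library and the data.

```python
def rows_copied_growing(n_chunks, rows_per_chunk):
    """Rows copied if the result is rebuilt after every chunk (assumed model)."""
    total = 0
    accumulated = 0
    for _ in range(n_chunks):
        accumulated += rows_per_chunk
        total += accumulated  # each step copies everything gathered so far
    return total


def rows_copied_once(n_chunks, rows_per_chunk):
    """Rows copied if all chunks are combined in a single final step."""
    return n_chunks * rows_per_chunk


growing = rows_copied_growing(1000, 10)
once = rows_copied_once(1000, 10)
print(growing, once, growing / once)
```

Under this model the in-loop approach grows quadratically with the number of chunks, while the list-then-combine approach grows linearly.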