Append Dataframes Together in for Loop

Do not use pd.DataFrame.append in a loop

This is inefficient because every call copies all of the data accumulated so far. (DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.) A much better idea is to create a list of dataframes and then concatenate them in a single final step outside your loop. Here's some pseudo-code:

    import pandas as pd

    symbols = ['WYNN', 'FL', 'TTWO']
    cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

    dfs = []  # empty list which will hold your dataframes

    for symbol in symbols:
        # some code (this is where stock_data is produced for the symbol)

        df = pd.DataFrame(stock_data, columns=cols)

        # convert types while 'Date' is still a regular column
        df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
        df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')
        df['Volume'] = df['Volume'].str.replace(',', '').astype(int)

        df = df.set_index('Date')

        dfs.append(df)  # append dataframe to list

    res = pd.concat(dfs)  # concatenate list of dataframes, keeping the Date index
    res.to_excel('stock data.xlsx')
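
The pseudo-code above leaves out how stock_data is produced, so it cannot be run as written. Below is a minimal runnable sketch of the same list-then-concat pattern. The helper fake_stock_data and its rows are made up purely for illustration, and the extra Symbol column is an assumption about what you might want to keep, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the scraped rows; the values are invented.
def fake_stock_data(symbol):
    return [
        ['2019-01-02', '10.0', '11.0', '9.5', '10.5', '1,200'],
        ['2019-01-03', '10.5', '11.5', '10.0', '11.0', '2,345'],
    ]

symbols = ['WYNN', 'FL', 'TTWO']
cols = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']

dfs = []  # one dataframe per symbol is collected here
for symbol in symbols:
    df = pd.DataFrame(fake_stock_data(symbol), columns=cols)
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df[cols[1:5]] = df[cols[1:5]].apply(pd.to_numeric, errors='coerce')
    df['Volume'] = df['Volume'].str.replace(',', '').astype(int)
    df['Symbol'] = symbol  # record which symbol each row came from
    dfs.append(df)

# a single concatenation after the loop instead of one copy per iteration
res = pd.concat(dfs, ignore_index=True)
print(res.shape)
```

Here ignore_index=True is harmless because Date is kept as a regular column; if Date were the index, it would be discarded.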

Note you are performing many operations, e.g. set_index, as if they are by default in place. That's not the case. You should assign back to a variable, e.g. df = df.set_index('Date').
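
To make the point about in-place operations concrete, here is a small sketch with a made-up two-row dataframe. It shows that calling set_index without assigning the result leaves the original dataframe untouched.

```python
import pandas as pd

# Made-up data, only to illustrate the behaviour.
df = pd.DataFrame({'Date': ['2019-01-02', '2019-01-03'],
                   'Close': [10.5, 11.0]})

df.set_index('Date')            # result is discarded; df is unchanged
unchanged = list(df.columns)    # 'Date' is still a regular column

df = df.set_index('Date')       # assign back to keep the result
changed = list(df.columns)      # 'Date' has moved into the index

print(unchanged, changed)
```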

Appending multiple dataframes in pandas with for loops

Consider building a list of data frames and then concatenating them once, outside the loop. Specifically, the code below uses a list comprehension that also assigns column names in each iteration, followed by a single pd.concat call.

    import pandas as pd

    url = 'https://www.treasury.gov/resource-center/data-chart-center/interest-rates/' + \
          'pages/TextView.aspx?data=yieldYear&year=({yr})'

    DateList = ['Date', '1 mo', '2 mo', '3 mo', '6 mo', '1 yr', '2 yr',
                '3 yr', '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']

    # set_axis returns a new dataframe; its inplace argument was removed in pandas 2.0
    dfs = [(pd.read_html(url.format(yr=x), skiprows=1)[1]
              .set_axis(DateList, axis='columns')) for x in range(2017, 2019)]

    final_df = pd.concat(dfs, ignore_index=True)

    print(final_df.head())
    #        Date  1 mo  2 mo  3 mo  6 mo  ...  5 yr  7 yr  10 yr  20 yr  30 yr
    # 0  01/03/17  0.52   NaN  0.53  0.65  ...  1.94  2.26   2.45   2.78   3.04
    # 1  01/04/17  0.49   NaN  0.53  0.63  ...  1.94  2.26   2.46   2.78   3.05
    # 2  01/05/17  0.51   NaN  0.52  0.62  ...  1.86  2.18   2.37   2.69   2.96
    # 3  01/06/17  0.50   NaN  0.53  0.61  ...  1.92  2.23   2.42   2.73   3.00
    # 4  01/09/17  0.50   NaN  0.50  0.60  ...  1.89  2.18   2.38   2.69   2.97
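
Since the example above depends on a live web page, here is an offline sketch of the same list comprehension plus pd.concat pattern. The raw dictionary, its CSV strings, and the shortened names list are placeholders invented for illustration; only the technique is meant to carry over.

```python
import io
import pandas as pd

# Hypothetical per-year tables standing in for the scraped pages (invented values).
raw = {
    2017: "d,a,b\n01/03/17,0.1,0.2\n01/04/17,0.3,0.4\n",
    2018: "d,a,b\n01/02/18,0.5,0.6\n",
}
names = ['Date', '1 mo', '3 mo']

# read each table and assign its column names in the same expression
dfs = [pd.read_csv(io.StringIO(raw[yr])).set_axis(names, axis='columns')
       for yr in sorted(raw)]

final_df = pd.concat(dfs, ignore_index=True)
print(final_df)
```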

Append data frames together in a for loop

Don't do it inside the loop. Make a list, then combine them outside the loop.

    datalist <- list()

    for (i in 1:5) {
        # ... make some data
        dat <- data.frame(x = rnorm(10), y = runif(10))
        dat$i <- i  # maybe you want to keep track of which iteration produced it?
        datalist[[i]] <- dat  # add it to your list
    }

    big_data <- do.call(rbind, datalist)
    # or big_data <- dplyr::bind_rows(datalist)
    # or big_data <- data.table::rbindlist(datalist)

This is a much more R-like way to do things. It can also be substantially faster, especially if you use dplyr::bind_rows or data.table::rbindlist for the final combining of data frames.
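
For readers coming from the pandas answers, a rough Python counterpart of the R snippet might look like the sketch below. It is only an analogy: instead of adding an i column inside the loop, it passes keys to pd.concat so the iteration number becomes an index level. The level names 'i' and 'row' are arbitrary choices made here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so reruns give the same numbers

datalist = []
for i in range(1, 6):
    # ... make some data
    dat = pd.DataFrame({'x': rng.normal(size=10), 'y': rng.uniform(size=10)})
    datalist.append(dat)

# keys labels each piece with the iteration that produced it
big_data = pd.concat(datalist, keys=list(range(1, 6)), names=['i', 'row'])

print(big_data.shape)
```

Calling big_data.reset_index(level='i') would turn the iteration label back into an ordinary column, which is closer to what the R code produces.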


