Writing to a Dataframe from a For-Loop in R

Populating a data frame in R in a loop

You could do it like this:

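iterations <- 10
variables <- 2

# Preallocate the full matrix so nothing has to grow inside the loop
output <- matrix(ncol = variables, nrow = iterations)

for (i in 1:iterations) {
  # Fill row i with one random number per column
  output[i, ] <- runif(variables)
}

output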

Then turn it into a data.frame:

output <- data.frame(output)
class(output)

What this does:

  1. Create a matrix whose numbers of rows and columns match the expected size of the result.
  2. In each iteration, insert 2 random numbers into one row of the matrix.
  3. Convert the matrix into a data frame after the loop has finished.
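
A matrix holds a single type, so when the columns have different types the same preallocate-then-fill idea can be applied to a data frame directly. The following is only a sketch; the column names id, value and label are hypothetical and chosen for illustration.

```r
n <- 5

# Preallocate a data frame with one typed column per variable
out <- data.frame(id = integer(n),
                  value = numeric(n),
                  label = character(n),
                  stringsAsFactors = FALSE)

for (i in seq_len(n)) {
  # Assign a whole row as a list so each column keeps its own type
  out[i, ] <- list(i, i^2, paste0("row", i))
}

out
```

Row-wise assignment into a data frame is slower than filling a matrix, so the matrix version is usually preferable when every column is numeric.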

Writing a for loop with the output as a data frame in R

As this is a learning question I will not provide the solution directly.

> values <- c(-10,0,10,100)
> for (i in seq_along(values)) {print(i)} # Checking we iterate by position
[1] 1
[1] 2
[1] 3
[1] 4
> output <- vector("double", 10)
> output # Checking the place where the output will be
[1] 0 0 0 0 0 0 0 0 0 0
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
Error in output[[i]] <- rnorm(10, mean = values[[i]]) :
more elements supplied than there are to replace

As you can see, the error says that more elements were supplied than there is room for: each iteration generates 10 random numbers (40 in total), but output has only 10 slots, and each output[[i]] can hold just one value. Consider using a data structure that can store several values per iteration, so that:

> output <- ??
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
> output # Should have length 4, with each element holding the 10 values created in that iteration

Storing loop output in a dataframe in R

You can begin with y as an empty data.frame, as in: y <- data.frame(). Then bind the new rows to this data.frame at the end of each iteration, as in: y <- rbind.data.frame(y, [output of one iteration]). You can also make this a little tighter by wrapping the loop body in lapply and combining the results with do.call, as in:

y <- do.call(rbind.data.frame,
             lapply(unique(x$id),
                    function(i) {
                      ...;
                      return([output of one iteration])
                    }))
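
As a concrete illustration of the do.call and lapply pattern, here is a minimal sketch. The data frame x, its value column, and the per-id summary (a row count and a sum) are assumptions made up for this example; the original question's data and loop body are not shown.

```r
# Hypothetical input: several rows per id
x <- data.frame(id = c("a", "a", "b", "b", "c"),
                value = c(1, 2, 3, 4, 5),
                stringsAsFactors = FALSE)

y <- do.call(rbind.data.frame,
             lapply(unique(x$id), function(i) {
               rows <- x[x$id == i, ]
               # One-row data frame per iteration
               data.frame(id = i,
                          n = nrow(rows),
                          total = sum(rows$value),
                          stringsAsFactors = FALSE)
             }))

y
```

Because all pieces are bound in a single call at the end, y is built once rather than being copied and extended in every iteration.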

For loop for dataframes in R

The solution to my problem was simpler than I expected:

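# Assumes df is a data.table and df2 already exists before the loop
for (i in 1:ncol(df)) {
  if (i <= 2) {
    # Copy the first two columns unchanged
    df2 <- cbind(df2, df[, ..i])
  } else {
    # Difference between column i and the previous column
    difference <- df[[i]] - df[[i - 1]]
    df2 <- cbind(df2, difference)
  }
}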
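
For readers without data.table, the same idea (keep the first two columns, then store the difference between each later column and its predecessor) can be sketched in base R. The data frame, its column names, and the diff_ prefix below are hypothetical.

```r
# Hypothetical wide data: one column per time point
df <- data.frame(t1 = c(1, 2), t2 = c(4, 6), t3 = c(9, 7), t4 = c(10, 12))

# Start from the first two columns, unchanged
df2 <- df[, 1:2]

# For every later column, store its difference from the previous column
for (i in seq_len(ncol(df))[-(1:2)]) {
  df2[[paste0("diff_", names(df)[i])]] <- df[[i]] - df[[i - 1]]
}

df2
```

Indexing with [[ ]] returns a plain vector for both data frames and data.tables, which keeps the subtraction unambiguous.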

Thanks for all the alternative solutions!

How to use for loop to create new data frames using i in the name of data frame in R

You can try:

library(dplyr)
yearlist <- c(2013, 2014)

lapply(yearlist, function(x) {
  maxyear <- x
  minyear <- maxyear - 7

  mutatedata %>%
    filter(year >= minyear & year <= maxyear) %>%
    group_by(symbol) %>%
    summarize(
      avgroepercent = mean(roe, na.rm = TRUE),
      avgrocpercent = mean(roc, na.rm = TRUE),
      epsroc = (((last(eps)) / (first(eps)))^(1 / (maxyear - minyear)) - 1)
    )
}) -> data

Here, data is a list of data frames, one per year in yearlist. If you want to create separate data frames, name the list elements and use list2env:

names(data) <- paste0('metrics_', yearlist)
list2env(data, .GlobalEnv)
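
The naming and list2env step can be tried on its own without dplyr. The sketch below uses a hypothetical sales data frame and a one-year window, and it writes into a fresh environment rather than .GlobalEnv so that nothing in the workspace is overwritten.

```r
# Hypothetical data standing in for mutatedata
sales <- data.frame(year = c(2012, 2013, 2013, 2014),
                    amount = c(10, 20, 30, 40))
yearlist <- c(2013, 2014)

# One filtered data frame per year in yearlist
data <- lapply(yearlist, function(maxyear) {
  minyear <- maxyear - 1
  sales[sales$year >= minyear & sales$year <= maxyear, ]
})

# list2env needs a named list: each name becomes an object name
names(data) <- paste0("metrics_", yearlist)

env <- new.env()
invisible(list2env(data, envir = env))

ls(env)
```

Keeping the results in the named list is often easier to work with than separate objects, since the list can be passed to lapply again later.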

How to make nested for loop in R writing output to dataframe more efficient?

That's a nice way of doing it, but it can certainly be done more concisely.
Try:

table$id <- 1:nrow(table) # Create a row-number column
tidyr::pivot_longer(table, cols = -id)
# A tibble: 54 x 3
      id name  value
   <int> <chr> <dbl>
 1     1 V1     70.3
 2     1 V2     72.8
 3     1 V3     76.1
 4     1 V4     73.1
 5     1 V5     71.9
 6     1 V6     73.8
 7     1 V7     76.4
 8     1 V8     74.1
 9     1 V9     75.5
10     2 V1     73.8
# ... with 44 more rows

What are we doing here?

First of all, we add the row numbers as a column to the data (because, for some reason, you want to keep them in the resulting data frame).
Then, we use the pivot_longer() function from the tidyr package. What you want to do with the data is called reshaping. There are many ways to do this in R: reshape(), the reshape2 library, or the functions pivot_longer() and pivot_wider() from tidyr.

We want to have our "wide" data in "long" form. You may want to have a look at this Cheat Sheet; the functions gather() and spread() shown there are superseded by pivot_longer() and pivot_wider(), but they work in basically the same way.

With the function argument cols = -id, we specify that all variables but id should come up in the value column of the new data frame.

If you want to have a matrix as your result, just run as.matrix() on the newly created object.
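
For comparison, the base R reshape() function mentioned above can produce the same long layout. This is a sketch on a small made-up table (6 rows, 3 value columns) rather than the data from the question.

```r
set.seed(1)

# Hypothetical wide table with columns V1..V3
tbl <- as.data.frame(matrix(round(runif(6 * 3, 70, 77), 1), nrow = 6))
tbl$id <- seq_len(nrow(tbl))

value_cols <- c("V1", "V2", "V3")

long <- reshape(tbl,
                direction = "long",
                varying = value_cols,   # columns to stack
                v.names = "value",      # name of the stacked value column
                timevar = "name",       # column recording the source column
                times = value_cols,
                idvar = "id")

# Order by id so the rows appear in the same order as pivot_longer() output
long <- long[order(long$id), ]
head(long)
```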

For loop for dataframes inside a function

Here is a base R approach.

list_of_df <- list(df1 = df1, df2 = df2)
f <- function(list_of_df) {
  f1 <- function(df) {
    meanA <- mean(df$a)
    maxA <- max(df$a)
    maxB <- max(df$b)
    c(meanA = meanA, maxA = maxA, maxB = maxB, sum = meanA + maxA)
  }
  as.data.frame(lapply(list_of_df, f1))
}
f(list_of_df)
#       df1 df2
# meanA   2   8
# maxA    3   9
# maxB    4   5
# sum     5  17

A few remarks:

  • If you want the variables in the result to be named df1 and df2, then you need the elements of your list of data frames to be named accordingly. list(value1, value2) is a list without names. list(name1 = value1, name2 = value2) is a list with names c("name1", "name2").
  • This implementation presupposes that all of your summary statistics are numeric. It concatenates all of the summary statistics for a given data frame using c, and the elements of the resulting atomic vector are constrained to have a common type.
  • Internally, lapply is still looping over your list of data frames. Using lapply is much cleaner than implementing the loop yourself, but lapply may not perform significantly better than an equivalent loop. If you are actually worried about performance, then you may need to describe the structure of your actual data in more detail.
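
To try the approach end to end, here is a self-contained sketch. The contents of df1 and df2 are hypothetical, chosen only so that the summaries are easy to check by hand; summarise_one is an illustrative name for the inner function.

```r
# Hypothetical inputs with numeric columns a and b
df1 <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
df2 <- data.frame(a = c(7, 8, 9), b = c(3, 4, 5))

summarise_one <- function(df) {
  mean_a <- mean(df$a)
  max_a <- max(df$a)
  # All statistics are numeric, so c() keeps them as a numeric vector
  c(meanA = mean_a, maxA = max_a, maxB = max(df$b), sum = mean_a + max_a)
}

# The list names (df1, df2) become the column names of the result
result <- as.data.frame(lapply(list(df1 = df1, df2 = df2), summarise_one))
result
```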

FWIW, the reason you are seeing the replacement error is this:

data.frame(rownames = 'meanA', 'maxA', 'maxB', 'sum')
#   rownames X.maxA. X.maxB. X.sum.
# 1    meanA    maxA    maxB    sum

You initialized a data frame with one row, and you were trying to append a length-4 vector. Perhaps what you intended was:

data.frame(row.names = c("meanA", "maxA", "maxB", "sum"))
# data frame with 0 columns and 4 rows
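
With the row names set this way, a length-4 vector fits as a new column. A minimal sketch, using made-up values in place of the statistics computed in the loop:

```r
# Empty data frame with 4 named rows and no columns yet
res <- data.frame(row.names = c("meanA", "maxA", "maxB", "sum"))

# Each assigned vector has length 4, matching the number of rows
res$df1 <- c(2, 3, 4, 5)
res$df2 <- c(8, 9, 5, 17)

res
```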

