Populating a data frame in R in a loop
You could do it like this:
iterations = 10
variables = 2
output <- matrix(ncol=variables, nrow=iterations)
for (i in 1:iterations) {
  output[i, ] <- runif(variables)  # one row of random numbers per iteration
}
output
and then turn it into a data.frame:
output <- data.frame(output)
class(output)
What this does:
- create a matrix whose rows and columns match the expected size of the result
- insert 2 random numbers into one row of the matrix on each iteration
- convert the matrix into a data frame after the loop has finished.
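As a minimal sketch of the same approach with named columns (the column names x and y and the seed are assumptions chosen for illustration, not part of the original answer):

```r
set.seed(1)  # assumption: fixed seed only so the run is repeatable

iterations <- 10
variables <- 2

# Preallocate the full matrix once; "x" and "y" are hypothetical column names
output <- matrix(NA_real_, nrow = iterations, ncol = variables,
                 dimnames = list(NULL, c("x", "y")))

for (i in seq_len(iterations)) {
  output[i, ] <- runif(variables)  # fill one row per iteration
}

# Convert once, after the loop has finished
output <- as.data.frame(output)
str(output)
```

Preallocating and converting once avoids growing a data frame row by row inside the loop.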
Writing a for loop with the output as a data frame in R
As this is a learning question I will not provide the solution directly.
> values <- c(-10,0,10,100)
> for (i in seq_along(values)) {print(i)} # Checking we iterate by position
[1] 1
[1] 2
[1] 3
[1] 4
> output <- vector("double", 10)
> output # Checking the place where the output will be
[1] 0 0 0 0 0 0 0 0 0 0
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
Error in output[[i]] <- rnorm(10, mean = values[[i]]) :
more elements supplied than there are to replace
As you can see, the error says there are more elements to store than there is space for: each iteration generates 10 random numbers (40 in total), but each slot of the double vector holds only a single number, and there are only 10 slots. Consider using a data format that can store several values for each iteration.
So that:
> output <- ??
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
> output # Should have length 4, and each element should hold all 10 values created in the loop
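To see where the error comes from without giving away the answer, here is a small sketch (the variable name res is hypothetical) showing that one slot of a double vector accepts a single value but not ten:

```r
output <- vector("double", 10)

# A single value fits into a single slot
output[[1]] <- 3.14

# Ten values do not fit into one slot; capture the error message instead of stopping
res <- tryCatch({
  output[[2]] <- rnorm(10)
  "ok"
}, error = function(e) conditionMessage(e))

res  # the error message reported by R
```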
Storing loop output in a dataframe in R
You can begin with y as an empty data.frame, as in: y <- data.frame(). Then bind the rows to this data.frame at the end of each iteration, as in: y <- rbind.data.frame(y, [output of one iteration]). But you can also make this a little tighter by wrapping it in an lapply and do.call, as in:
y <- do.call(rbind.data.frame,
             lapply(unique(x$id),
                    function(i) {
                      ...;
                      return([output of one iteration])
                    }))
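A runnable sketch of this pattern, assuming a hypothetical input x with columns id and value, and a per-id summary standing in for "output of one iteration":

```r
# Hypothetical input data; the original question's x is not shown
x <- data.frame(id = c("a", "a", "b", "b", "c"),
                value = c(1, 2, 3, 4, 5))

y <- do.call(rbind.data.frame,
             lapply(unique(x$id), function(i) {
               rows <- x[x$id == i, ]
               # One-row data frame per id: this is the "output of one iteration"
               data.frame(id = i, n = nrow(rows), mean_value = mean(rows$value))
             }))
y
```

Each call of the anonymous function returns a one-row data frame, and do.call stacks them in a single rbind step rather than once per iteration.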
For loop for dataframes in R
The solution to my problem was simpler than I expected:
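# df is assumed to be a data.table (the ..i syntax); df2 must already exist before the loop
for (i in 1:ncol(df)) {
  if (i == 1) {
    df2 <- cbind(df2, df[, ..i])
  } else if (i == 2) {
    df2 <- cbind(df2, df[, ..i])
  } else {
    # difference between column i and the column before it
    diference <- df[[i]] - df[[i - 1]]
    df2 <- cbind(df2, diference)
  }
}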
Thanks for all the alternative solutions!
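For comparison, a loop-free sketch of the same idea in base R, assuming a plain data.frame with hypothetical columns a to d: keep the first two columns and replace every later column by its difference from the previous one.

```r
# Hypothetical input; the original df is not shown
df <- data.frame(a = c(1, 2), b = c(4, 6), c = c(9, 7), d = c(10, 10))

n <- ncol(df)

# Columns 3..n minus columns 2..(n-1), computed in one step
diffs <- df[, 3:n, drop = FALSE] - df[, 2:(n - 1), drop = FALSE]
names(diffs) <- paste0("diff_", names(df)[3:n])

df2 <- cbind(df[, 1:2], diffs)
df2
```

This sketch assumes at least three columns; with fewer, 3:n would count backwards and the subsetting would fail.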
How to use for loop to create new data frames using i in the name of data frame in R
You can try:
library(dplyr)
yearlist <- c(2013, 2014)
lapply(yearlist, function(x) {
  maxyear <- x
  minyear <- maxyear - 7
  mutatedata %>%
    filter(year >= minyear & year <= maxyear) %>%
    group_by(symbol) %>%
    summarize(
      avgroepercent = mean(roe, na.rm = TRUE),
      avgrocpercent = mean(roc, na.rm = TRUE),
      epsroc = (((last(eps))/(first(eps)))^(1/(maxyear-minyear))-1)
    )
}) -> data
where data is a list of dataframes. If you want to create separate dataframes you can use list2env.
names(data) <- paste0('metrics_', yearlist)
list2env(data, .GlobalEnv)
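A self-contained sketch of the naming step, using small stand-in data frames instead of the dplyr summary and a fresh environment (env, a hypothetical name) instead of .GlobalEnv so that the workspace is not cluttered:

```r
yearlist <- c(2013, 2014)

# Stand-in for the per-year summaries; the real ones come from the pipeline above
data <- lapply(yearlist, function(x) data.frame(year = x, n = 1))

# Name each list element, then turn every element into its own object
names(data) <- paste0("metrics_", yearlist)
env <- new.env()
list2env(data, envir = env)

ls(env)            # names of the objects that were created
env$metrics_2013   # one of the resulting data frames
```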
How to make nested for loop in R writing output to dataframe more efficient?
That's a nice way of doing it, but it can certainly be done more concisely.
Try:
table$id <- 1:nrow(table) # Create a row no. column
tidyr::pivot_longer(table, cols = -id)
# A tibble: 54 x 3
      id name  value
   <int> <chr> <dbl>
 1     1 V1     70.3
 2     1 V2     72.8
 3     1 V3     76.1
 4     1 V4     73.1
 5     1 V5     71.9
 6     1 V6     73.8
 7     1 V7     76.4
 8     1 V8     74.1
 9     1 V9     75.5
10     2 V1     73.8
# ... with 44 more rows
What are we doing here?
First of all, we add the "rownames" as a column to the data (because for some reason, you want to keep them in the resulting data frame).
Then, we use the pivot_longer() function from the tidyr package. What you want to do with the data is reshaping. There are many ways to do so in R: reshape(), the reshape2 library, or the functions pivot_longer() and pivot_wider() from tidyr.
We want to have our "wide" data in "long" form. You may want to have a look at this Cheat Sheet, even though the functions gather() and spread() are superseded by pivot_longer() and pivot_wider(); they basically work in the same way.
With the function argument cols = -id, we specify that all variables but id should come up in the value column of the new data frame.
If you want to have a matrix as your result, just run as.matrix() on the newly created object.
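For comparison, here is a sketch of the same wide-to-long reshaping with base R's reshape(), one of the alternatives mentioned above. The input values and the names wide, long and value_cols are hypothetical; only three columns and two rows are used to keep it short.

```r
# Hypothetical wide data with two rows and three value columns
wide <- data.frame(V1 = c(70.3, 73.8), V2 = c(72.8, 74.0), V3 = c(76.1, 75.2))
wide$id <- seq_len(nrow(wide))  # row number column, as in the answer

value_cols <- c("V1", "V2", "V3")
long <- reshape(wide, direction = "long",
                varying = value_cols,   # columns to stack
                v.names = "value",      # name of the stacked value column
                timevar = "name",       # column recording the source column
                times = value_cols,
                idvar = "id")

# Order by id so the layout resembles the pivot_longer() result
long <- long[order(long$id), ]
rownames(long) <- NULL
long
```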
For loop for dataframes inside a function
Here is a base R approach.
list_of_df <- list(df1 = df1, df2 = df2)
f <- function(list_of_df) {
  f1 <- function(df) {
    meanA <- mean(df$a)
    maxA <- max(df$a)
    maxB <- max(df$b)
    c(meanA = meanA, maxA = maxA, maxB = maxB, sum = meanA + maxA)
  }
  as.data.frame(lapply(list_of_df, f1))
}
f(list_of_df)
#       df1 df2
# meanA   2   8
# maxA    3   9
# maxB    4   5
# sum     5  17
A few remarks:
- If you want the variables in the result to be named df1 and df2, then you need the elements of your list of data frames to be named accordingly. list(value1, value2) is a list without names. list(name1 = value1, name2 = value2) is a list with names c("name1", "name2").
- This implementation presupposes that all of your summary statistics are numeric. It concatenates all of the summary statistics for a given data frame using c, and the elements of the resulting atomic vector are constrained to have a common type.
- Internally, lapply is still looping over your list of data frames. Using lapply is much cleaner than implementing the loop yourself, but lapply may not perform significantly better than an equivalent loop. If you are actually worried about performance, then you may need to describe the structure of your actual data in more detail.
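The first two remarks can be checked with a small sketch. The inputs df1 and df2 below are hypothetical, chosen only so that the summaries match the output printed above; summarise_one is an illustrative name for the inner function.

```r
# Hypothetical inputs; the question's actual data frames are not shown
df1 <- data.frame(a = c(1, 2, 3), b = c(2, 4, 1))
df2 <- data.frame(a = c(7, 8, 9), b = c(5, 3, 4))

summarise_one <- function(df) {
  meanA <- mean(df$a)
  maxA <- max(df$a)
  c(meanA = meanA, maxA = maxA, maxB = max(df$b), sum = meanA + maxA)
}

named <- list(df1 = df1, df2 = df2)
unnamed <- list(df1, df2)

names(named)    # element names carry over to the result's columns
names(unnamed)  # NULL: no names to carry over

result <- as.data.frame(lapply(named, summarise_one))
result

# c() forces a common type: mixing a number and a string gives character
class(c(1, "a"))
```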
FWIW, the reason you are seeing the replacement error is this:
data.frame(rownames = 'meanA', 'maxA', 'maxB', 'sum')
#   rownames X.maxA. X.maxB. X.sum.
# 1    meanA    maxA    maxB    sum
You initialized a data frame with one row, and you were trying to append a length-4 vector. Perhaps what you intended was:
data.frame(row.names = c("meanA", "maxA", "maxB", "sum"))
# data frame with 0 columns and 4 rows
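As a sketch of how such an empty frame could then be filled, assuming one column of four summary values per data frame (the values are the df1 results printed earlier, and res is a hypothetical name):

```r
# Four named rows, no columns yet
res <- data.frame(row.names = c("meanA", "maxA", "maxB", "sum"))
dim(res)

# A length-4 vector now matches the number of rows, so it can be added as a column
res$df1 <- c(2, 3, 4, 5)
res
```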