Populating a Data Frame in R in a Loop

You could do it like this:

iterations <- 10
variables <- 2

output <- matrix(ncol = variables, nrow = iterations)

for (i in 1:iterations) {
  output[i, ] <- runif(variables)
}

output

Then turn it into a data.frame:

output <- data.frame(output)
class(output)

What this does:

  1. creates a matrix whose numbers of rows and columns match the expected size of the result
  2. inserts 2 random numbers into one row of the matrix on each iteration
  3. converts the matrix into a data frame after the loop has finished.
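A matrix can hold only one type, so if each iteration produces a row with columns of different types, one common alternative (not part of the answer above) is to store each iteration's row in a preallocated list and bind everything once after the loop. A minimal sketch; the column names `iteration`, `x` and `y` are illustrative:

```r
set.seed(42)  # make the random numbers reproducible

iterations <- 10
rows <- vector("list", iterations)  # preallocated: one slot per iteration

for (i in seq_len(iterations)) {
  # each iteration produces a one-row data frame
  rows[[i]] <- data.frame(iteration = i, x = runif(1), y = runif(1))
}

# bind all rows in a single call after the loop instead of growing inside it
output <- do.call(rbind, rows)
str(output)
```

The list itself is preallocated, so the loop only fills existing slots; the single `do.call(rbind, ...)` call at the end does the combining.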

Loop to dynamically fill dataframe R

Filling a preallocated object in a for loop is fine; what causes problems is growing an object inside the loop (e.g. appending rows or columns with rbind or cbind).

When you grow an object, R has to allocate new memory and copy the existing contents on every iteration, because the object keeps increasing in size. Each iteration therefore takes longer than the last as the object gets bigger.
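A small sketch of the difference, assuming a two-column numeric result: the first loop grows a matrix with rbind(), so every iteration creates a new, larger object; the second preallocates the matrix and only writes into existing rows. Both produce the same values; wrapping each loop in system.time() with a larger n is one way to see the growing version slow down.

```r
n <- 1000

# Growing: rbind() returns a new matrix one row longer on each iteration
grown <- NULL
for (i in seq_len(n)) {
  grown <- rbind(grown, c(i, i^2))
}

# Preallocating: allocate once, then fill rows by index
prealloc <- matrix(NA_real_, nrow = n, ncol = 2)
for (i in seq_len(n)) {
  prealloc[i, ] <- c(i, i^2)
}

# same shape and same values
stopifnot(identical(dim(grown), dim(prealloc)), all(grown == prealloc))
```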

When you create the object beforehand (e.g. a data.frame with the right number of rows and columns) and fill it in by index, the for loop doesn't have this problem.
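For a data frame whose columns have different types, the same idea can be sketched by creating every column at its final length up front and then assigning by index. The column names here are hypothetical; stringsAsFactors = FALSE keeps the character column from becoming a factor in R versions before 4.0.

```r
n <- 5

# create each column at its final length with the right type
results <- data.frame(
  id = integer(n),
  value = numeric(n),
  label = character(n),
  stringsAsFactors = FALSE
)

for (i in seq_len(n)) {
  # fill existing slots by index; nothing is appended
  results$id[i] <- i
  results$value[i] <- i / 2
  results$label[i] <- paste0("item", i)
}

results
```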

One final thing to keep in mind: a data frame stores each column as a separate vector, and a matrix stores its values column by column, so it's usually more efficient to fill these one column at a time.

With all that in mind, we can revise your code as follows (the outer loop now runs over columns, so each column is filled before moving on to the next):

results <- data.frame(matrix(NA, nrow = 10, ncol = 10))
for (colIdx in seq_len(ncol(results))) {
  for (rowIdx in seq_len(nrow(results))) {
    results[rowIdx, colIdx] <- 5 # or whatever value you want here
  }
}
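The nested loop above still writes one cell per step, and each `results[rowIdx, colIdx] <- ...` goes through the data frame assignment method. Following the column-at-a-time advice, a sketch that assigns a whole column per iteration; the fill rule `row * column` is only an illustrative stand-in for whatever values you need:

```r
n_rows <- 10
n_cols <- 10
results <- data.frame(matrix(NA_real_, nrow = n_rows, ncol = n_cols))

for (colIdx in seq_len(n_cols)) {
  # one vectorised assignment fills every row of this column at once
  results[[colIdx]] <- seq_len(n_rows) * colIdx
}

results[1:3, 1:4]
```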

Loop through data frame and match/populate rows with column values


library(dplyr)

df1 <- data.frame(
MON = c(1,2,3),
TUE = c(5,6,7),
WED = c(8,9,10),
THU = c(11,12,13),
FRI = c(14,15,16),
SAT = c(17,18,19),
SUN = c(20,21,22))

df2 <- data.frame(
Day = c('THU', 'FRI', 'SAT', 'SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'),
Hours = 0
)

Example df1 (sorry, I didn't take the time to recreate your exact data; please follow along):

    MON   TUE   WED   THU   FRI   SAT   SUN
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5     8    11    14    17    20
2     2     6     9    12    15    18    21
3     3     7    10    13    16    19    22

Example df2:

   Day   Hours
   <chr> <dbl>
 1 THU       0
 2 FRI       0
 3 SAT       0
 4 SUN       0
 5 MON       0
 6 TUE       0
 7 WED       0
 8 THU       0
 9 FRI       0
10 SAT       0
11 SUN       0
12 MON       0
13 TUE       0
14 WED       0
15 THU       0
16 FRI       0
17 SAT       0
18 SUN       0

Step 1: This should be the algorithm you're looking for to fill df2 with the values from df1 in the way you described.

row_df2 <- 1

for (row_df1 in seq_len(nrow(df1))) {
  for (day in c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN')) {
    # stop matching once every row of df2 has been filled
    if (row_df2 <= nrow(df2) && df2[row_df2, 'Day'] == day) {
      df2[row_df2, 'Hours'] <- df1[row_df1, day]
      row_df2 <- row_df2 + 1
    }
  }
}
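If df2's days always run consecutively with no gaps (as in the example), the same values can also be picked without the row-by-row search: flatten df1 row by row into one long vector and take a consecutive slice starting at df2's first day. A sketch under that assumption; it returns NA for any rows of df2 that run past the end of df1.

```r
week <- c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN')

df1 <- data.frame(
  MON = c(1, 2, 3), TUE = c(5, 6, 7), WED = c(8, 9, 10),
  THU = c(11, 12, 13), FRI = c(14, 15, 16),
  SAT = c(17, 18, 19), SUN = c(20, 21, 22)
)
# the same 18 consecutive days as df2 above, starting on a Thursday
df2 <- data.frame(
  Day = rep_len(c('THU', 'FRI', 'SAT', 'SUN', 'MON', 'TUE', 'WED'), 18),
  Hours = 0
)

# row 1 MON..SUN, then row 2 MON..SUN, then row 3 MON..SUN
values <- as.vector(t(as.matrix(df1[week])))
days <- rep(week, times = nrow(df1))

# position of df2's first day within the first week of df1
start <- match(df2$Day[1], days)
df2$Hours <- values[start:(start + nrow(df2) - 1)]

df2
```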

Step 2: Now you can sum up the values in each row of df1, e.g. using dplyr:

df1 <- df1 %>%
mutate(
Sum = MON + TUE + WED + THU + FRI + SAT + SUN
)
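In base R, rowSums() gives the same row totals without dplyr. Selecting the day columns by name (rather than summing every column) keeps an existing Sum column out of the total if the line is run twice. A sketch with the example df1:

```r
week <- c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN')

df1 <- data.frame(
  MON = c(1, 2, 3), TUE = c(5, 6, 7), WED = c(8, 9, 10),
  THU = c(11, 12, 13), FRI = c(14, 15, 16),
  SAT = c(17, 18, 19), SUN = c(20, 21, 22)
)

# sum across the seven day columns for every row at once
df1$Sum <- rowSums(df1[week])

df1
```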

df1:

# A tibble: 3 x 8
    MON   TUE   WED   THU   FRI   SAT   SUN   Sum
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5     8    11    14    17    20    76
2     2     6     9    12    15    18    21    83
3     3     7    10    13    16    19    22    90

df2:

# A tibble: 18 x 2
   Day   Hours
   <chr> <dbl>
 1 THU      11   <- row 1: THU
 2 FRI      14   <- row 1: FRI
 3 SAT      17   <- ...
 4 SUN      20
 5 MON       2   <- row 2: MON
 6 TUE       6   <- ...
 7 WED       9
 8 THU      12
 9 FRI      15
10 SAT      18
11 SUN      21   <- row 2: SUN
12 MON       3   <- row 3: MON
13 TUE       7
14 WED      10
15 THU      13
16 FRI      16
17 SAT      19
18 SUN      22

Is there no identifier, such as a date, in both tables? That would make this much more robust: you could then match rows by date instead of relying on df2 starting on the right day.
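With a shared date column, match() can look up each of df2's rows directly, regardless of which weekday df2 starts on. A minimal sketch, assuming the hours have first been reshaped into a long table with one row per date; the table name hours_long, its Date column and the dates themselves are hypothetical:

```r
# hypothetical long table: one row per date with the hours for that day
hours_long <- data.frame(
  Date = as.Date("2024-01-01") + 0:6,
  Hours = c(1, 5, 8, 11, 14, 17, 20)
)

# df2 only needs the dates; their order does not matter
df2 <- data.frame(Date = as.Date(c("2024-01-04", "2024-01-05", "2024-01-01")))

# position of each df2 date in hours_long (NA if the date is missing there)
df2$Hours <- hours_long$Hours[match(df2$Date, hours_long$Date)]

df2
```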

Edit 1: Updated after testing and removal of some errors.

Edit 2: Highlighted which value from df1 will land in df2. I just used different example data than you (I didn't want to type it all in).

Edit 3: Used data.frame instead of tibble in example data to demonstrate it should work as well.

Edit 4: Is this what you want?

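week <- c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN')
row_df1 <- 1  # always take the values from the first row of df1

for (row_df2 in seq(1, nrow(df2))) {
  for (day in week) {
    if (df2[row_df2, 'Day'] == day) {
      df2[row_df2, 'Hours'] <- df1[row_df1, day]
      row_df2 <- row_df2 + 1
    }
  }
}

df2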

will lead to:

   Day Hours
1  THU    11   <- row 1: THU
2  FRI    14
3  SAT    17
4  SUN    20
5  MON     1
6  TUE     5
7  WED     8
8  THU    11   <- row 1: THU
9  FRI    14
10 SAT    17
11 SUN    20
12 MON     1
13 TUE     5
14 WED     8
15 THU    11   <- row 1: THU
16 FRI    14
17 SAT    17
18 SUN    20

Edit 5: It seems a { is missing:

for (row_df2 in seq(1, nrow(Calendar$Jan))) {
  for (day in week) { # <- HERE
    if (Calendar$Jan[row_df2, 'Day'] == day) {
      Calendar$Jan[row_df2, 'Hours'] <- Calctable[row_df1, day]
      row_df2 <- row_df2 + 1
    }
  }
}

Edit 6:

In Edit 5 I used week <- c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN') but forgot to mention it. It should have looked like this (week is an ordinary variable, not a built-in one):

week <- c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN')

for (row_df2 in seq(1, nrow(Calendar$Jan))) {
  for (day in week) {
    if (Calendar$Jan[row_df2, 'Day'] == day) {
      Calendar$Jan[row_df2, 'Hours'] <- Calctable[row_df1, day]
      row_df2 <- row_df2 + 1
    }
  }
}

Defining week explicitly also matters if you re-use it at some other point in your code. I had defined it while testing the loop and mixed it up in the previous version of this answer.

Using an if else loop to populate a dataframe in R

A solution using loops and a lookup list.

First store the cut breaks and labels for each code in a list.

tmp = list(
  "21" = list(
    "brk" = c(0, 0.01, 0.0375, 0.0725, 0.1, 1),
    "lab" = 0:4
  ),
  "24" = list(
    "brk" = c(0, 0.01, 0.0375, 0.0725, 0.1, 1),
    "lab" = 4:0
  )
)

Then loop over the columns of interest and, for each code, apply cut() with that code's breaks and labels.

for (cc in c("oP.Res", "TP.Res")) {
  Merged[paste0(cc, "_cut")] = NA
  for (ctg in unique(Merged$MEAS_ANAL_METH_CODE)) {
    Merged[Merged$MEAS_ANAL_METH_CODE == ctg, paste0(cc, "_cut")] =
      as.character(
        cut(
          Merged[Merged$MEAS_ANAL_METH_CODE == ctg, cc],
          tmp[[as.character(ctg)]][["brk"]],
          tmp[[as.character(ctg)]][["lab"]]
        )
      )
  }
}
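To try the loop without the original data, here is a self-contained run on a tiny made-up Merged table (the values are invented; only the column names come from the answer). Note that cut() uses right-closed intervals by default, so a value of exactly 0 would fall outside the first bin and become NA.

```r
tmp <- list(
  "21" = list(brk = c(0, 0.01, 0.0375, 0.0725, 0.1, 1), lab = 0:4),
  "24" = list(brk = c(0, 0.01, 0.0375, 0.0725, 0.1, 1), lab = 4:0)
)

# made-up example rows: two per method code
Merged <- data.frame(
  MEAS_ANAL_METH_CODE = c(21, 21, 24, 24),
  oP.Res = c(0.005, 0.05, 0.005, 0.05),
  TP.Res = c(0.2, 0.08, 0.2, 0.08)
)

for (cc in c("oP.Res", "TP.Res")) {
  Merged[paste0(cc, "_cut")] <- NA
  for (ctg in unique(Merged$MEAS_ANAL_METH_CODE)) {
    rows <- Merged$MEAS_ANAL_METH_CODE == ctg
    # look up the breaks and labels stored for this code
    lookup <- tmp[[as.character(ctg)]]
    Merged[rows, paste0(cc, "_cut")] <-
      as.character(cut(Merged[rows, cc], lookup$brk, lookup$lab))
  }
}

Merged
```

The same value lands in different labels depending on the code, because code "24" uses the reversed labels 4:0.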

for loop to populate dataframe

Since you are already using dplyr, it is easy to also use purrr to combine the per-year results into one data frame:

library(dplyr)
library(purrr)

map_df(start.year:end.year, function(year) {
  df %>%
    filter(Level == "Grad" & EntryYear <= year & ExitYear >= year) %>%
    distinct(RA) %>%
    summarise(year = year, n = n())  # keep the year next to its count
})
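Without purrr, the same per-year count can be sketched in base R with vapply(), which returns exactly one value per year. The small df below is made-up test data that only reuses the column names from the question:

```r
df <- data.frame(
  RA = c("a", "b", "c", "c", "d"),
  Level = c("Grad", "Grad", "Grad", "Grad", "Undergrad"),
  EntryYear = c(2010, 2011, 2010, 2010, 2010),
  ExitYear = c(2012, 2011, 2011, 2011, 2012)
)
start.year <- 2010
end.year <- 2012
years <- start.year:end.year

counts <- data.frame(
  year = years,
  # number of distinct grad RAs active in each year
  n = vapply(years, function(year) {
    active <- df$Level == "Grad" & df$EntryYear <= year & df$ExitYear >= year
    length(unique(df$RA[active]))
  }, integer(1))
)

counts
```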

Writing a for loop with the output as a data frame in R

As this is a learning question, I will not provide the solution directly.

> values <- c(-10,0,10,100)
> for (i in seq_along(values)) {print(i)} # Checking we iterate by position
[1] 1
[1] 2
[1] 3
[1] 4
> output <- vector("double", 10)
> output # Checking the place where the output will be
[1] 0 0 0 0 0 0 0 0 0 0
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
Error in output[[i]] <- rnorm(10, mean = values[[i]]) :
more elements supplied than there are to replace

As you can see, the error says there are more elements to insert than there is space for: each iteration generates 10 random numbers (40 in total), but output[[i]] can hold only a single number. Consider using a data structure that can store several values for each iteration, so that:

> output <- ??
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
> output # Should have length 4, each element holding the 10 values created in that iteration

