Repeat the Re-Sampling Function for 1000 Times? Using Lapply

Automate for loop to increase samples selected

You can use lapply and save the output as nested list.

select_rows <- 2:4
n_times <- 5
inds <- nrow(iris)

result <- lapply(select_rows, function(x) 
                replicate(n_times, iris[sample(inds, x), ], simplify = FALSE))

Change select_rows and n_times as per your choice.

How to resample and remodel n times by vectorization?

(Assuming butterfly$inv.alt <- 1/butterfly$alt)

You get the error because resample is not a list of resampled data.frames, which you can obtain with:

resample <- replicate(B, butterfly[sample(1:n, replace = TRUE),], simplify = FALSE)

The the following should work:

re.model <- lapply(resample, lm, formula = Hk ~ inv.alt)

To extract coefficients from a list of models, re.model$coef does work. The correct path to coefficients are: re.model[[1]]$coef, re.model[[2]]$coef, .... You can get all of them with the following code:

re.model.coef <- sapply(re.model, coef)

Then you can combined it with the observed coefficients:

estMat <- cbind(re.model.coef, model$coef)

In fact, you can put all of them into replicate:

re.model.coef <- replicate(B, {
    bf.rs <- butterfly[sample(1:n, replace = TRUE),]
    coef(lm(formula = Hk ~ inv.alt, data = bf.rs))
})
estMat <- cbind(re.model.coef, model$coef)

Resample and looping over dplyr functions in R

I don't think you need a loop. You can do this faster by sampling 3*1000 values per group at once, assign sample_id and add it to grouping variables, and finaly summarize to get desired values. This way you are calling all functions only once. -

dat %>% 
  group_by(fertilizer, crop, level) %>% 
  sample_n(3*1000, replace = T) %>% 
  mutate(sample_id = rep(1:1000, each = 3)) %>% 
  group_by(sample_id, add = TRUE) %>% 
  summarise(
    mean = mean(growth, na.rm = T),
    var = sd(growth)^2
  ) %>% 
  ungroup()

# A tibble: 8,000 x 6
   fertilizer crop  level sample_id  mean      var
   <chr>      <chr> <chr>     <int> <dbl>    <dbl>
 1 N          alone high          1 30.7  2640.   
 2 N          alone high          2  1       0    
 3 N          alone high          3 60.3  2640.   
 4 N          alone high          4  1.33    0.333
 5 N          alone high          5  1.33    0.333
 6 N          alone high          6 60.3  2640.   
 7 N          alone high          7  1.33    0.333
 8 N          alone high          8 30.3  2670.   
 9 N          alone high          9  1.33    0.333
10 N          alone high         10 60.7  2581.   
# ... with 7,990 more rows

How to create a dataframe with repeated columns created from randomly sampling another dataframe?

Here is a data.table option that might help

dt <- as.data.table(df)
dt2 <- as.data.table(df2)
n <- 1000
res <- cbind(
  dt2[, .(obs)],
  dt2[
    ,
    replicate(n, sample(dt[.BY, x, on = "cat"], .N, replace = TRUE), simplify = FALSE),
    cat
  ]
)

How to run a function multiple times and write the results to a list?

Combination of above answer, comment, and my own answer. Naturally, I like mine better. Also, I think there is a mistake in the above answer for base R.

n <- 10

# give 1 to gen_mat n-times
lapply(rep(1, n), gen_mat)

# replicate n-times 
replicate(n, gen_mat(1), simplify=FALSE)

# lapply returns error if FUN is not function or 
# the function is not taking an argument. Hence a dummy function.
lapply(seq_len(n), function(x) gen_mat(1))

microbenchmarking the three methods

I used a larger value for n, but the results are similar in my desktop with smaller n as well. For this, replicate takes longer than the other two methods.

set.seed(1)
gen_mat <- function(x) matrix(c(1, 1, 1, x + rnorm(1)), nrow = 2)
n <- 1000 

library(microbenchmark)
library(ggplot2)

mb <- microbenchmark(
  lap1 = {lapply(rep(1, n), gen_mat)},
  repl = {replicate(n, gen_mat(1), simplify=FALSE)},
  lap2 = {lapply(seq_len(n), function(x) gen_mat(1))},
  times = 10000L
)

mb
# Unit: milliseconds
# expr      min       lq     mean   median       uq      max neval cld
# lap1 2.839435 3.494157 4.806954 3.854269 5.611413 103.0111 10000  a 
# repl 3.091829 3.777199 5.140789 4.165856 6.013591 101.4318 10000   b
# lap2 3.131491 3.761274 5.089170 4.140316 5.939075 101.1983 10000   b

autoplot(mb)

Sample Image

In R,Computing average from 1000 replications

Here's a more R-ish solution that doesn't use a loop:

x <- replicate(1000, sample.int(500,30,replace=F))
y <- abs(2*colMeans(x)-501)

sumValue = sum(y)
divValue = mean(y)

replicate will give us the desired output as a matrix, and we can then use the vectorized function colMeans.

This method will perform much better as your number of replicates increases:

loopSoln <- function(N) {
  value = list()
  for(i in 1:N) {
    x=sample(1:500,30,replace=F)
    mhat=stimateM=2*mean(x)-1
    y=abs(mhat-500)
    value[i]=y
  }
  sumValue = sum(unlist(value))
  divValue = sumValue/N
}

replicateSoln <- function(N) {
  x <- replicate(1000, sample.int(500,30,replace=F))
  y <- abs(2*colMeans(x)-501)
  sumValue = sum(y)
  divValue = mean(y)
}

(ltimes <- sapply(c(1e2,1e3,1e4,1e5), function(N) system.time(loopSoln(N))[3]))
## elapsed elapsed elapsed elapsed 
##   0.002   0.014   0.158   2.009 

(rtimes <- sapply(c(1e2,1e3,1e4,1e5), function(N) system.time(replicateSoln(N))[3]))
## elapsed elapsed elapsed elapsed 
##   0.007   0.011   0.007   0.010 

plot(ltimes~I(2:5), type = 'l', col = 2, xlab = "log_10 number of replicates", ylab = "elapsed time (sec)")
lines(rtimes~I(2:5), col = 4)
legend("topleft", lty = 1, col = c(2,4), 
  legend = c("Loop", "replicate + rowMeans"))

Benchmarks

Repeat simulation of test scores 1000 times

At the end of the day you are testing whether at least 22 students score higher than 78, which can be compactly computed with:

sum(rnorm(75, 70, 10) > 78) >= 22

Breaking this down a bit, rnorm(75, 70, 10) returns the 75 scores, which are normally distributed with mean 70 and standard deviation 10. rnorm(75, 70, 10) > 78 is a vector of length 75 that indicates whether or not each of these scores is above 78. sum(rnorm(75, 70, 10) > 78) converts each true to a 1 and each false to a 0 and sums these values up, meaning it counts the number of the 75 scores that exceed 78. Lastly we test whether the sum is 22 or higher with the full expression above.

replicate can be used to replicate this any number of times. So to see the breakdown of 1000 simulations, you can use the following 1-liner (after setting your random seed, of course):

set.seed(144)
table(replicate(1000, sum(rnorm(75, 70, 10) > 78) >= 22))
# FALSE  TRUE 
#   936    64

In 64 of the replicates, at least 22 students scored above a 78, so we estimate the probability to be 6.4%.