Block bootstrap from subject list
How about something like this:
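library(boot)
data("Grunfeld", package = "plm")  # assumes the Grunfeld panel shipped with plm
# x is the vector of firm ids; i holds the resampled positions in x
myfit <- function(x, i) {
  # stack all rows of every resampled firm (a firm drawn twice appears twice)
  mydata <- do.call("rbind", lapply(i, function(n) subset(Grunfeld, firm == x[n])))
  coefficients(lm(value ~ inv + capital, data = mydata))
}
firms <- unique(Grunfeld$firm)
b0 <- boot(firms, myfit, 999)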
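Once the firm-level bootstrap has run, the replicates sit in the t component of the returned object. The following is a minimal sketch of how they might be summarised. It assumes a made-up balanced panel (called panel here, a hypothetical stand-in so the block runs without plm) and a hypothetical helper fit_firms that looks up rows per firm once instead of calling subset() repeatedly.

```r
library(boot)
set.seed(2024)

# Hypothetical balanced panel standing in for Grunfeld: 10 firms x 20 years
panel <- data.frame(firm = rep(1:10, each = 20),
                    year = rep(1935:1954, times = 10))
firm_effect   <- rnorm(10)[panel$firm]   # shared shock within each firm
panel$inv     <- rnorm(200)
panel$capital <- rnorm(200)
panel$value   <- 1 + 0.5 * panel$inv + 0.3 * panel$capital + firm_effect + rnorm(200)

# Row numbers of each firm, computed once
rows_by_firm <- split(seq_len(nrow(panel)), panel$firm)

# Statistic for boot(): x is the vector of firm ids, i the resampled positions
fit_firms <- function(x, i) {
  rows <- unlist(rows_by_firm[as.character(x[i])], use.names = FALSE)
  coefficients(lm(value ~ inv + capital, data = panel[rows, ]))
}

firms <- unique(panel$firm)
b <- boot(firms, fit_firms, R = 199)

# Bootstrap standard errors: one column of b$t per coefficient
se <- apply(b$t, 2, sd)
names(se) <- names(b$t0)
se

# Percentile interval for the second coefficient (inv)
boot.ci(b, type = "perc", index = 2)
```

Because whole firms are drawn, the standard errors reflect the within-firm dependence that a row-level bootstrap would ignore.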
How to bootstrap respecting within-subject information?
Just modify your call to boot() like this:
data.boot <- boot(data, boot.huber, 1999, strata=data$Subject)
?boot provides this description of the strata= argument, which does exactly what you are asking for:
strata: An integer vector or factor specifying the strata for
multi-sample problems. This may be specified for any
simulation, but is ignored when ‘sim = "parametric"’. When
‘strata’ is supplied for a nonparametric bootstrap, the
simulations are done within the specified strata.
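To make "done within the specified strata" concrete, here is a small base-R sketch of the idea. It is an illustration under assumptions, not boot's internal code: the subject vector is a made-up layout of 4 subjects with 4 interleaved observations each.

```r
set.seed(1)

# Hypothetical layout: 4 subjects, 4 observations each, interleaved
subject <- rep(1:4, times = 4)

# Resample row indices separately inside each subject
idx <- integer(length(subject))
for (s in unique(subject)) {
  rows <- which(subject == s)
  idx[rows] <- rows[sample.int(length(rows), replace = TRUE)]
}

# Every resampled row comes from the same subject as the row it replaces,
# so this is TRUE by construction
all(subject[idx] == subject)
```

Note that with this scheme each subject keeps its original number of observations in every resample; whole subjects are never dropped or duplicated.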
Additional note:
To confirm that it's working as you'd like, you can call debugonce(boot), run the call above, and step through the debugger until the object i (whose rows contain the indices used to resample rows of data to create each bootstrap resample) has been assigned, and then have a look at it.
debugonce(boot)
data.boot <- boot(data, boot.huber, 1999, strata=data$Subject)
# Browse[2]>
## [Press return 34 times]
# Browse[2]> head(i)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
# [1,] 9 10 11 16 9 14 15 16 9 2 15 16 1 10
# [2,] 9 14 7 12 5 6 15 4 13 6 11 16 13 6
# [3,] 5 10 15 16 9 6 3 4 1 2 15 12 5 6
# [4,] 5 10 11 4 9 6 15 16 9 14 11 16 5 2
# [5,] 5 10 3 4 1 10 15 16 9 6 3 8 13 14
# [6,] 13 10 3 12 5 10 3 4 5 14 7 16 5 14
# [,15] [,16]
# [1,] 7 8
# [2,] 11 16
# [3,] 3 16
# [4,] 3 8
# [5,] 7 8
# [6,] 7 12
(You can enter Q to leave the debugger at any time.)
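The same index matrix can also be recovered without the debugger by calling boot.array() with indices = TRUE on the fitted object. The sketch below assumes a toy data frame (toy) and a trivial statistic (mean_y), both hypothetical, since the original data and boot.huber are not shown.

```r
library(boot)
set.seed(42)

# Hypothetical toy data: 4 subjects, 4 observations each, interleaved
toy <- data.frame(Subject = factor(rep(1:4, times = 4)), y = rnorm(16))

# Trivial statistic, used only so that boot() has something to compute
mean_y <- function(d, i) mean(d$y[i])

toy.boot <- boot(toy, mean_y, R = 20, strata = toy$Subject)

# R x n matrix of the row indices used in each bootstrap resample
idx <- boot.array(toy.boot, indices = TRUE)
head(idx)

# For every resample, check that each position holds a row of its own subject
same_subject <- apply(idx, 1, function(r) all(toy$Subject[r] == toy$Subject))
all(same_subject)
```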
Block bootstrap for genomic data
So, after a while I came up with an answer to my problem. Here it goes.
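You'll need the package dplyr.
library(dplyr)
l <- 1000  # window length, in the same units as POS
teste <- freq %>%
  mutate(w = ceiling(POS / l)) %>%  # window index of each row
  group_by(CHR, w) %>%              # one group per chromosome/window pair
  sample_n(1)                       # draw one row from each group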
This code creates a new variable named w based on the position in the genome (POS). This variable w is the window to which each row was assigned, and it depends on l, which is the length of your window.
You can repeat this code several times, each time sampling one row per window/CHR (with the sample_n(1)), and apply whatever statistic of interest you want.
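The "repeat several times" step could be sketched as follows. The freq data frame here is made up, the AF column and the one_draw helper are hypothetical names, and the mean of AF is only a placeholder for whatever statistic is of interest.

```r
library(dplyr)
set.seed(1)

# Hypothetical input: two chromosomes, 50 positions each, with a made-up AF column
freq <- data.frame(CHR = rep(c("chr1", "chr2"), each = 50),
                   POS = rep(sort(sample.int(10000, 50)), 2),
                   AF  = runif(100))
l <- 1000  # window length

# One draw: a single randomly chosen row per (CHR, window) pair
one_draw <- function(d, l) {
  d %>%
    mutate(w = ceiling(POS / l)) %>%
    group_by(CHR, w) %>%
    sample_n(1) %>%
    ungroup()
}

# Repeat the draw and compute the placeholder statistic each time
stats <- replicate(200, mean(one_draw(freq, l)$AF))
quantile(stats, c(0.025, 0.975))
```

In current dplyr releases sample_n() is superseded by slice_sample(n = 1), which behaves the same way on grouped data.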
Block sampling according to index in panel data
Apparently, in this answer's data set every firm is observed for exactly 20 years, so it is easy to demonstrate with:
data("Grunfeld", package="plm") #load data
Solution
# n is the firms column, df is the data frame
myfunc <- function(n, df) {
  unique_firms <- unique(n)  # unique firms
  # choose from the unique firms randomly, with replacement
  sample_firms <- sample(unique_firms, size = length(unique_firms), replace = TRUE)
  # fetch all years for each randomly picked firm and rbind them
  new_df <- do.call(rbind, lapply(sample_firms, function(x) df[df$firm == x, ]))
  new_df  # return the resampled data frame visibly
}
a <- myfunc(Grunfeld$firm, Grunfeld)  # run function
Output
> str(a)
'data.frame': 200 obs. of 5 variables:
$ firm : int 4 4 4 4 4 4 4 4 4 4 ...
$ year : int 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 ...
$ inv : num 40.3 72.8 66.3 51.6 52.4 ...
$ value : num 418 838 884 438 680 ...
$ capital: num 10.5 10.2 34.7 51.8 64.3 67.1 75.2 71.4 67.1 60.5 ...
As you can see, dim is exactly the same as for the input data.frame (200 rows, 5 columns), because every firm contributes the same number of years.
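The matching dimensions depend on the panel being balanced. As a rough illustration, the sketch below uses a made-up unbalanced panel (toy) and a hypothetical one-argument variant of the function (resample_firms) to show that the number of rows then varies from one resample to the next.

```r
set.seed(7)

# Hypothetical unbalanced panel: firm 1 has 2 rows, firm 2 has 5, firm 3 has 3
toy <- data.frame(firm = rep(1:3, times = c(2, 5, 3)), x = rnorm(10))

# Draw firms with replacement and stack all rows of each drawn firm
resample_firms <- function(df) {
  ids    <- unique(df$firm)
  picked <- ids[sample.int(length(ids), replace = TRUE)]
  do.call(rbind, lapply(picked, function(f) df[df$firm == f, ]))
}

# Number of rows in each of 500 resamples
sizes <- replicate(500, nrow(resample_firms(toy)))
range(sizes)  # can be anywhere between 3 * 2 = 6 and 3 * 5 = 15
```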
For your data the solution will be:
# n is the country column, df is the data frame
myfunc <- function(n, df) {
  unique_firms <- unique(n)  # unique countries
  print(unique_firms)        # optional: show which ids are available
  # choose from the unique countries randomly, with replacement
  sample_firms <- sample(unique_firms, size = length(unique_firms), replace = TRUE)
  # fetch all years for each randomly picked country and rbind them
  new_df <- do.call(rbind, lapply(sample_firms, function(x) df[df$country == x, ]))
  new_df  # return the resampled data frame visibly
}
and Output:
> str(a)
'data.frame': 848 obs. of 18 variables:
$ isocode : Factor w/ 106 levels "AGO","ALB","ARG",..: 82 82 82 82 82 82 82 82 61 61 ...
$ time : int 2 3 4 5 6 7 8 9 2 3 ...
$ country : num 80 80 80 80 80 80 80 80 59 59 ...
$ year : int 1975 1980 1985 1990 1995 2000 2005 2010 1975 1980 ...
$ gdp : num 184619 210169 199343 268870 305255 ...
$ pop : num 33.4 34.9 36.6 37.8 38.3 ...
$ gdp_k : num 5526 6022 5443 7117 7969 ...
$ co2 : num 340353 431436 426881 431052 350874 ...
$ co2_k : num 10191 12333 11674 11407 9128 ...
$ oecd : int 1 1 1 1 1 1 1 1 1 1 ...
$ LI : int 0 0 0 0 0 0 0 0 0 0 ...
$ LMI : int 0 0 0 0 0 0 0 0 0 0 ...
$ UMI : int 0 0 0 0 0 0 0 0 0 0 ...
$ HI : int 1 1 1 1 1 1 1 1 1 1 ...
$ gdpk : num 5531 6018 5449 7118 7971 ...
$ co2k : num 10196 12355 11668 11412 9162 ...
$ co2_k.lag: num 8595 10191 12333 11674 11407 ...
$ gdp_k.lag: num 4730 5526 6022 5443 7117 ...
R block resampling by unique identifier for bootstrap
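Something like this could help:
# 1000 sets of row indices, each drawn with replacement (individual rows, not blocks)
positions <- replicate(1000, sample(1:nrow(df), nrow(df), replace = TRUE))
# refit the model on each resample and collect the coefficients
apply(positions, 2, function(i) lm(yvar ~ xvar, data = df[i, ])$coef)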
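The snippet above draws individual rows. If the resampling should keep all rows of an identifier together, one possible variant is to draw identifiers and expand them to row indices. This is a sketch on made-up data; the id column and the rows_by_id helper are assumptions, not names from the question.

```r
set.seed(123)

# Hypothetical data: 10 identifiers with 5 rows each
df <- data.frame(id = rep(1:10, each = 5), xvar = rnorm(50))
df$yvar <- 1 + 2 * df$xvar + rnorm(50)

ids        <- unique(df$id)
rows_by_id <- split(seq_len(nrow(df)), df$id)  # row numbers per identifier

coefs <- replicate(1000, {
  # draw identifiers with replacement, then take every row of each drawn id
  picked <- sample(as.character(ids), length(ids), replace = TRUE)
  i <- unlist(rows_by_id[picked], use.names = FALSE)
  coef(lm(yvar ~ xvar, data = df[i, ]))
})

# One row per coefficient, one column per bootstrap replicate
apply(coefs, 1, sd)
```

Sampling the identifiers as character strings lets them index rows_by_id by name and avoids the special behaviour of sample() when it is handed a single number.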