Statistical Test with Test-Data

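One way to approach your problem is to generate several performance values for knn and NN, which you can then compare with a statistical test. This can be achieved with nested resampling.

In nested resampling, the data are split into train/test sets multiple times (the outer loop). Within each training set the hyperparameters are tuned by a further round of resampling (the inner loop), and the tuned model is then evaluated on the corresponding test set.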

Let's use the BostonHousing data as an example:

library(caret)
library(mlbench)

data(BostonHousing)

To keep the example simple, select only the numeric columns:

d <- BostonHousing[, sapply(BostonHousing, is.numeric)]

As far as I know, caret has no way to perform nested CV out of the box, so a simple wrapper is needed.

Generate the outer folds for nested CV:

outer_folds <- createFolds(d$medv, k = 5)

Use bootstrap resampling as the inner resampling loop to tune the hyperparameters:

boot <- trainControl(method = "boot",
                     number = 100)

Now loop over the outer folds: tune the hyperparameters on the training set, then predict on the test set:

CV_knn <- lapply(outer_folds, function(index) {
  tr <- d[-index, ]  # outer training set
  ts <- d[index, ]   # outer test set

  # tune knn on the training set only (inner bootstrap loop)
  fit <- train(medv ~ ., data = tr,
               method = "knn",
               metric = "MAE",
               preProcess = c("center", "scale", "nzv"),
               trControl = boot,
               tuneLength = 10)  # to keep it short, probe only 10 hyperparameter values

  # RMSE, Rsquared and MAE on the held-out test set
  postResample(predict(fit, ts), ts$medv)
})

Extract just the MAE from the results:

CV_knn_MAE <- sapply(CV_knn, function(x) x["MAE"])
CV_knn_MAE
#output
Fold1.MAE Fold2.MAE Fold3.MAE Fold4.MAE Fold5.MAE
 2.503333  2.587059  2.031200  2.475644  2.607885

Do the same for another learner, for instance glmnet:

CV_glmnet <- lapply(outer_folds, function(index) {
  tr <- d[-index, ]  # outer training set
  ts <- d[index, ]   # outer test set

  # tune glmnet on the training set only (inner bootstrap loop)
  fit <- train(medv ~ ., data = tr,
               method = "glmnet",
               metric = "MAE",
               preProcess = c("center", "scale", "nzv"),
               trControl = boot,
               tuneLength = 10)

  # RMSE, Rsquared and MAE on the held-out test set
  postResample(predict(fit, ts), ts$medv)
})

CV_glmnet_MAE <- sapply(CV_glmnet, function(x) x["MAE"])

CV_glmnet_MAE
#output
Fold1.MAE Fold2.MAE Fold3.MAE Fold4.MAE Fold5.MAE
 3.400559  3.383317  2.830140  3.605266  3.525224

Now compare the two using wilcox.test. Since the performance values for both learners were generated on the same data splits, a paired test is appropriate:

wilcox.test(CV_knn_MAE,
            CV_glmnet_MAE,
            paired = TRUE)
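One caveat worth checking: with only five outer folds, the paired Wilcoxon signed-rank test has very few possible outcomes, so the smallest two-sided p-value it can return is 2/2^5 = 0.0625. The sketch below simply re-enters the MAE values printed above by hand (the variable names `knn_mae` and `glmnet_mae` are made up for this illustration) and uses only base R:

```r
# MAE values copied from the two outputs above (one per outer fold)
knn_mae    <- c(2.503333, 2.587059, 2.031200, 2.475644, 2.607885)
glmnet_mae <- c(3.400559, 3.383317, 2.830140, 3.605266, 3.525224)

# paired test: fold i of knn is compared with fold i of glmnet
res <- wilcox.test(knn_mae, glmnet_mae, paired = TRUE)

res$statistic  # V = 0: knn has the lower MAE in every fold
res$p.value    # 0.0625, the minimum attainable with 5 pairs
```

If a decision at the 0.05 level is needed, more outer folds (or repeated outer CV) would be required, since five pairs cannot reach that threshold with this test.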

If comparing more than two algorithms, one can use friedman.test.
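A minimal sketch of how the call might look: friedman.test accepts a matrix with one row per block (here, one outer fold) and one column per group (here, one learner). The knn and glmnet columns are the values printed above; the third column, `other`, is a purely hypothetical placeholder with made-up numbers, included only so that there are three groups to compare.

```r
# rows = outer folds (blocks), columns = learners (groups)
mae <- cbind(
  knn    = c(2.503333, 2.587059, 2.031200, 2.475644, 2.607885),
  glmnet = c(3.400559, 3.383317, 2.830140, 3.605266, 3.525224),
  other  = c(2.2, 2.4, 1.9, 2.3, 2.5)  # hypothetical placeholder, NOT real results
)
rownames(mae) <- paste0("Fold", 1:5)

# Friedman rank sum test across the learners, blocked by fold
res <- friedman.test(mae)
res
```

In practice the third column would come from running the same outer-fold loop with another learner, so that all learners share identical splits.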

Which statistical test should I use to determine whether a treatment affects different groups differently?

Assuming your data meets the criteria for parametric tests:

  • if you have both several treatment groups and several measurements per subject: two-way ANOVA for repeated measurements.
  • if you have several treatment groups and one outcome: one-way ANOVA.
  • you can also perform pairwise comparisons with Student's t-test (adjusting the p-values for multiple comparisons).
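As a rough illustration of the one-way ANOVA and pairwise t-test options, here is a sketch using R's built-in PlantGrowth data (a control and two treatment groups, one outcome). The dataset is a stand-in chosen for this example, not the asker's data, and the Holm adjustment is one reasonable choice among several.

```r
# one outcome (weight), several treatment groups (ctrl, trt1, trt2)
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)  # overall F-test for any difference between groups

# pairwise Student's t-tests with Holm correction for multiple comparisons
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "holm")
pw
```

The ANOVA answers whether any group differs; the pairwise tests indicate which pairs differ, at the cost of needing the multiplicity adjustment.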

What statistical test should I do for humpback whale singing behavior?

Your dataset is quite large (total n = 4382), so there might be other issues, such as dependence between observations. Concentrating on your data, though, first present it as a contingency table:

               ExistenceOfSong
daylight.regime nosong song
       dark        874  570
       light       847  323
       twilight   1216  552

Because the counts are large, a chi-squared test is highly significant, but that by itself is probably not of much interest. It is more informative to present the table with the percentages of interest (left for you), or as a mosaic plot:

[Figure: mosaic plot of humpback songs]
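The table, the row percentages, the test and the plot could be produced along these lines. This is a base-R sketch that re-enters the counts shown above; the object names `tab`, `pct` and `res` and the plot title are chosen here for illustration.

```r
# counts from the contingency table above
tab <- as.table(matrix(
  c( 874, 570,
     847, 323,
    1216, 552),
  ncol = 2, byrow = TRUE,
  dimnames = list(daylight.regime = c("dark", "light", "twilight"),
                  ExistenceOfSong = c("nosong", "song"))
))

# row percentages: share of song / no song within each daylight regime
pct <- round(100 * prop.table(tab, margin = 1), 1)
pct

# chi-squared test of independence (2 degrees of freedom)
res <- chisq.test(tab)
res

# mosaic plot: tile areas are proportional to the cell counts
mosaicplot(tab, main = "Humpback song by daylight regime", color = TRUE)
```

The row percentages make the effect size visible directly (for example, the share of observations with song in the dark versus in the light), which the p-value alone does not convey.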