Statistical Test with Test-Data

One way to approach your problem is to generate several performance values for knn and NN and compare them with a statistical test. Nested resampling produces those values.

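In nested resampling you split the data into train and test sets several times (the outer loop). Within each training set, a second resampling loop (the inner loop) tunes the hyperparameters, and the tuned model is then evaluated on the corresponding test set.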
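To make the structure concrete before the caret version, here is a minimal base-R sketch. It is only an illustration under my own assumptions: the built-in `cars` data, a polynomial degree as the stand-in hyperparameter, and the helper names `make_folds` and `mae` are not part of the question.

```r
set.seed(42)
d_toy   <- cars    # built-in data, used here only as a stand-in
degrees <- 1:3     # assumed hyperparameter grid: polynomial degree

mae <- function(obs, pred) mean(abs(obs - pred))

# hypothetical helper: randomly assign n rows to k folds
make_folds <- function(n, k) sample(rep(seq_len(k), length.out = n))

outer_id <- make_folds(nrow(d_toy), 5)

outer_mae <- sapply(1:5, function(i) {
  tr <- d_toy[outer_id != i, ]   # outer training set
  ts <- d_toy[outer_id == i, ]   # outer test set, never used for tuning

  # inner loop: 5-fold CV on the training set to choose the degree
  inner_id  <- make_folds(nrow(tr), 5)
  inner_mae <- sapply(degrees, function(deg) {
    mean(sapply(1:5, function(j) {
      fit <- lm(dist ~ poly(speed, degree = deg), data = tr[inner_id != j, ])
      mae(tr$dist[inner_id == j], predict(fit, tr[inner_id == j, ]))
    }))
  })
  best <- degrees[which.min(inner_mae)]

  # refit on the whole outer training set, evaluate once on the test set
  fit <- lm(dist ~ poly(speed, degree = best), data = tr)
  mae(ts$dist, predict(fit, ts))
})

outer_mae  # one performance value per outer fold
```

The key point is that the outer test set is touched exactly once per fold, after tuning has finished.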

Let's use the BostonHousing data as an example:

library(caret)
library(mlbench)

data(BostonHousing)

To keep the example simple, select only the numeric columns:

d <- BostonHousing[,sapply(BostonHousing, is.numeric)]

As far as I know, caret cannot perform nested CV out of the box, so a simple wrapper is needed.

First, generate the outer folds for nested CV:

outer_folds <- createFolds(d$medv, k = 5)

Use bootstrap resampling as the inner loop to tune the hyperparameters:

boot <- trainControl(method = "boot",
                     number = 100)

Now loop over the outer folds: tune the hyperparameters on the training set, then predict on the test set:

CV_knn <- lapply(outer_folds, function(index){
  tr <- d[-index, ]  # outer training set
  ts <- d[index, ]   # outer test set

  # tune on the training set only, using the inner bootstrap loop
  fit <- train(medv ~ ., data = tr,
               method = "knn",
               metric = "MAE",
               preProcess = c("center", "scale", "nzv"),
               trControl = boot,
               tuneLength = 10) # to keep it short, probe only 10 hyperparameter values

  # RMSE, Rsquared and MAE on the held-out fold
  postResample(predict(fit, ts), ts$medv)
})

Extract just the MAE from the results:

CV_knn_MAE <- sapply(CV_knn, function(x) x["MAE"])
CV_knn_MAE
#output
Fold1.MAE Fold2.MAE Fold3.MAE Fold4.MAE Fold5.MAE 
 2.503333  2.587059  2.031200  2.475644  2.607885

Do the same for another learner, for instance glmnet:

CV_glmnet <- lapply(outer_folds, function(index){
  tr <- d[-index, ]  # same outer folds as for knn
  ts <- d[index, ]

  fit <- train(medv ~ ., data = tr,
               method = "glmnet",
               metric = "MAE",
               preProcess = c("center", "scale", "nzv"),
               trControl = boot,
               tuneLength = 10)

  postResample(predict(fit, ts), ts$medv)
})

CV_glmnet_MAE <- sapply(CV_glmnet, function(x) x["MAE"])

CV_glmnet_MAE
#output
Fold1.MAE Fold2.MAE Fold3.MAE Fold4.MAE Fold5.MAE 
 3.400559  3.383317  2.830140  3.605266  3.525224

Now compare the two using wilcox.test. Because the performance of both learners was measured on the same data splits, a paired test is appropriate:

wilcox.test(CV_knn_MAE,
            CV_glmnet_MAE,
            paired = TRUE)
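
For a self-contained check, the same call can be run on the MAE values printed above. This is only a sketch that retypes those reported numbers into new vectors (the names `knn_mae` and `glmnet_mae` are mine); a fresh run of createFolds would give different splits and different values.

```r
# MAE per outer fold, copied from the output shown above
knn_mae    <- c(2.503333, 2.587059, 2.031200, 2.475644, 2.607885)
glmnet_mae <- c(3.400559, 3.383317, 2.830140, 3.605266, 3.525224)

# paired Wilcoxon signed-rank test: fold i of knn is matched with fold i of glmnet
res <- wilcox.test(knn_mae, glmnet_mae, paired = TRUE)
res$statistic  # V = 0, because knn has the lower MAE in every fold
res$p.value    # 0.0625
```

All five differences have the same sign, so the exact two-sided p-value is 2/2^5 = 0.0625. That is the smallest p-value a paired Wilcoxon test can return with five folds, so you need more outer folds (or repeated CV) before this test can fall below 0.05.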

To compare more than two algorithms, one can use friedman.test.
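
A rough sketch of what that call could look like: friedman.test accepts a matrix with one row per fold (block) and one column per learner. The knn and glmnet columns reuse the MAE values printed above. The third column, `rf`, is a hypothetical learner with made-up numbers, included only to show the shape of the input.

```r
mae_mat <- cbind(
  knn    = c(2.503333, 2.587059, 2.031200, 2.475644, 2.607885),
  glmnet = c(3.400559, 3.383317, 2.830140, 3.605266, 3.525224),
  rf     = c(2.2, 2.3, 1.9, 2.1, 2.4)  # MADE-UP values for a hypothetical third learner
)
rownames(mae_mat) <- paste0("Fold", 1:5)

# rows are blocks (outer folds), columns are groups (learners)
res_friedman <- friedman.test(mae_mat)
res_friedman
```

If the Friedman test indicates a difference, pairwise paired tests with a multiple-comparison correction can show which learners differ.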

Which statistical test should I use to determine whether a treatment affects different groups differently?

Assuming your data meet the assumptions of parametric tests:

If you have both several treatment groups and several measurements: two-way ANOVA for repeated measurements.
If you have several treatment groups and one outcome: one-way ANOVA.
You can also perform pairwise comparisons with Student's t-test.
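
In R these options might look roughly as follows. This is a sketch, not an analysis of your data: the one-way example uses the built-in `PlantGrowth` data, and the repeated-measures example uses simulated values with column names (`subject`, `time`, `group`, `y`) that I made up.

```r
# One outcome, several treatment groups: one-way ANOVA
fit1 <- aov(weight ~ group, data = PlantGrowth)
summary(fit1)

# Pairwise t-tests between groups, Holm-adjusted for multiple comparisons
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "holm")
pw$p.value

# Several groups AND repeated measurements per subject (simulated data)
set.seed(1)
sim <- expand.grid(subject = factor(1:12),
                   time    = factor(c("t1", "t2", "t3")))
sim$group <- factor(ifelse(as.integer(sim$subject) <= 6, "control", "treated"))
sim$y     <- rnorm(nrow(sim))  # pure noise, for illustration only

# group is a between-subject factor, time is a within-subject factor
fit2 <- aov(y ~ group * time + Error(subject / time), data = sim)
summary(fit2)
```

The `group:time` interaction term in the second model is the one that addresses whether the treatment affects the groups differently over the measurements.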

What statistical test should I do for humpback whale singing behavior?

Your dataset is quite large (total n = 4382), so there might be other issues, such as dependence. But concentrating on your data, first present it as a contingency table:

               ExistenceOfSong
daylight.regime nosong song
       dark        874  570
       light       847  323
       twilight   1216  552

Because the counts are large, a chi-squared test is highly significant, but that may not be of much interest. Instead, present the table with the percentages of interest (left for you), or as a mosaic plot:

(Figure: mosaic plot of humpback songs)
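
A minimal sketch of how the table, the test, and the plot could be produced in R, retyping the counts from the table above (the object name `songs` is mine):

```r
# counts copied from the contingency table above
songs <- as.table(matrix(
  c( 874, 570,
     847, 323,
    1216, 552),
  nrow = 3, byrow = TRUE,
  dimnames = list(daylight.regime = c("dark", "light", "twilight"),
                  ExistenceOfSong = c("nosong", "song"))
))

sum(songs)                        # total n, should equal 4382

# chi-squared test of independence between daylight regime and song
res_chisq <- chisq.test(songs)
res_chisq

# row percentages: share of song / no song within each daylight regime
row_pct <- round(100 * prop.table(songs, margin = 1), 1)
row_pct

# mosaic plot of the same table
mosaicplot(songs, main = "Humpback songs by daylight regime", color = TRUE)
```

The row percentages are usually the more informative result here, since with n this large even small differences give a tiny p-value.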