Statistical test with test-data
One way to approach your problem is to generate several performance values for knn and NN and compare them with a statistical test. This can be achieved using nested resampling.
In nested resampling you perform the train/test split several times, tune the model on each training set, and evaluate it on the corresponding test set.
Let's use the BostonHousing data as an example:
library(caret)
library(mlbench)
data(BostonHousing)
To keep the example simple, select only the numeric columns:
d <- BostonHousing[,sapply(BostonHousing, is.numeric)]
As far as I know, there is no way to perform nested CV in caret out of the box, so a simple wrapper is needed.
Generate the outer folds for nested CV:
outer_folds <- createFolds(d$medv, k = 5)
Let's use bootstrap resampling as the inner resampling loop to tune the hyperparameters:
boot <- trainControl(method = "boot",
                     number = 100)
Now loop over the outer folds, tune the hyperparameters on the training set, and predict on the test set:
CV_knn <- lapply(outer_folds, function(index){
  tr <- d[-index, ]  # training part of this outer fold
  ts <- d[index, ]   # held-out test part of this outer fold
  fit <- train(medv ~ ., data = tr,
               method = "knn",
               metric = "MAE",
               preProc = c("center", "scale", "nzv"),
               trControl = boot,
               tuneLength = 10)  # to keep it short, probe only 10 candidate hyperparameter values
  postResample(predict(fit, ts), ts$medv)
})
Extract just the MAE from the results (postResample returns RMSE, Rsquared and MAE, in that order):
CV_knn_MAE <- sapply(CV_knn, function(x) x[3])
CV_knn_MAE
#output
Fold1.MAE Fold2.MAE Fold3.MAE Fold4.MAE Fold5.MAE
 2.503333  2.587059  2.031200  2.475644  2.607885
Do the same for another learner, for instance glmnet:
CV_glmnet <- lapply(outer_folds, function(index){
  tr <- d[-index, ]  # training part of this outer fold
  ts <- d[index, ]   # held-out test part of this outer fold
  fit <- train(medv ~ ., data = tr,
               method = "glmnet",
               metric = "MAE",
               preProc = c("center", "scale", "nzv"),
               trControl = boot,
               tuneLength = 10)
  postResample(predict(fit, ts), ts$medv)
})
CV_glmnet_MAE <- sapply(CV_glmnet, function(x) x[3])
CV_glmnet_MAE
#output
Fold1.MAE Fold2.MAE Fold3.MAE Fold4.MAE Fold5.MAE
 3.400559  3.383317  2.830140  3.605266  3.525224
Now compare the two using wilcox.test. Since the performance values for both learners were generated on the same data splits, a paired test is appropriate:
wilcox.test(CV_knn_MAE,
            CV_glmnet_MAE,
            paired = TRUE)
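To see what the paired test gives on the numbers printed above without rerunning caret, here is a minimal sketch that copies the two MAE vectors by hand. The helper min_p is my own addition, not part of any package. It shows a limitation worth knowing: with only 5 outer folds the exact two-sided signed-rank p-value cannot go below 0.0625, even when one learner wins on every fold.

```r
# MAE values copied from the outputs above
CV_knn_MAE    <- c(2.503333, 2.587059, 2.031200, 2.475644, 2.607885)
CV_glmnet_MAE <- c(3.400559, 3.383317, 2.830140, 3.605266, 3.525224)

# Paired Wilcoxon signed-rank test on the fold-wise MAE values
res <- wilcox.test(CV_knn_MAE, CV_glmnet_MAE, paired = TRUE)
res$p.value  # 0.0625: knn is better on all 5 folds

# Hypothetical helper: smallest two-sided exact p-value attainable
# with n paired folds (assumes no ties and no zero differences)
min_p <- function(n) 2 / 2^n
min_p(5)   # 0.0625
min_p(10)  # 0.001953125
```

If a 0.05 threshold matters to you, this suggests using more outer folds (or repeated outer resampling) rather than k = 5.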
When comparing more than two algorithms, one can use friedman.test.
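A minimal sketch of the Friedman test for three learners follows. The knn and glmnet columns are the MAE values printed above; the third column is made up purely for illustration and does not come from any fitted model, so replace it with the fold-wise results of your own third learner. friedman.test accepts a matrix whose rows are the blocks (here the outer folds) and whose columns are the groups (here the learners).

```r
# Rows = outer folds (blocks), columns = learners (groups)
mae <- cbind(
  knn    = c(2.503333, 2.587059, 2.031200, 2.475644, 2.607885),
  glmnet = c(3.400559, 3.383317, 2.830140, 3.605266, 3.525224),
  # HYPOTHETICAL placeholder values, not real results
  third  = c(3.0, 3.1, 2.5, 3.2, 3.3)
)
rownames(mae) <- paste0("Fold", 1:5)

# Friedman rank sum test across the three learners
fr <- friedman.test(mae)
fr
```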
Which statistical test should I use to determine whether a treatment affects different groups differently?
Assuming your data meets the criteria for parametric tests:
- if you have both several treatment groups and several measurements: two-way ANOVA for repeated measurements
- if you have several treatment groups and one outcome: one-way ANOVA
- you can also perform pairwise comparisons with Student's t-test
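As a minimal sketch of the one-way case and the pairwise follow-up, assuming R's built-in PlantGrowth data (one outcome, three groups) as a stand-in for your own data, and Holm adjustment as one common choice for correcting the pairwise p-values:

```r
# PlantGrowth ships with R: plant weight for a control and two treatments
data(PlantGrowth)

# One-way ANOVA: does mean weight differ between the groups?
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Pairwise Student's t-tests (pooled SD), adjusted for multiple testing
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "holm")
pw
```

The two-way repeated-measures design needs a subject identifier in the data, so it is not shown here.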
What statistical test should I do for humpback whale singing behavior?
Your dataset is quite large (total n = 4382), so there might be other issues, like dependence ... but concentrating on your data, first present it as a contingency table:
               ExistenceOfSong
daylight.regime nosong song
       dark        874  570
       light       847  323
       twilight   1216  552
As the numbers are large, a chi-squared test is highly significant, but that is maybe not of much interest. Rather, present the table with the percentages of interest (left for you), or as a mosaic plot.
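A minimal sketch of these steps in base R, with the counts typed in from the table above. Row percentages (share of song within each daylight regime) are my assumption about which percentages are of interest; the object names song and row_pct and the plot title are illustrative.

```r
# Counts copied from the contingency table above
song <- matrix(c( 874, 570,
                  847, 323,
                 1216, 552),
               nrow = 3, byrow = TRUE,
               dimnames = list(daylight.regime = c("dark", "light", "twilight"),
                               ExistenceOfSong = c("nosong", "song")))

# Chi-squared test of independence between daylight regime and song
chisq.test(song)

# Row percentages: share of nosong/song within each daylight regime
row_pct <- round(100 * prop.table(song, margin = 1), 1)
row_pct

# Mosaic plot of the same table
mosaicplot(song, main = "Song by daylight regime", color = TRUE)
```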