Error when trying to get class probabilities in R's caret package
There are a couple of issues.
First, this approach requires the class levels of the factor to be valid R variable names, so the first step is to rename the levels of the carb factor so that they start with a letter:
mtcars$carb <- as.factor(paste0("c",mtcars$carb))
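To check whether a factor's levels already qualify, you can compare them with make.names(), the base R function that turns strings into syntactically valid names. This is a minimal base-R sketch. The helper name has_valid_level_names is hypothetical and not part of caret.

```r
# Hypothetical helper: TRUE when every level is already a syntactically valid R name
has_valid_level_names <- function(f) {
  lv <- levels(f)
  all(make.names(lv) == lv)
}

carb_raw <- as.factor(mtcars$carb)
has_valid_level_names(carb_raw)      # FALSE: levels are "1", "2", ...

carb_fixed <- as.factor(paste0("c", mtcars$carb))
has_valid_level_names(carb_fixed)    # TRUE: levels are "c1", "c2", ...

# Alternative to paste0(): let make.names() rename the levels
levels(carb_raw) <- make.names(levels(carb_raw))
levels(carb_raw)                     # "X1" "X2" "X3" "X4" "X6" "X8"
```

Either renaming works. The paste0() version in the answer simply picks the prefix "c" instead of the "X" that make.names() adds.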
Second, the classProbs argument of trainControl() defaults to FALSE. It should be TRUE in your case.
library("caret")
tuneGrid <- expand.grid(mtry = c(10), min.node.size = c(1), splitrule = "extratrees")
rf_model <- train(carb ~ ., data = mtcars, method = "ranger",
trControl = trainControl(method = "none", classProbs = TRUE),
tuneGrid = tuneGrid)
classprobs <- predict(rf_model, newdata = mtcars, type = "prob")
support vector machine train caret error kernlab class probability calculations failed; returning NAs
In the trainControl() call, you have to set classProbs = TRUE if you want class probabilities returned.
svmFit <- train(class ~ .,
data = trainset,
method = "svmRadial",
preProc = c("center", "scale"),
tuneGrid = svmTuneGrid,
trControl = trainControl(method = "repeatedcv", repeats = 5,
classProbs = TRUE))
predictedClasses <- predict(svmFit, newdata = testset)
predictedProbs <- predict(svmFit, newdata = testset, type = "prob")
This gives the probabilities of being in the Bad or Good class for the test dataset:
print(predictedProbs)
Bad Good
1 0.2302979 0.7697021
2 0.7135050 0.2864950
3 0.2230889 0.7769111
EDIT
To answer your new question, you can get the positions of the support vectors in your original data set with alphaindex(svmFit$finalModel) and their coefficients with coef(svmFit$finalModel).
Error when using predict() function on caret models in R
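You can use the following code. Note that caret's predict() takes the new data through the newdata argument, not new_data (the tidymodels spelling):
log_class_predictions <- predict(logistic_model, newdata = pd_test)
log_predictions <- predict(logistic_model, newdata = pd_test, type = "prob")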
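Why the spelling matters: caret's predict() method for train objects receives the data through an argument named newdata and also accepts `...`. An argument written as new_data does not match newdata (it is not even a prefix of it), so it is silently absorbed by `...` and newdata keeps its default. The function below is a hypothetical stand-in that only imitates that shape, to show the argument matching in plain base R.

```r
# Hypothetical stand-in with the same shape as caret's predict(object, newdata = NULL, ...)
toy_predict <- function(object, newdata = NULL, ...) {
  if (is.null(newdata)) "no newdata supplied" else paste(nrow(newdata), "new rows")
}

new_rows <- data.frame(x = 1:3)
toy_predict(NULL, new_data = new_rows)  # "no newdata supplied": new_data went into ...
toy_predict(NULL, newdata = new_rows)   # "3 new rows"
```

No error is raised in the first call, which is why a misspelled newdata can go unnoticed.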
How to get the class probabilities AND predictions from caret::predict()?
I'm turning my comment into an answer.
Once you have generated the table of predicted probabilities, you don't need to call the prediction function a second time to get the classes. You can add a class column by applying which.max to each row (which runs fast, in my opinion). For each row, this assigns the name of the column with the highest probability, i.e. one of "setosa", "versicolor" or "virginica".
This gives a table with both pieces of information, as requested:
library(dplyr)
predict(knnFit, newdata = iris, type = "prob") %>%
mutate('class'=names(.)[apply(., 1, which.max)])
# a random sample of the resulting table:
#### setosa versicolor virginica class
#### 18 1 0.0000000 0.0000000 setosa
#### 64 0 0.6666667 0.3333333 versicolor
#### 90 0 1.0000000 0.0000000 versicolor
#### 121 0 0.0000000 1.0000000 virginica
PS: this uses the %>% pipe operator from the dplyr or magrittr packages. The dot . refers to the result of the previous step in the pipe.
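If you prefer not to load dplyr, the same row-wise which.max idea works in base R. The probability table below is hypothetical: it just copies the four sample rows shown above so the snippet runs without fitting a model.

```r
# Hypothetical probability table shaped like predict(knnFit, newdata = iris, type = "prob")
probs <- data.frame(
  setosa     = c(1, 0, 0, 0),
  versicolor = c(0, 0.6666667, 1, 0),
  virginica  = c(0, 0.3333333, 0, 1)
)

# which.max() returns the index of the largest probability in each row
# (the first such column if there is a tie); names() turns it into a label
probs$class <- names(probs)[apply(probs, 1, which.max)]
probs$class  # "setosa" "versicolor" "versicolor" "virginica"
```

The right-hand side is evaluated before the class column is added, so apply() only sees the three numeric probability columns.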
Predict function from caret package gives an error
Show us str(train) and str(test). I suspect the outcome variable is numeric, which makes train think that you are doing regression. That should also be apparent from printing model. Make it a factor if you want to do classification.
Max
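A minimal sketch of that check and fix, assuming a hypothetical data frame train_df whose 0/1 outcome y is stored as numbers. Giving the levels letter-based labels also keeps them usable if you later request class probabilities.

```r
# Hypothetical training data with a 0/1 outcome stored as numbers
train_df <- data.frame(x = c(1.2, 3.4, 2.2, 0.5), y = c(0, 1, 1, 0))

is.numeric(train_df$y)  # TRUE: train() would treat this as regression

# Convert to a factor; labels are matched to levels in order (0 -> "no", 1 -> "yes")
train_df$y <- factor(train_df$y, levels = c(0, 1), labels = c("no", "yes"))

is.factor(train_df$y)   # TRUE: train() will now fit a classification model
levels(train_df$y)      # "no" "yes"
```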
Error using Caret Package for Random Forest (Regression)
You've specified classProbs=T in trainControl, which indicates class probabilities should be computed for a classification model (where the response variable consists of discrete class labels). However, that argument setting conflicts with your numeric response variable (which indicates a regression model will be trained), resulting in the error message that class probabilities cannot be computed for regression.
Since your description and numeric response variable indicate this is a regression problem, removing classProbs=T (the default setting is classProbs=F) from your code should address the error you're getting.
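If the same script is reused for both kinds of outcome, one option is to derive the classProbs setting from the response type instead of hard-coding it. The helper wants_class_probs is hypothetical; the caret call is shown only as a comment so the sketch runs without caret installed.

```r
# Hypothetical helper: request class probabilities only for a factor outcome,
# since a numeric outcome makes caret train a regression model
wants_class_probs <- function(y) is.factor(y)

wants_class_probs(mtcars$mpg)    # FALSE: numeric response, regression
wants_class_probs(iris$Species)  # TRUE: factor response, classification

# Possible use with caret (not run here):
# ctrl <- caret::trainControl(method = "cv", classProbs = wants_class_probs(y))
```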
Caret and KNN in R: predict function gives error
The problem is your y variable. When you ask for class probabilities, train() and/or predict() put them into a data frame with one column per class. If the factor levels are not valid variable names, they are automatically changed (e.g. "0" becomes "X0"). See also this post.
If you change this line in your code, it should work:
a[,1] = factor(a[,1], labels = c("no", "yes"))
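A small base-R sketch of what that line does, assuming (hypothetically) that a[,1] holds a 0/1 outcome. make.names() shows the renaming the original levels would need, and the second part shows that labels are assigned to the sorted levels, so 0 becomes "no" and 1 becomes "yes".

```r
# Hypothetical 0/1 outcome column standing in for a[, 1]
y01 <- c(1, 0, 0, 1)

# The default levels "0" and "1" are not valid R names
make.names(levels(factor(y01)))  # "X0" "X1"

# labels follow the sorted levels: 0 -> "no", 1 -> "yes"
f <- factor(y01, labels = c("no", "yes"))
as.character(f)                  # "yes" "no" "no" "yes"
```

If your outcome is coded the other way round, reorder the labels accordingly.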