Obtaining Threshold Values from a ROC Curve

This is why str is my favorite R function:

library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf)
> str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
..@ x.name : chr "False positive rate"
..@ y.name : chr "True positive rate"
..@ alpha.name : chr "Cutoff"
..@ x.values :List of 1
.. ..$ : num [1:201] 0 0 0 0 0.00935 ...
..@ y.values :List of 1
.. ..$ : num [1:201] 0 0.0108 0.0215 0.0323 0.0323 ...
..@ alpha.values:List of 1
.. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...

Aha! It's an S4 class, so we can use @ to access the slots. Here's how to put the cutoffs and their corresponding FPR and TPR into a data.frame:

cutoffs <- data.frame(cut=perf@alpha.values[[1]], fpr=perf@x.values[[1]],
                      tpr=perf@y.values[[1]])
> head(cutoffs)
        cut         fpr        tpr
1       Inf 0.000000000 0.00000000
2 0.9910964 0.000000000 0.01075269
3 0.9846673 0.000000000 0.02150538
4 0.9845992 0.000000000 0.03225806
5 0.9834944 0.009345794 0.03225806
6 0.9706413 0.009345794 0.04301075

If you have an fpr threshold you want to stay under, you can subset this data.frame to find the maximum tpr below that fpr threshold:

cutoffs <- cutoffs[order(cutoffs$tpr, decreasing=TRUE),]
> head(subset(cutoffs, fpr < 0.2))
          cut       fpr       tpr
96  0.5014893 0.1495327 0.8494624
97  0.4997881 0.1588785 0.8494624
98  0.4965132 0.1682243 0.8494624
99  0.4925969 0.1775701 0.8494624
100 0.4917356 0.1869159 0.8494624
101 0.4901199 0.1962617 0.8494624
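
If you only need the single best cutoff under the constraint, here is a minimal sketch building on the cutoffs data.frame above (the 0.2 cap is just the example value used here):

best <- subset(cutoffs, fpr < 0.2)
best[which.max(best$tpr), ]  # row with the highest tpr among cutoffs with fpr < 0.2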

Create ROC from 10 different thresholds

ROCR creates an ROC curve by plotting the TPR and FPR for many different thresholds. This can be done with just one set of predictions and labels because if an observation is classified as positive for one threshold, it will also be classified as positive at a lower threshold. I found this paper to be helpful in explaining ROC curves in more detail.

You can create the plot as follows in ROCR where x is the vector of predictions, and y is the vector of class labels:

pred <- prediction(x,y) 
perf <- performance(pred,"tpr","fpr")
plot(perf)
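
As a side note, if you want to see a few threshold values directly on the curve, ROCR's plot method can print cutoffs onto the plot; a small sketch (the cutoff positions chosen here are arbitrary):

plot(perf, colorize=TRUE, print.cutoffs.at=seq(0.1, 0.9, by=0.2), text.adj=c(-0.2, 1.7))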

If you want to access the TPR and FPR associated with all the thresholds, you can examine the performance object 'perf':

str(perf)

The following answer shows how to obtain the threshold values in more detail:

https://stackoverflow.com/a/16347508/786220

How to get the optimal threshold from ROC curve in Python?

import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve

def Find_Optimal_Cutoff(target, predicted):
    # Find the threshold where tpr - (1 - fpr) is closest to zero,
    # i.e. where sensitivity and specificity are approximately equal.
    fpr, tpr, threshold = roc_curve(target, predicted)
    i = np.arange(len(tpr))
    roc = pd.DataFrame({'tf': pd.Series(tpr - (1 - fpr), index=i),
                        'threshold': pd.Series(threshold, index=i)})
    roc_t = roc.iloc[(roc.tf - 0).abs().argsort()[:1]]
    return list(roc_t['threshold'])

threshold = Find_Optimal_Cutoff(target_column, predicted_column)

Source: ROC curve and cut off point. Python

ROC curves using pROC in R: calculating the lab value a threshold equates to

As you gave no reproducible example, let's use the one that comes with the package:

library(pROC)
data(aSAH)
roc1 <- roc(aSAH$outcome, aSAH$s100b)

The package comes with the function coords, which lists specificity and sensitivity at different thresholds:

> coords(roc1)
   threshold specificity sensitivity
1       -Inf  0.00000000  1.00000000
2      0.035  0.00000000  0.97560976
3      0.045  0.06944444  0.97560976
4      0.055  0.11111111  0.97560976
5      0.065  0.13888889  0.97560976
6      0.075  0.22222222  0.90243902
7      0.085  0.30555556  0.87804878
8      0.095  0.38888889  0.82926829
9      0.105  0.48611111  0.78048780
10     0.115  0.54166667  0.75609756
...

From there you can use the function ci.coords, which you have already used, to complete the table with whatever data you desire.
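
For example, a short sketch of pulling out a specific threshold or the Youden-optimal one with coords and ci.coords (the 0.205 threshold is just an illustrative value, and ci.coords bootstraps, so it takes a moment):

coords(roc1, x = 0.205, input = "threshold", ret = c("threshold", "specificity", "sensitivity"))
coords(roc1, x = "best", best.method = "youden")   # threshold maximizing Youden's J
ci.coords(roc1, x = "best", best.method = "youden", ret = c("specificity", "sensitivity"))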

How do I set different thresholds to get multiple values for a ROC plot

This is because you are predicting class labels directly. Your predictions probably look like:

table(svm.pred)
svm.pred
class1 class2
    28     37

Therefore there are no thresholds to choose from to build the ROC curve.

Try to do a regression instead (in e1071 you'll need to make sure your class labels are numeric):

svm.model <- svm(as.numeric(Class) ~ ., data = training, type="eps-regression", [...])
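
A hedged sketch of the full round trip, assuming training and testing data frames with a two-level factor column Class (these names are placeholders, not from the original question):

library(e1071)
library(ROCR)
svm.model <- svm(as.numeric(Class) ~ ., data = training, type = "eps-regression")
svm.scores <- predict(svm.model, testing)   # continuous scores instead of hard class labels
pred <- prediction(svm.scores, testing$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf)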

Finding lower threshold with ROC

Your problem is that the positive class has lower X values. Sklearn assumes higher values for the positive class; otherwise the ROC curve gets inverted, here with an AUC of 0.0:

from sklearn.metrics import roc_auc_score
print(roc_auc_score(Y, X))
# OUTPUT: 0.0

ROC analysis comes from the field of signal detection, and it critically depends on the definition of a positive signal, i.e. the direction of the comparison. Some libraries can detect that for you automatically, some don't, but in the end it always has to be done.

And so the rest is correct: the "best" threshold in this case is one of the corners of the curve.

Just make sure your positive class is set properly, and you're good to go:

Y = X > 5

