Classification functions in linear discriminant analysis in R
There isn't a built-in way to get the information I needed, so I wrote a function to do it:
library(MASS)  ## provides lda()

ty.lda <- function(x, groups){
  x.lda <- lda(groups ~ ., as.data.frame(x))

  gr <- length(unique(groups))   ## groups might be factors or numeric
  v <- ncol(x)                   ## variables
  m <- x.lda$means               ## group means

  ## within-group sums of squares and cross-products, one slice per group
  w <- array(NA, dim = c(v, v, gr))
  for(i in 1:gr){
    tmp <- scale(subset(x, groups == unique(groups)[i]), scale = FALSE)
    w[,,i] <- t(tmp) %*% tmp
  }

  ## pooled within-group covariance matrix and its inverse
  W <- w[,,1]
  for(i in 2:gr)
    W <- W + w[,,i]
  V <- W/(nrow(x) - gr)
  iV <- solve(V)

  ## classification functions: constant in row 1, then one weight per variable
  class.funs <- matrix(NA, nrow = v + 1, ncol = gr)
  colnames(class.funs) <- paste("group", 1:gr, sep=".")
  rownames(class.funs) <- c("constant", paste("var", 1:v, sep = "."))
  for(i in 1:gr) {
    class.funs[1, i] <- -0.5 * t(m[i,]) %*% iV %*% (m[i,])
    class.funs[2:(v+1) ,i] <- iV %*% (m[i,])
  }

  x.lda$class.funs <- class.funs
  return(x.lda)
}
This code follows the formulas in Legendre and Legendre's Numerical Ecology (1998), page 625, and matches the results of the worked example starting on page 626.
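To show how the classification functions are used once you have them, here is a minimal self-contained sketch. It is not the function above: it recomputes the same quantities in vectorised form on the built-in iris data (a stand-in dataset, not the one from the book), and the names x, g, scores and assigned are just illustrative. It assumes equal priors, in which case assigning each row to the group with the highest score should agree with predict() from MASS.

```r
library(MASS)

x <- as.matrix(iris[, 1:4])   # stand-in data, chosen only for illustration
g <- iris$Species

# Group means (one row per group) and pooled within-group covariance
m <- rowsum(x, g) / as.vector(table(g))
centered <- x - m[as.character(g), ]
V <- crossprod(centered) / (nrow(x) - nlevels(g))
iV <- solve(V)

# Classification functions: a constant row, then one weight per variable
coefs <- iV %*% t(m)                     # variables x groups
const <- -0.5 * colSums(t(m) * coefs)    # -0.5 * m_i' V^-1 m_i for each group
class.funs <- rbind(constant = const, coefs)

# Score every observation on every group and pick the largest score
scores <- cbind(1, x) %*% class.funs
assigned <- colnames(class.funs)[max.col(scores, ties.method = "first")]

# Compare with MASS::lda under equal priors (an assumption of this sketch)
fit <- lda(x, grouping = g, prior = rep(1/3, 3))
agree <- mean(assigned == as.character(predict(fit, x)$class))
```

With unequal priors the two would generally differ, because the functions above carry no log-prior term.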
Calculating linear discriminant classification function scores for each row in new test data
Using the sample data from the second website you listed, I was able to run:
wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")
wine.lda <- lda(V1 ~ V2 + V3 + V4 + V5 + V6 + V7 +
V8 + V9 + V10 + V11 + V12 + V13 + V14, wine)
#create "new" data
ss<-aggregate(.~V1, wine, mean)[-1]
#predict on new data
predict(wine.lda, ss)
So I think the problem is how you specified your model (or really, the names of the covariates in the model). I think that predict will check to make sure that
attr(wine.lda$terms,"term.labels") == names(ss)
It is likely that all the terms in your lda model have the "ref$" prefix, so they won't match the column names in your new data. I don't know why that guide uses such an awful example of formula notation. I would recommend doing as I did above: take the data.frame name off each term and supply the data.frame as the second parameter. This should make it possible to match the names with new data.
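To make the naming issue concrete, here is a small sketch on the built-in iris data (a stand-in for your data; bad, good and new_obs are hypothetical names). It only compares the stored term labels with the column names of the new data, and then predicts with the model whose terms are plain column names.

```r
library(MASS)

# Formula with the data.frame name baked into every term (the style to avoid)
bad <- lda(iris$Species ~ iris$Sepal.Length + iris$Sepal.Width, data = iris)

# Formula with bare column names; the data.frame is passed separately
good <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)

# "New" data: the per-group means, with the grouping column dropped
new_obs <- aggregate(. ~ Species, iris, mean)[-1]

bad_terms <- attr(bad$terms, "term.labels")    # labels carry the "iris$" prefix
good_terms <- attr(good$terms, "term.labels")  # labels are plain column names

bad_match <- any(bad_terms %in% names(new_obs))
good_match <- all(good_terms %in% names(new_obs))

# Only the second model's terms can be looked up by name in new_obs
pred <- predict(good, new_obs)
```

Here good_match is TRUE and bad_match is FALSE, and pred holds one prediction per row of new_obs.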
Simple discriminant analysis in R using the lda function fails
The problem is that your object prior is of class table, but lda needs your priors to be a vector.
A simple workaround is to use as.vector on the results of table:
prior <- as.vector(counts / sum(counts))
z <- lda(categories ~ values, dat, prior=prior)
predict(z, data)$class
[1] 1 1 2 1 1 1 1 2 2 1 2 3 3 2 2 2 2 2 1 2 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 1 3 1
[39] 2 3 3 2 1 1 2 1 1 3 3 3 1 1 2 1 2 1 2 1 2 1 1 1 2 1 2 2 2 2 3 3 2 3 3 3 3 2
[77] 3 3 2 3 2 1 1 1 1 1 3 1 3 3 3 3
Levels: 1 2 3
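If you want to see the class difference for yourself, here is a short sketch on the built-in iris data (a stand-in, since I don't have your dat; counts, prior_tab and fit are illustrative names). It only shows that dividing a table by its sum still gives a table, and that as.vector turns it into a plain numeric vector that lda accepts.

```r
library(MASS)

counts <- table(iris$Species)      # class frequencies as a table
prior_tab <- counts / sum(counts)  # still of class "table"
prior <- as.vector(prior_tab)      # plain numeric vector, names dropped

class(prior_tab)   # "table"
class(prior)       # "numeric"

# Fit with the vector of priors; the order follows the factor levels
fit <- lda(Species ~ ., iris, prior = prior)
```

Note that as.vector drops the names, so the priors must already be in the order of the factor levels, which is the order table() returns them in.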