Classification Functions in Linear Discriminant Analysis in R

There isn't a built-in way to get the classification functions out of lda(), so I wrote a function to compute them:

library(MASS)  ## provides lda()

ty.lda <- function(x, groups){
  x.lda <- lda(groups ~ ., as.data.frame(x))

  gr <- length(unique(groups))   ## groups might be factors or numeric
  v <- ncol(x) ## variables
  m <- x.lda$means ## group means

  w <- array(NA, dim = c(v, v, gr))

  for(i in 1:gr){
    tmp <- scale(subset(x, groups == unique(groups)[i]), scale = FALSE)
    w[,,i] <- t(tmp) %*% tmp
  }

  W <- apply(w, c(1, 2), sum)   ## pooled within-group sums of squares and cross-products

  V <- W/(nrow(x) - gr)
  iV <- solve(V)

  class.funs <- matrix(NA, nrow = v + 1, ncol = gr)
  colnames(class.funs) <- paste("group", 1:gr, sep=".")
  rownames(class.funs) <- c("constant", paste("var", 1:v, sep = "."))

  for(i in 1:gr) {
    class.funs[1, i] <- -0.5 * t(m[i,]) %*% iV %*% (m[i,])
    class.funs[2:(v+1) ,i] <- iV %*% (m[i,])
  }

  x.lda$class.funs <- class.funs

  return(x.lda)
}

This code follows the formulas in Legendre and Legendre's Numerical Ecology (1998), page 625, and matches the results of the worked example starting on page 626.
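To get a score for each row of data from these classification functions, multiply the data (with a leading column of ones for the constant) by the coefficient matrix and assign each row to the group with the highest score. Below is a minimal, self-contained sketch of that on the built-in iris data; it recomputes the coefficients inline rather than calling ty.lda, and the variable names are my own. Note that these functions have no prior term, so I would only expect them to agree with predict() from lda when the priors are equal.

```r
library(MASS)

X   <- as.matrix(iris[, 1:4])
g   <- iris$Species
lev <- levels(g)
n   <- nrow(X)
k   <- length(lev)

# Group means (one row per group, in level order)
means <- t(sapply(lev, function(l) colMeans(X[g == l, , drop = FALSE])))

# Pooled within-group covariance matrix and its inverse
W <- Reduce(`+`, lapply(lev, function(l)
  crossprod(scale(X[g == l, , drop = FALSE], scale = FALSE))))
iV <- solve(W / (n - k))

# Classification functions: constant in the first row, one column per group
coefs <- iV %*% t(means)
const <- -0.5 * colSums(t(means) * coefs)
class.funs <- rbind(constant = const, coefs)

# Score every row and pick the group with the largest score
scores <- cbind(1, X) %*% class.funs
pred   <- factor(lev[max.col(scores, ties.method = "first")], levels = lev)

# Compare with lda's own predictions under equal priors
ref <- predict(lda(X, g, prior = rep(1 / k, k)), X)$class
print(table(pred, ref))
```

The same matrix product works for new test data, as long as its columns are in the same order as the variables used to build class.funs.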

Calculating linear discriminant classification function scores for each row in new test data

Using the sample data from the second website you listed, I was able to run the following without any problems:

library(MASS)

wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")
wine.lda <- lda(V1 ~ V2 + V3 + V4 + V5 + V6 + V7 +
    V8 + V9 + V10 + V11 + V12 + V13 + V14, wine)

# create "new" data: one row of predictor means per class, with the class column dropped
ss <- aggregate(. ~ V1, wine, mean)[-1]

# predict on new data
predict(wine.lda, ss)

So I think the problem is how you specified your model (or really, the names of the covariates in the model). I think that predict will check that the term labels of the model match the column names of the new data, that is, that

attr(wine.lda$terms,"term.labels") == names(ss)

It is likely that all the terms in your lda model carry the "ref$" prefix, so they won't match the column names in your new data. I don't know why that guide uses such an awful example of formula notation. I would recommend doing as I did above: take the data.frame name off each of the terms and supply the data.frame as the second parameter. That should make it possible to match the names up with the new data.
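To see the difference in the stored term labels, here is a minimal sketch on the built-in iris data. The data frame name df and the choice of two predictors are just for illustration and stand in for your "ref" object; they are not from the guide or your code.

```r
library(MASS)

df <- iris  # illustrative stand-in for your own data.frame

# Terms written with the data.frame name baked in
bad <- lda(df$Species ~ df$Sepal.Length + df$Petal.Length)

# Bare column names, with the data.frame passed separately
good <- lda(Species ~ Sepal.Length + Petal.Length, data = df)

bad_labels  <- attr(bad$terms, "term.labels")
good_labels <- attr(good$terms, "term.labels")
print(bad_labels)   # labels still carry the "df$" prefix
print(good_labels)  # labels are plain column names

# New data only needs columns named like the bare term labels
new <- df[1:5, c("Sepal.Length", "Petal.Length")]
pred <- predict(good, new)$class
print(pred)
```

With the second form, the term labels are exactly the column names that predict looks for in the new data.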

Simple discriminant analysis in R using the lda function fails

The problem is that your object prior is of class table, but lda needs your priors to be a vector.

A simple workaround is to use as.vector on the result of table:

prior <- as.vector(counts / sum(counts))

z <- lda(categories ~ values, dat, prior=prior)
predict(z, dat)$class

 [1] 1 1 2 1 1 1 1 2 2 1 2 3 3 2 2 2 2 2 1 2 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 1 3 1
[39] 2 3 3 2 1 1 2 1 1 3 3 3 1 1 2 1 2 1 2 1 2 1 1 1 2 1 2 2 2 2 3 3 2 3 3 3 3 2
[77] 3 3 2 3 2 1 1 1 1 1 3 1 3 3 3 3
Levels: 1 2 3
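If you want to check the class of the prior before passing it to lda, here is a small sketch on the built-in iris data. The object names are mine and iris is only a stand-in for your dat.

```r
library(MASS)

counts    <- table(iris$Species)
prior_tab <- counts / sum(counts)   # still an object of class "table"
print(class(prior_tab))

prior <- as.vector(prior_tab)       # plain numeric vector, in level order
print(class(prior))

# Fit with the vector-valued prior
fit <- lda(Species ~ ., iris, prior = prior)
print(fit$prior)
```

as.vector drops the table class and the names, so the entries must already be in the same order as the factor levels, which is the order table returns them in.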
