Getting the Error "Level Sets of Factors Are Different" When Running a for Loop

The comparison fails because FinalTable[2,1] and breakVector[1,1] do not have the same levels:

> FinalTable[2,1]
[1] Brand
Levels: Brand NonBrand
> breakVector[1,1]
[1] NonBrand
Levels: NonBrand

This is easily fixed by giving breakVector[,1] both levels explicitly:

breakVector[,1] <- factor(breakVector[,1], levels=c("Brand", "NonBrand"))

or, more generally, by copying the levels from FinalTable[,1]:

breakVector[,1] <- factor(breakVector[,1], levels=levels(FinalTable[,1]))
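A self-contained sketch of the same fix. The vectors a and b are hypothetical stand-ins for the two columns, not the asker's data:

```r
# Hypothetical stand-ins for FinalTable[,1] and breakVector[,1]
a <- factor(c("NonBrand", "Brand", "Brand"))        # levels: Brand, NonBrand
b <- factor(c("NonBrand", "NonBrand", "NonBrand"))  # levels: NonBrand only

# Comparing them directly fails because the level sets differ;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(a == b, error = function(e) conditionMessage(e))
print(res)

# Give b the same level set as a; the values themselves are unchanged
b <- factor(b, levels = levels(a))
same <- a == b
print(same)
```

After the re-levelling, the element-wise comparison runs and only the first pair matches.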

How can I compare two factors with different levels?

Convert to character then compare:

# data
A <- factor(1:5)
B <- factor(c(1:3,6,6))

str(A)
# Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
str(B)
# Factor w/ 4 levels "1","2","3","6": 1 2 3 4 4

mean(A == B)
# Error in Ops.factor(A, B) : level sets of factors are different

mean(as.character(A) == as.character(B))
# [1] 0.6

Another approach is to index each factor's levels by the factor itself:

mean(levels(A)[A] == levels(B)[B])

This gives the same result, but was about 2 times slower on a dataset of 1e8 elements.
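If the values need to stay factors (for example, to cross-tabulate them afterwards), a further option is to re-level both factors on the union of their levels. This is a minimal sketch rather than part of the original answer; all_levels, A2 and B2 are names introduced here:

```r
A <- factor(1:5)
B <- factor(c(1:3, 6, 6))

# Union of both level sets, in order of first appearance
all_levels <- union(levels(A), levels(B))

# Re-create each factor on the shared level set
A2 <- factor(A, levels = all_levels)
B2 <- factor(B, levels = all_levels)

mean(A2 == B2)   # same comparison as the as.character() version
table(A2, B2)    # square table, since both factors share one level set
```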

Error when using mean() in R: Error in Ops.factor(obs, pred) : level sets of factors are different

One option is to compare the two columns directly, without first converting them to factors:

mean(test_accuracy$observed == test_accuracy$new_predicted)
# 0.5069767
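A small sketch of why this works: character columns are compared element by element, with no level check. The values below are invented; only the column names follow the question:

```r
# Invented example data; only the column names follow the question
test_accuracy <- data.frame(
  observed      = c("yes", "no", "yes", "no"),
  new_predicted = c("yes", "yes", "yes", "yes"),
  stringsAsFactors = FALSE   # keep both columns as character
)

# Proportion of rows where the prediction matches the observation
acc <- mean(test_accuracy$observed == test_accuracy$new_predicted)
print(acc)
```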

mapply statement is returning an incorrect value. What is happening?

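Up front, the fix is to index by name rather than by factor:

findAdv <- function(t1, t2) {
  return(typechart[as.character(t1), as.character(t2)])
}

Factors are your problem here: when a factor is used as an index, R uses its underlying integer codes, not its labels. Adding browser() to the original function shows this: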

findAdv <- function(t1, t2) {
  browser()
  return(typechart[t1, t2])
}

> mapply(findAdv, battles$winner_type.1, battles$loser_type.1)
Called from: (function(t1, t2) {
browser()
return(typechart[t1, t2])
...
Browse[1]> debug at #3: return(typechart[t1, t2])
Browse[2]> t1
[1] Grass
Levels: Dragon Grass Psychic Rock
Browse[2]> t2
[1] Rock
Levels: Bug Fairy Fire Grass Rock
Browse[2]> c(t1,t2)
[1] 2 5 # <--- here's a hint
Browse[2]> typechart[t1, t2]
[1] 2
Browse[2]> typechart[as.character(t1), as.character(t2)]
[1] 2

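At this point, realize that typechart[t1, t2] actually looked up typechart[2,5], the integer codes of the two factors, which is the Fire row and Grass column rather than Grass/Rock. It happens to hold the same value as typechart["Grass","Rock"], but it is not the same location in the typechart matrix. Coincidence. (In R 4.1.0 and later, c() on factors returns a combined factor, so use as.integer(t1) to see the codes.) Let's go to the second iteration by pressing c (continue).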

Browse[2]> c
Browse[2]> Called from: (function(t1, t2) {
browser()
return(typechart[t1, t2])
...
Browse[1]> debug at #3: return(typechart[t1, t2])
Browse[2]> c(t1,t2)
[1] 4 4
Browse[2]> typechart[t1, t2]
[1] 0.5
Browse[2]> typechart[as.character(t1), as.character(t2)]
Error in typechart[as.character(t1), as.character(t2)] (from #3) :
subscript out of bounds
x
1. \-base::mapply(findAdv, battles$winner_type.1, battles$loser_type.1)
2. \-(function (t1, t2) ...
Browse[2]> c(as.character(t1), as.character(t2))
[1] "Rock" "Grass"

The character lookup produces an error here only because your sample data is incomplete: typechart has at least 4 rows and 4 columns, so the integer lookup typechart[4,4] succeeds and silently returns the wrong cell, but there is no row named "Rock".

With your full typechart, I would expect the character lookup not to produce an error, but instead to give you a different lookup value than the integer one.
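To see the difference in isolation, here is a small sketch with a made-up 2 x 3 matrix (chart, atk, def, atks and defs are all hypothetical names, not from the question). It also shows one possible vectorized alternative to mapply(): indexing with a two-column character matrix looks up one cell per row, by name.

```r
# Made-up lookup table: rows are attackers, columns are defenders
chart <- matrix(c(0.5, 2, 2, 0.5, 0.5, 0.5), nrow = 2,
                dimnames = list(c("Fire", "Water"),
                                c("Fire", "Grass", "Water")))

atk <- factor("Water")   # single level "Water", so its integer code is 1
def <- factor("Fire")    # single level "Fire", so its integer code is 1

by_code <- chart[atk, def]                              # chart[1, 1]: Fire/Fire
by_name <- chart[as.character(atk), as.character(def)]  # Water/Fire
print(c(by_code = by_code, by_name = by_name))

# Vectorized lookup by name: each row of the index matrix is a (row, column) pair
atks <- factor(c("Water", "Fire"))
defs <- factor(c("Fire", "Water"))
vals <- chart[cbind(as.character(atks), as.character(defs))]
print(vals)
```

The factor-indexed lookup returns the Fire/Fire cell without any warning, which is why this kind of bug is easy to miss.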


Data:

battles <- structure(list(First_pokemon = structure(c(1L, 5L, 4L, 3L, 2L), .Label = c("Larvitar", "Omastar", "Slugma", "Togetic", "Virizion"), class = "factor"), Second_pokemon = structure(c(3L, 5L, 1L, 2L, 4L), .Label = c("Beheeyem", "Druddigon", "Nuzleaf", "Shuckle", "Terrakion"), class = "factor"), Winner = structure(c(3L, 5L, 1L, 2L, 4L), .Label = c("Beheeyem", "Druddigon", "Nuzleaf", "Omastar", "Terrakion"), class = "factor"), Loser = structure(c(1L, 5L, 4L, 3L, 2L), .Label = c("Larvitar", "Shuckle", "Slugma", "Togetic", "Virizion"), class = "factor"), diff_hp = c(20L, 0L, 20L, 37L, 50L), diff_att = c(6L, 39L, 35L, 80L, 50L), diff_def = c(-10L, 18L, -10L, 50L, -105L), diff_sp.att = c(15L, -18L, 45L, -10L, 105L), diff_sp.def = c(-10L, -39L, -10L, 50L, -160L), diff_speed = c(19L, 0L, 0L, 28L, 50L), winner_type.1 = structure(c(2L, 4L, 3L, 1L, 4L), .Label = c("Dragon", "Grass", "Psychic", "Rock"), class = "factor"),     winner_type.2 = structure(c(2L, 3L, 1L, 1L, 4L), .Label = c(".",     "Dark", "Fighting", "Water"), class = "factor"), loser_type.1 = structure(c(5L,     4L, 2L, 3L, 1L), .Label = c("Bug", "Fairy", "Fire", "Grass",     "Rock"), class = "factor"), loser_type.2 = structure(c(4L,     2L, 3L, 1L, 5L), .Label = c(".", "Fighting", "Flying", "Ground",     "Rock"), class = "factor")), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
typechart <- structure(c(1, 1, 1, 1, 1, 1, 1, 0.5, 2, 1, 0.5, 0.5, 1, 0.5, 0.5, 2, 2, 0.5, 1, 1, 1, 0.5, 1, 1, 1, 2, 0.5, 0.5, 0.5, 2, 1, 2, 1, 1, 1, 0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5, 1, 1, 1, 2, 0, 2, 2, 1, 1, 1, 2, 0.5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 0.5, 1, 0.5, 0.5, 2, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 2, 1, 1, 1, 1, 1, 1, 0.5, 2, 1, 1, 0.5, 0.5, 1, 1, 1, 1, 1, 1), .Dim = c(6L, 18L), .Dimnames = list(c("Normal", "Fire", "Water", "Electric", "Grass", "Ice"), c("Normal", "Fire", "Water", "Electric", "Grass", "Ice", "Fighting", "Poison", "Ground", "Flying", "Psychic", "Bug", "Rock", "Ghost", "Dragon", "Dark", "Steel", "Fairy")))

Factor has new levels error for variable I'm not using

You could try updating mod2$xlevels[["y"]] in the model object so that it also includes the levels found in the test data:

mod2 <- glm(z~.-y, data=train, family="binomial")
mod2$xlevels[["y"]] <- union(mod2$xlevels[["y"]], levels(test$y))

predict(mod2, newdata=test, type="response")
# 5
#0.5546394

Another option is to leave "y" out of the data passed to glm(), without deleting it from train:

mod2 <- glm(z~., data=train[,!colnames(train) %in% c("y")], family="binomial")
predict(mod2, newdata=test, type="response")
# 5
#0.5546394
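A self-contained sketch of both options. The train and test data frames below are invented, since the original data is not shown; the error is captured with tryCatch() so the script keeps running:

```r
set.seed(1)

# Invented training data; factor y has levels "a" and "b" only
train <- data.frame(
  x = rnorm(40),
  y = factor(sample(c("a", "b"), 40, replace = TRUE)),
  z = rbinom(40, 1, 0.5)
)
# New data in which y has a level ("c") never seen in training
test <- data.frame(x = 0.3, y = factor("c"), z = 0)

mod <- glm(z ~ . - y, data = train, family = "binomial")

# y is dropped from the model terms, but predict() still checks its levels
err <- tryCatch(predict(mod, newdata = test, type = "response"),
                error = function(e) conditionMessage(e))
print(err)

# Option 1: tell the model object about the extra level
mod$xlevels[["y"]] <- union(mod$xlevels[["y"]], levels(test$y))
p1 <- predict(mod, newdata = test, type = "response")

# Option 2: fit without the column, so y is never checked
mod_b <- glm(z ~ ., data = train[, !colnames(train) %in% "y"],
             family = "binomial")
p2 <- predict(mod_b, newdata = test, type = "response")

all.equal(unname(p1), unname(p2))
```

Both models use the same predictors, so the two predictions should agree.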

