Getting the error "level sets of factors are different" when running a for loop
FinalTable[2,1] and breakVector[1,1] do not have the same levels:
> FinalTable[2,1]
[1] Brand
Levels: Brand NonBrand
> breakVector[1,1]
[1] NonBrand
Levels: NonBrand
This is easily fixed by using
breakVector[,1] <- factor(breakVector[,1], levels=c("Brand", "NonBrand"))
or, more generally
breakVector[,1] <- factor(breakVector[,1], levels=levels(FinalTable[,1]))
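To see the fix in isolation, here is a minimal sketch. The vectors final_col and break_col are hypothetical stand-ins for FinalTable[,1] and breakVector[,1]; they are not taken from the question's data.

```r
# Hypothetical stand-ins for the two columns in the question.
final_col <- factor(c("NonBrand", "Brand"), levels = c("Brand", "NonBrand"))
break_col <- factor("NonBrand")  # only knows the level "NonBrand"

# Comparing factors whose level sets differ raises an error;
# capture the message instead of stopping the script.
result_before <- tryCatch(
  final_col[2] == break_col[1],
  error = function(e) conditionMessage(e)
)

# Give break_col the same level set as final_col, then compare again.
break_col <- factor(break_col, levels = levels(final_col))
result_after <- final_col[2] == break_col[1]
```

After the re-leveling, the comparison runs and simply reports that "Brand" and "NonBrand" differ.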
How can I compare two factors with different levels?
Convert to character then compare:
# data
A <- factor(1:5)
B <- factor(c(1:3,6,6))
str(A)
# Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
str(B)
# Factor w/ 4 levels "1","2","3","6": 1 2 3 4 4
mean(A == B)
# Error in Ops.factor(A, B) : level sets of factors are different
mean(as.character(A) == as.character(B))
# [1] 0.6
Another approach is to index the levels by the factor itself:
mean(levels(A)[A] == levels(B)[B])
which is about 2 times slower on a dataset of 1e8 elements.
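If you would rather keep both vectors as factors, a third option (not from the answer above, just a sketch) is to re-level both to the union of their levels before comparing. The names lv, A2 and B2 are made up for illustration.

```r
A <- factor(1:5)
B <- factor(c(1:3, 6, 6))

# Build one shared level set, then rebuild both factors on it.
lv <- union(levels(A), levels(B))
A2 <- factor(A, levels = lv)
B2 <- factor(B, levels = lv)

# The level sets now match, so == works element by element.
agreement <- mean(A2 == B2)
```

This should give the same proportion as the as.character() comparison, since only the first three elements agree.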
Error when using mean() in R: Error in Ops.factor(obs, pred) : level sets of factors are different
How about comparing the two columns directly, without converting them to factors first:
mean(test_accuracy$observed == test_accuracy$new_predicted)
# 0.5069767
Mapply statement is returning incorrect value. What is happening?
Up front, the fix is to index typechart by name:
findAdv <- function(t1, t2) {
return(typechart[as.character(t1), as.character(t2)])
}
factors are your problem here.
findAdv <- function(t1, t2) {
browser()
return(typechart[t1, t2])
}
> mapply(findAdv, battles$winner_type.1, battles$loser_type.1)
Called from: (function(t1, t2) {
browser()
return(typechart[t1, t2])
...
Browse[1]> debug at #3: return(typechart[t1, t2])
Browse[2]> t1
[1] Grass
Levels: Dragon Grass Psychic Rock
Browse[2]> t2
[1] Rock
Levels: Bug Fairy Fire Grass Rock
Browse[2]> c(t1,t2)
[1] 2 5 # <--- here's a hint
Browse[2]> typechart[t1, t2]
[1] 2
Browse[2]> typechart[as.character(t1), as.character(t2)]
[1] 2
At this point, realize that typechart[2,5] (which is not Grass/Rock) is the same value as typechart["Grass","Rock"], but it is not the same location in the typechart matrix. Coincidence. Let's go to the second iteration by pressing c (continue).
Browse[2]> c
Browse[2]> Called from: (function(t1, t2) {
browser()
return(typechart[t1, t2])
...
Browse[1]> debug at #3: return(typechart[t1, t2])
Browse[2]> c(t1,t2)
[1] 4 4
Browse[2]> typechart[t1, t2]
[1] 0.5
Browse[2]> typechart[as.character(t1), as.character(t2)]
Error in typechart[as.character(t1), as.character(t2)] (from #3) :
subscript out of bounds
x
1. \-base::mapply(findAdv, battles$winner_type.1, battles$loser_type.1)
2. \-(function (t1, t2) ...
Browse[2]> c(as.character(t1), as.character(t2))
[1] "Rock" "Grass"
It is producing an error because your sample data is incomplete: while we have at least 4 rows and 4 columns (when using the integer codes of the factors), we do not have a row for "Rock".
With your full data, I would expect this not to produce an error, but to give you a different (wrong) lookup value instead.
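The same effect can be reproduced on a tiny matrix. This is only an illustrative sketch; m and f are made-up names, not part of the question's data. A factor used as a subscript is treated as its integer codes, not as its labels.

```r
# Small named matrix; row "a" holds x = 1, y = 3, z = 5.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

# The label is "z", but because of the level order its integer code is 1.
f <- factor("z", levels = c("z", "x"))

by_code <- m["a", f]                # indexes column 1, i.e. "x"
by_name <- m["a", as.character(f)]  # indexes column "z"
```

Here by_code picks up the value from column "x" while by_name picks up the one from column "z", which is why wrapping the subscripts in as.character() matters.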
Data:
battles <- structure(list(First_pokemon = structure(c(1L, 5L, 4L, 3L, 2L), .Label = c("Larvitar", "Omastar", "Slugma", "Togetic", "Virizion"), class = "factor"), Second_pokemon = structure(c(3L, 5L, 1L, 2L, 4L), .Label = c("Beheeyem", "Druddigon", "Nuzleaf", "Shuckle", "Terrakion"), class = "factor"), Winner = structure(c(3L, 5L, 1L, 2L, 4L), .Label = c("Beheeyem", "Druddigon", "Nuzleaf", "Omastar", "Terrakion"), class = "factor"), Loser = structure(c(1L, 5L, 4L, 3L, 2L), .Label = c("Larvitar", "Shuckle", "Slugma", "Togetic", "Virizion"), class = "factor"), diff_hp = c(20L, 0L, 20L, 37L, 50L), diff_att = c(6L, 39L, 35L, 80L, 50L), diff_def = c(-10L, 18L, -10L, 50L, -105L), diff_sp.att = c(15L, -18L, 45L, -10L, 105L), diff_sp.def = c(-10L, -39L, -10L, 50L, -160L), diff_speed = c(19L, 0L, 0L, 28L, 50L), winner_type.1 = structure(c(2L, 4L, 3L, 1L, 4L), .Label = c("Dragon", "Grass", "Psychic", "Rock"), class = "factor"), winner_type.2 = structure(c(2L, 3L, 1L, 1L, 4L), .Label = c(".", "Dark", "Fighting", "Water"), class = "factor"), loser_type.1 = structure(c(5L, 4L, 2L, 3L, 1L), .Label = c("Bug", "Fairy", "Fire", "Grass", "Rock"), class = "factor"), loser_type.2 = structure(c(4L, 2L, 3L, 1L, 5L), .Label = c(".", "Fighting", "Flying", "Ground", "Rock"), class = "factor")), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
typechart <- structure(c(1, 1, 1, 1, 1, 1, 1, 0.5, 2, 1, 0.5, 0.5, 1, 0.5, 0.5, 2, 2, 0.5, 1, 1, 1, 0.5, 1, 1, 1, 2, 0.5, 0.5, 0.5, 2, 1, 2, 1, 1, 1, 0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5, 1, 1, 1, 2, 0, 2, 2, 1, 1, 1, 2, 0.5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 0.5, 1, 0.5, 0.5, 2, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 2, 1, 1, 1, 1, 1, 1, 0.5, 2, 1, 1, 0.5, 0.5, 1, 1, 1, 1, 1, 1), .Dim = c(6L, 18L), .Dimnames = list(c("Normal", "Fire", "Water", "Electric", "Grass", "Ice"), c("Normal", "Fire", "Water", "Electric", "Grass", "Ice", "Fighting", "Poison", "Ground", "Flying", "Psychic", "Bug", "Rock", "Ghost", "Dragon", "Dark", "Steel", "Fairy")))
Factor has new levels error for variable I'm not using
You could try updating mod2$xlevels[["y"]] in the model object:
mod2 <- glm(z~.-y, data=train, family="binomial")
mod2$xlevels[["y"]] <- union(mod2$xlevels[["y"]], levels(test$y))
predict(mod2, newdata=test, type="response")
# 5
#0.5546394
Another option would be to leave "y" out of the data passed to glm(), without removing it from train:
mod2 <- glm(z~., data=train[,!colnames(train) %in% c("y")], family="binomial")
predict(mod2, newdata=test, type="response")
# 5
#0.5546394
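Since the question's train and test are not shown here, the following is a self-contained sketch of the second option on made-up data. All values, and the names unseen, keep, mod and pred, are assumptions for illustration only.

```r
# Hypothetical toy data: level "c" of y appears in test but never in train.
train <- data.frame(
  x = c(0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.7),
  y = factor(c("a", "b", "a", "b", "a", "b", "a", "b")),
  z = c(0, 0, 1, 1, 0, 1, 0, 1)
)
test <- data.frame(x = c(0.3, 0.75), y = factor(c("c", "a")))

# Quick diagnostic: which levels would predict() complain about?
unseen <- setdiff(levels(test$y), levels(train$y))

# Fit without the y column, so the model never records levels for y.
keep <- !colnames(train) %in% "y"
mod <- glm(z ~ ., data = train[, keep], family = "binomial")

# test still contains y (with the unseen level), but the model ignores it.
pred <- predict(mod, newdata = test, type = "response")
```

Checking setdiff() on the levels first is a cheap way to find out which variable triggers the "new levels" error before deciding how to handle it.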