Typeof Returns Integer for Something That Is Clearly a Factor

typeof returns integer for something that is clearly a factor

To find out what class a vector has, use class. To test whether a vector is a factor, use is.factor.

The fact that typeof returns "integer" for factors is a language feature that confused me as well in my early days of R programming. typeof reports information at a "lower" level of abstraction: how the object is stored, not what it represents. Factor variables are stored as integers. Dates and datetimes are usually stored as numeric (double) values. Learn to use class or str rather than typeof (or mode); they give more useful information. You can look at the full "structure" of a factor variable with dput:

dput(factor(rep(letters[1:5], 2)))
# structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
#   .Label = c("a", "b", "c", "d", "e"), class = "factor")

The character values that are usually thought of as the factor's values are actually stored in an attribute, the one that levels returns (dput shows it as .Label above; recent R versions print it as levels). The "main" part of the variable is a vector of integer indices into that attribute, so mode returns "numeric" and typeof returns "integer". For this reason one usually needs as.character to get what most people think of as the factor's values, namely their character representations.
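The split between integer codes and character labels can be seen directly. The following is a minimal sketch; the vector f and its values are made up for illustration.

```r
# A small factor with repeated values (illustrative data)
f <- factor(c("low", "high", "low", "mid"))

typeof(f)        # storage type: "integer"
mode(f)          # "numeric"
class(f)         # "factor"
is.factor(f)     # TRUE

levels(f)        # the labels, sorted by default: "high" "low" "mid"
as.integer(f)    # the integer codes: 2 1 2 3
as.character(f)  # the labels: "low" "high" "low" "mid"

# Indexing the levels by the codes gives the same result as as.character()
identical(levels(f)[as.integer(f)], as.character(f))  # TRUE
```

The codes are positions in levels(f), which is why as.integer(f) rarely gives what was intended when the labels themselves look like numbers.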

Odd type of a column in a Data Frame

The state column is a factor:

class(incomeAndState$state)
#[1] "factor"

and factors are stored internally as integers, hence you see

typeof(incomeAndState$state)
#[1] "integer"

which can also be verified with mode

mode(incomeAndState$state)
#[1] "numeric"

This can be avoided by passing stringsAsFactors = FALSE when constructing the data frame (this is the default since R 4.0.0):

incomeAndState <- data.frame(state=state, income=income, stringsAsFactors = FALSE)

which will give you

typeof(incomeAndState$state)
#[1] "character"

mode(incomeAndState$state)
#[1] "character"
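A self-contained version of the comparison, as a rough sketch: the state and income vectors below are hypothetical stand-ins, since the original data is not shown.

```r
# Hypothetical data standing in for the original state and income vectors
state  <- c("Ohio", "Texas", "Ohio")
income <- c(51000, 62000, 48000)

as_factor <- data.frame(state = state, income = income, stringsAsFactors = TRUE)
as_string <- data.frame(state = state, income = income, stringsAsFactors = FALSE)

class(as_factor$state)   # "factor"
typeof(as_factor$state)  # "integer"
class(as_string$state)   # "character"
typeof(as_string$state)  # "character"

# An existing factor column can also be converted after the fact
as_factor$state <- as.character(as_factor$state)
typeof(as_factor$state)  # "character"
```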

Data type double not converting to factor

The issue here is that typeof checks the internal representation of an object. Factors are represented as integers. To check that something is actually a factor, use is.factor instead. From the docs:

typeof determines the (R internal) type or storage mode of any object

To verify this, check the Species column of the well-known iris dataset, which is a factor. typeof(iris$Species) nevertheless returns "integer", because R stores factors as integers.

is.factor is the better option. This ultimately comes down to the difference between types and classes in R.

is.factor(iris$Species)
# [1] TRUE
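The type/class distinction also matters when a double vector is turned into a factor. A minimal sketch, with x as an illustrative vector:

```r
x <- c(2.5, 1.5, 2.5)   # illustrative double vector
typeof(x)               # "double"

f <- factor(x)
typeof(f)               # "integer": the storage type of the codes
class(f)                # "factor": the class attribute
is.factor(f)            # TRUE
inherits(f, "factor")   # TRUE

# Removing the class attribute exposes the underlying integer codes
codes <- unclass(f)

# To get the original numbers back, go through the labels, not the codes
as.numeric(as.character(f))   # 2.5 1.5 2.5
as.numeric(f)                 # 2 1 2 (the codes, usually not what is wanted)
```

So the conversion to factor did succeed even though typeof reports "integer"; only class or is.factor can confirm it.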

Return factor level from a function, not an integer in R

You can index the levels of the classes variable with the output of the ifelse call, as follows:

data <- data.frame(a = 1:10)

find_class <- function(i) {
  classes <- factor(c('A', 'B', 'C'))

  # ifelse() returns the integer codes of the factor elements
  idx <- ifelse(i %in% c(1, 3, 5), classes[1],
                ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))

  # map the codes back to labels and rebuild the factor
  res <- levels(classes)[idx]
  factor(res, levels(classes))
}

data$class <- find_class(data$a)

data$class
# [1] A B A B A C C C B C
# Levels: A B C

data
#     a class
# 1   1     A
# 2   2     B
# 3   3     A
# 4   4     B
# 5   5     A
# 6   6     C
# 7   7     C
# 8   8     C
# 9   9     B
# 10 10     C
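The reason the function has to go back through levels is that ifelse does not keep the factor class of its arguments. The sketch below shows that behaviour and a variant that works on character labels instead; the variant is an alternative, not the approach used in the answer above.

```r
classes <- factor(c("A", "B", "C"))

# ifelse() drops the factor class, so the result is the integer code
ifelse(TRUE, classes[2], classes[3])   # 2

# Alternative: pick character labels first, then build the factor once
i <- 1:10
labels <- ifelse(i %in% c(1, 3, 5), "A",
                 ifelse(i %in% c(2, 4, 9), "B", "C"))
res <- factor(labels, levels = levels(classes))
res
```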

Check whether a factor variable is of type integer or float

Using the following code, you can check whether the levels are whole numbers once they have been converted to numeric:

isTRUE(all.equal(as.numeric(levels(data_rating$rating)),
                 as.integer(as.numeric(levels(data_rating$rating)))))

all.equal returns either TRUE or a character description of the differences, so it should be wrapped in isTRUE. If the expression returns TRUE, all levels are integers; if it returns FALSE, at least one level is a float.
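The check can be wrapped in a small helper. This is a sketch only: levels_are_whole is a hypothetical name, and the data_rating data frame below is invented to stand in for the one in the question.

```r
# Hypothetical helper: TRUE when every level of f is a whole number
levels_are_whole <- function(f) {
  vals <- as.numeric(levels(f))
  isTRUE(all.equal(vals, as.numeric(as.integer(vals))))
}

# Invented stand-in for data_rating from the question
data_rating <- data.frame(rating = factor(c("1", "2", "2", "5")))

levels_are_whole(data_rating$rating)     # TRUE
levels_are_whole(factor(c("1", "2.5")))  # FALSE
```

Note that all.equal compares with a small numeric tolerance, so values extremely close to a whole number are treated as whole.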

R Remove an integer element from a factor-type vector

We could use !is.na(as.numeric()) to identify the strings that are numeric and remove them.

onlynumbers <- "123.4"
onlyletters <- "abcd."
strings <- c(onlynumbers, onlyletters)
!is.na(as.numeric(strings))
# [1]  TRUE FALSE

As you can see, this works (as.numeric also warns about NAs introduced by coercion, which is expected here). Now the removal:

result <- strings[is.na(as.numeric(strings))]
result
# [1] "abcd."

EDIT: You should first convert your factor to character using as.character, and afterwards you can convert back using as.factor.

EDIT 2: To keep the names, you could use names(result) <- names(strings)[is.na(as.numeric(strings))]
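Putting the pieces together for an actual factor, here is a rough sketch. The named factor f is made up for illustration, and suppressWarnings is added only to silence the expected coercion warning.

```r
# Hypothetical named factor mixing numeric-looking and text elements
f <- factor(c(a = "123.4", b = "abcd.", c = "7", d = "xyz"))

chr <- as.character(f)   # the labels, not the integer codes
names(chr) <- names(f)   # set the names explicitly on the character vector

# as.numeric() yields NA for non-numeric strings; silence the expected warning
is_num <- !is.na(suppressWarnings(as.numeric(chr)))

# Keep only the non-numeric elements and rebuild the factor
result <- factor(chr[!is_num])
result
```

Rebuilding with factor also drops the levels that are no longer used, which as.factor on the filtered character vector would do as well.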


