How to Get Factor Matrices in R

Can we get factor matrices in R?

In this case, it may walk like a duck and even quack like a duck, but f from:

f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])
dim(f) <- c(4,5)

really isn't a matrix, even though is.matrix() claims that, strictly, it is one. As far as is.matrix() is concerned, f only needs to be a vector with a dim attribute of length 2, so adding that attribute to f is enough to pass the test. As you have seen, however, once you start using f as a matrix, it quickly loses the features that make it a factor (you end up with the level labels or the internal codes instead of a factor, or the dimensions get lost).
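A minimal sketch of the point above, assuming a fresh R session. The set.seed(1) call is an addition for reproducibility, so the sampled values will not match the output shown elsewhere in this answer.

```r
# Recreate f; set.seed() is added only to make the draw reproducible.
set.seed(1)
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])
dim(f) <- c(4, 5)

is.matrix(f)           # TRUE: a vector with a length-2 dim attribute passes
class(f)               # "factor": S3 dispatch still sees a factor
inherits(f, "matrix")  # FALSE
```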

There are really only matrices and arrays for the atomic vector types:

  1. logical,
  2. integer,
  3. double (real),
  4. complex,
  5. character, and
  6. raw

plus, as @hadley reminds me, you can also have list matrices and arrays, by setting the dim attribute on a list object (see, for example, the Matrices & Arrays section of Hadley's book, Advanced R).
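A short sketch of a list matrix; the contents of lst are arbitrary illustrative values. Because each cell is a list element, a factor stored in a cell keeps its levels.

```r
# A list with a dim attribute is a list matrix: each cell can hold any R object.
lst <- list(1L, "a", TRUE, factor("x"))
dim(lst) <- c(2, 2)

is.matrix(lst)  # TRUE
lst[[2, 2]]     # the factor survives intact inside its cell
```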

Anything outside those types is coerced to one of them via as.vector(). This is what happens in matrix(f, nrow = 3), but not because f fails is.atomic(): is.atomic(f) returns TRUE, because f is stored internally as an integer vector and typeof(f) returns "integer". The coercion happens because f has a class attribute. This sets the OBJECT bit in the internal representation of f, and anything that has a class is coerced to one of the atomic types via as.vector():

matrix <- function(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
                   dimnames = NULL) {
    if (is.object(data) || !is.atomic(data))
        data <- as.vector(data)
    ....
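The conditions in that excerpt can be checked directly. This sketch uses a small hypothetical factor rather than the random f from above:

```r
f <- factor(c("b", "a", "c", "a"))

typeof(f)     # "integer": stored as integer codes
is.atomic(f)  # TRUE, so !is.atomic(data) is not what triggers the coercion
is.object(f)  # TRUE: the class attribute sets the OBJECT bit

m <- matrix(f, nrow = 2)
typeof(m)     # "character": as.vector() turned the factor into its labels
```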

Adding dimensions via dim<-() is a quick way to create an array without duplicating the object, but it bypasses some of the checks and balances that R applies when you coerce f to a matrix by other means:

matrix(f, nrow = 3) # or
as.matrix(f)

This becomes apparent when you use basic functions that work on matrices or that rely on method dispatch. Note that after assigning dimensions to f, f is still of class "factor":

> class(f)
[1] "factor"

which explains the head() behaviour: you are not getting the head.matrix() behaviour because f is not a matrix, at least as far as the S3 mechanism is concerned:

> debug(head.matrix)
> head(f) # we don't enter the debugger
[1] d c a d b d
Levels: a b c d e
> undebug(head.matrix)

and the head.default() method calls [, for which there is a factor method ([.factor), hence the observed behaviour:

> debugonce(`[.factor`)
> head(f)
debugging in: `[.factor`(x, seq_len(n))
debug: {
y <- NextMethod("[")
attr(y, "contrasts") <- attr(x, "contrasts")
attr(y, "levels") <- attr(x, "levels")
class(y) <- oldClass(x)
lev <- levels(x)
if (drop)
factor(y, exclude = if (anyNA(levels(x)))
NULL
else NA)
else y
}
....

The cbind() behaviour can be explained by the documented behaviour (from ?cbind; note the final sentence in particular):

The functions cbind and rbind are S3 generic, ...

....

In the default method, all the vectors/matrices must be atomic
(see vector) or lists. Expressions are not allowed. Language
objects (such as formulae and calls) and pairlists will be coerced
to lists: other objects (such as names and external pointers) will
be included as elements in a list result. Any classes the inputs
might have are discarded (in particular, factors are replaced by
their internal codes).

Again, the fact that f is of class "factor" defeats you: the default cbind method is called, and it strips the levels information and returns the internal integer codes, as you observed.
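A quick check of the documented cbind() behaviour, using a small hypothetical factor (levels are sorted, so "a" is code 1, "b" is 2 and "c" is 3):

```r
f <- factor(c("b", "a", "c", "a"))

cb <- cbind(f, f)
cb             # integer codes 2 1 3 1 in each column, not the labels
is.factor(cb)  # FALSE: the factor class was discarded
```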

In many respects, you have to ignore, or at least not fully trust, what the is.foo functions tell you, because they use simple tests to decide whether something is or is not a foo object. From one point of view, is.matrix() and is.atomic() are clearly wrong about f (with dimensions). From another, they are right: their behaviour follows directly from their implementation. I think is.atomic(f) is not correct, but if by "is of an atomic type" R Core means "type" to be the thing returned by typeof(f), then is.atomic() is right. A stricter test is is.vector(), which f fails:

> is.vector(f)
[1] FALSE

because it has attributes other than names:

> attributes(f)
$levels
[1] "a" "b" "c" "d" "e"

$class
[1] "factor"

$dim
[1] 4 5

As to how you should get a factor matrix: you can't, at least not if you want it to retain the factor information (the labels for the levels). One solution is to use a character matrix, which retains the labels:

> fl <- levels(f)
> fm <- matrix(f, ncol = 5)
> fm
[,1] [,2] [,3] [,4] [,5]
[1,] "c" "a" "a" "c" "b"
[2,] "d" "b" "d" "b" "a"
[3,] "e" "e" "e" "c" "e"
[4,] "a" "b" "b" "a" "e"

and we store the levels of f for future use, in case we lose some elements of the matrix along the way.

Or work with the internal integer representation:

> (fm2 <- matrix(unclass(f), ncol = 5))
[,1] [,2] [,3] [,4] [,5]
[1,] 3 1 1 3 2
[2,] 4 2 4 2 1
[3,] 5 5 5 3 5
[4,] 1 2 2 1 5

and you can always get back to the levels/labels again via:

> fm2[] <- fl[fm2]
> fm2
[,1] [,2] [,3] [,4] [,5]
[1,] "c" "a" "a" "c" "b"
[2,] "d" "b" "d" "b" "a"
[3,] "e" "e" "e" "c" "e"
[4,] "a" "b" "b" "a" "e"
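Both approaches can be sketched end to end on a small hypothetical factor, using base R only. The round trip through the integer codes reproduces the character matrix exactly:

```r
f  <- factor(c("c", "d", "e", "a", "a", "b"), levels = letters[1:5])
fl <- levels(f)                      # keep the levels for later

fm  <- matrix(f, ncol = 2)           # character matrix of labels
fm2 <- matrix(unclass(f), ncol = 2)  # integer matrix of codes

fm2[] <- fl[fm2]                     # map codes back to labels, keeping dims
identical(fm, fm2)                   # TRUE
```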

A data frame seems less than ideal here, as each column of the data frame would be treated as a separate factor, whereas you seem to want to treat the array as a single factor with one set of levels.

If you really want a factor matrix, you would most likely need to create your own S3 class, plus all the methods to go with it. For example, you might store the factor matrix as a character matrix with class "factorMatrix", keeping the levels alongside it as an extra attribute. Then you would need to write [.factorMatrix, which would grab the levels, use the default [ method on the matrix, and then add the levels attribute back again. You could write cbind and head methods as well. The list of required methods would grow quickly, however. A simple implementation may suit, and if you give your objects class c("factorMatrix", "matrix") (i.e. inherit from the "matrix" class), you'll pick up all the properties and methods of the "matrix" class (which will drop the levels and other attributes), so you can at least work with the objects and see where you need to add new methods to fill out the behaviour of the class.
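Below is a minimal sketch of that idea, not a complete implementation. The constructor factorMatrix(), its lev argument, `[.factorMatrix` and print.factorMatrix() are hypothetical names. The subset method mirrors the `[.factor` code shown earlier: call NextMethod(), then restore the levels and class. Other generics such as cbind() are not covered.

```r
# Hypothetical constructor: labels in a character matrix, levels as an attribute.
factorMatrix <- function(x, lev = NULL, ...) {
  if (is.null(lev)) {
    lev <- if (is.factor(x)) levels(x) else sort(unique(as.character(x)))
  }
  m <- matrix(as.character(x), ...)  # nrow/ncol are passed through ...
  structure(m, levels = lev, class = c("factorMatrix", "matrix"))
}

`[.factorMatrix` <- function(x, ...) {
  y <- NextMethod("[")  # default matrix subsetting drops class and levels
  lev <- attr(x, "levels")
  if (is.matrix(y)) {
    structure(y, levels = lev, class = oldClass(x))
  } else {
    factor(y, levels = lev)  # a single row or column becomes a plain factor
  }
}

print.factorMatrix <- function(x, ...) {
  m <- unclass(x)
  attr(m, "levels") <- NULL
  print(m, quote = FALSE, ...)
  cat("Levels:", attr(x, "levels"), "\n")
  invisible(x)
}

fm <- factorMatrix(c("c", "d", "e", "a", "b", "b"), lev = letters[1:5], ncol = 2)
fm[1:2, ]  # still a factorMatrix carrying all five levels
fm[, 1]    # one column comes back as an ordinary factor
```

Falling back to a plain factor for vector results is a design choice; returning a one-column factorMatrix (via drop = FALSE) would be equally reasonable.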

Create Matrix of factors from data frame in R

Maybe this will get you what you need?

mymatrix = matrix(mydata, ncol = 2)
str(mymatrix)

gives you

List of 2
$ : Factor w/ 2 levels "no","yes": 2 2 1 1
$ : Factor w/ 2 levels "no","yes": 2 1 1 2
- attr(*, "dim")= int [1:2] 1 2
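Since mydata isn't shown, here is a self-contained version with a hypothetical two-column data frame of factors. The result should be a 1 x 2 list matrix whose cells are the original factor columns:

```r
# Hypothetical stand-in for the asker's data: two factor columns.
mydata <- data.frame(
  a = factor(c("yes", "yes", "no", "no")),
  b = factor(c("yes", "no", "no", "yes"))
)

mymatrix <- matrix(mydata, ncol = 2)
str(mymatrix)
mymatrix[[1, 2]]  # the second column, still a factor
```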

You would need to explain a bit more what you want to do to get more precise help.

Convert matrix from character to factor

A matrix holds only one data type. A factor is a compound structure: an integer vector of codes plus a character levels attribute and a class attribute. A matrix cannot hold both at once, so a list is the appropriate data structure for factors; a data.frame is a kind of list.

The help page ?matrix describes the data argument as

an optional data vector (including a list or expression
vector). Non-atomic classed R objects are coerced by as.vector and all
attributes discarded.

The attributes of a factor are shown below:

attributes(factor(letters[1:4]))
$levels
[1] "a" "b" "c" "d"

$class
[1] "factor"

These attributes are removed by as.vector() when the matrix is formed:

attributes(as.vector(factor(letters[1:4])))
NULL
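A minimal sketch, assuming a small hypothetical character matrix m: matrix() gives back characters, whereas a data frame can hold one factor per column, and lapply() can give every column the same set of levels.

```r
m <- matrix(c("a", "b", "b", "c"), nrow = 2)

# matrix() cannot hold a factor: the labels come back as plain characters.
typeof(matrix(factor(m), nrow = 2))  # "character"

# A data frame (a list of columns) can hold one factor per column.
df <- as.data.frame(m, stringsAsFactors = TRUE)

# Optionally give every column the same set of levels.
df[] <- lapply(df, factor, levels = sort(unique(as.vector(m))))
str(df)
```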

Proportions for factor columns in matrix in R

You can use apply(), which works well with matrices; MARGIN = 2 applies the function over columns.

apply(mtx, 2, function(x) prop.table(table(factor(x, levels = wordclass))))

# [,1] [,2] [,3] [,4]
#Content 0.375 0.625 0.25 0.50
#Function 0.250 0.250 0.50 0.25
#Insert 0.375 0.125 0.25 0.25
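Because mtx and wordclass come from the question and aren't shown, this sketch builds hypothetical stand-ins (random labels drawn from three classes) so the call can be run as-is. The proportions will differ from the output above. Passing levels = wordclass makes every column report all classes, including any that don't occur.

```r
# Hypothetical inputs: a character matrix of labels and the set of classes.
wordclass <- c("Content", "Function", "Insert")
set.seed(42)
mtx <- matrix(sample(wordclass, 32, replace = TRUE), nrow = 8)

props <- apply(mtx, 2, function(x) prop.table(table(factor(x, levels = wordclass))))
props           # one row per word class, one column per column of mtx
colSums(props)  # each column sums to 1
```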

R - Multiply the entire column in R to a list, based on a factor column

Here is an idea. Notice that I set stringsAsFactors = FALSE, because it is easier to work with a character vector directly.

dat <- data.frame(num = 20:29,
                  names = c(rep("Harry", 2), rep("Gary", 2), rep("Dairy", 3), rep("Harry", 3)),
                  stringsAsFactors = FALSE)

fvals <- c(Harry = 1, Gary = 2, Dairy = 3)

dat$num * fvals[dat$names]
# Harry Harry Gary Gary Dairy Dairy Dairy Harry Harry Harry
# 20 21 44 46 72 75 78 27 28 29
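The reason for stringsAsFactors = FALSE can be shown directly. Indexing a named vector with a factor uses the factor's integer codes, not its labels (this is documented in ?Extract), so the lookup silently goes wrong unless you convert with as.character() first. The names f_names, wrong and right are illustrative:

```r
dat <- data.frame(num = 20:29,
                  names = c(rep("Harry", 2), rep("Gary", 2), rep("Dairy", 3), rep("Harry", 3)),
                  stringsAsFactors = FALSE)
fvals <- c(Harry = 1, Gary = 2, Dairy = 3)

f_names <- factor(dat$names)  # levels sorted: "Dairy" "Gary" "Harry"

wrong <- dat$num * fvals[f_names]                # looks up by codes: Harry -> fvals[3]
right <- dat$num * fvals[as.character(f_names)]  # looks up by name, as intended
```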

Inserting factor values into an R matrix

as.character() will coerce your factor into a character vector, which is what you want.

That said, example code would help. Try running dput() on your matrix object and pasting the output into your post. You are likely using a data.frame rather than a matrix, since a matrix can hold only one data type.
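A minimal sketch, with a hypothetical matrix m and factor f, of inserting factor values into a character matrix via as.character():

```r
# Hypothetical example: fill one column of a character matrix from a factor.
m <- matrix(NA_character_, nrow = 3, ncol = 2)
f <- factor(c("low", "high", "low"))

m[, 1] <- as.character(f)  # insert the labels, not the integer codes
m[, 1]                     # "low" "high" "low"
```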


