Factors: Ordered vs. Levels

Labelling of an ordered factor variable

Like many other people, I think you might be misunderstanding the meaning of an "ordered" factor in R. All factors in R are ordered, in a sense: estimates, printed output, plots, etc. typically follow the order of the levels vector. Declaring a factor as ordered has two major effects:

  • it allows you to evaluate inequalities on the levels of the factor (e.g. you can filter(age > "b"))
  • the contrasts are set by default to orthogonal polynomial contrasts, which is where the L (linear), Q (quadratic) and, with four or more levels, C (cubic) labels come from: see e.g. this CrossValidated answer for more details.
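A minimal sketch of both effects, using a made-up three-level ordered factor `age` (levels "a" < "b" < "c"); the data are hypothetical and only there for illustration:

```r
# Hypothetical ordered factor with levels a < b < c
age <- factor(c("a", "b", "c", "b", "a"),
              levels = c("a", "b", "c"), ordered = TRUE)

# 1. Inequalities compare positions in the level order
age > "b"
# FALSE FALSE TRUE FALSE FALSE

# 2. Default contrasts are orthogonal polynomials (contr.poly):
#    with three levels the columns are named .L and .Q
colnames(contrasts(age))

# For comparison: the same inequality on an unordered factor
# only produces a warning and a vector of NA
age_u <- factor(age, ordered = FALSE)
suppressWarnings(age_u > "b")
```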

If you want this variable treated in the same way as a regular factor (so that the estimates are made for differences of groups from the baseline level, i.e. treatment contrasts), you can:

  • convert back to an unordered factor (e.g. factor(age, ordered=FALSE))
  • specify that you want to use treatment contrasts in your model (in base R, pass contrasts = list(age = "contr.treatment") to the model-fitting function, e.g. lm())
  • set options(contrasts = c(unordered = "contr.treatment", ordered = "contr.treatment")) (the default for ordered is "contr.poly")
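Here is a rough sketch of the three options side by side, on simulated data (the response `y` and the ordered factor `age` are hypothetical). Comparing the coefficient names shows which contrasts each model used:

```r
# Hypothetical toy data: numeric response y and an ordered factor age (a < b < c)
set.seed(1)
d <- data.frame(
  y   = rnorm(12),
  age = factor(rep(c("a", "b", "c"), each = 4),
               levels = c("a", "b", "c"), ordered = TRUE)
)

# Default for an ordered factor: polynomial contrasts
nm_default <- names(coef(lm(y ~ age, data = d)))
nm_default    # "(Intercept)" "age.L" "age.Q"

# Option 1: convert back to an unordered factor
d$age_u <- factor(d$age, ordered = FALSE)
nm_unordered <- names(coef(lm(y ~ age_u, data = d)))
nm_unordered  # "(Intercept)" "age_ub" "age_uc"

# Option 2: request treatment contrasts for this model only
nm_model <- names(coef(lm(y ~ age, data = d,
                          contrasts = list(age = "contr.treatment"))))
nm_model      # "(Intercept)" "ageb" "agec"

# Option 3: change the global default, then restore the previous setting
old <- options(contrasts = c(unordered = "contr.treatment",
                             ordered   = "contr.treatment"))
nm_global <- names(coef(lm(y ~ age, data = d)))
options(old)
nm_global     # "(Intercept)" "ageb" "agec"
```

Option 2 is the least invasive, since it affects only the one model; option 3 changes the behaviour of every model fitted afterwards until the old setting is restored.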

If you have an unordered ("regular") factor and the levels are not in the order you want, you can reset the level order by specifying the levels explicitly, e.g.

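library(dplyr)
# df stands for your data frame; `age` is the column whose levels you want to reorder
df <- df %>%
  mutate(across(age, ~ factor(.x, levels = c("0-10 years", "11-20 years", "21-30 years", "30-40 years"))))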

By default, R sorts the levels of a character factor alphabetically, which is sometimes not what you want (though I can't think of a case where the order would be 'random').
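A small base-R illustration with made-up values, contrasting the default alphabetical levels with explicitly specified ones. It also shows one pitfall of the explicit approach:

```r
x <- c("medium", "low", "high", "low")

# Default: levels are the sorted unique values (alphabetical here)
levels(factor(x))
# "high" "low" "medium"

# Explicit levels give the order you want
f <- factor(x, levels = c("low", "medium", "high"))
levels(f)
# "low" "medium" "high"

# Pitfall: any value missing from `levels` silently becomes NA
g <- factor(x, levels = c("low", "high"))
is.na(g)
# TRUE FALSE FALSE FALSE
```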

How does R determine the default level ordering of a factor variable when importing data?

For character data the default ordering is alphabetical: factor() uses the sorted unique values of the vector as its levels.

set.seed(24)
v1 <- factor(sample(letters[1:10], 50, replace = TRUE))
levels(v1)
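"Alphabetical" applies to strings. According to ?factor, the default levels are the unique values sorted into increasing order of x, so numeric-looking text sorts as text, while a numeric vector sorts numerically. A quick check with made-up values:

```r
# Character input: sorted as strings, so "10" comes before "2" and "9"
levels(factor(c("9", "10", "2")))
# "10" "2" "9"

# Numeric input: sorted numerically, then converted to character labels
levels(factor(c(9, 10, 2)))
# "2" "9" "10"

# Explicit levels override the default sort
levels(factor(c("9", "10", "2"), levels = c("2", "9", "10")))
# "2" "9" "10"
```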

Reorder factor levels by pattern

You can create your desired factor levels programmatically.

lvls <- do.call(paste, c(tidyr::expand_grid(
c('Female', 'Male'), c('18_34', '35_49'), c('HS', 'CG')), sep = '-'))
lvls
#[1] "Female-18_34-HS" "Female-18_34-CG" "Female-35_49-HS" "Female-35_49-CG"
#[5] "Male-18_34-HS" "Male-18_34-CG" "Male-35_49-HS" "Male-35_49-CG"

You can then pass lvls as the levels argument of factor().
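If you would rather avoid the tidyr dependency, here is a base-R sketch of the same idea. Base expand.grid() varies its *first* argument fastest, so the components are listed in reverse and the columns flipped back with rev(). The observed vector `x` is hypothetical:

```r
# Base-R alternative to tidyr::expand_grid(): expand.grid() varies the first
# argument fastest, so list the components in reverse and reverse the columns
grid <- expand.grid(c("HS", "CG"), c("18_34", "35_49"), c("Female", "Male"),
                    stringsAsFactors = FALSE)
lvls <- do.call(paste, c(rev(grid), sep = "-"))

# Hypothetical observed values, in no particular order
x <- c("Male-35_49-CG", "Female-18_34-HS", "Male-18_34-HS", "Female-35_49-CG")

# Levels follow lvls rather than alphabetical order
f <- factor(x, levels = lvls)
as.integer(f)   # position of each value within lvls
```

Values in `x` that do not appear in `lvls` would become NA, so it is worth checking `anyNA(f)` on real data.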

How do I make predictions using an ordered factor coefficient in R?

Just give R a data frame with x values drawn from the levels of the factor ("none", "some", etc.), and it will do the rest.

I changed your setup slightly, converting x to an ordered factor with ordered() within the data frame (this carries through all of the computations).

# d is the data frame from the question, with x coded as 1-4 and a numeric response y
d$x <- ordered(d$x, labels = c("none", "some", "more", "a lot"))
m1 <- lm(y ~ x, d) ## save fitted object
Coefs <- coef(m1)

Now we can predict():

predict(m1, newdata = data.frame(x = c("none", "more")))
##        1        2
## 2.993959 6.997342

(We didn't have to explicitly declare the new x as ordered(): predict() converts it using the factor levels stored in the fitted model.)

If you want to dig a little bit deeper into the computations, you can look at the model matrix:

model.matrix(~unique(d$x))

For each level of the factor, these are the values R multiplies the coefficients by to generate the prediction (e.g. for level = "none": 1*b0 + (-0.671)*b1 + 0.5*b2 + (-0.224)*b3):

  (Intercept) unique(d$x).L unique(d$x).Q unique(d$x).C
1           1    -0.6708204           0.5    -0.2236068
2           1    -0.2236068          -0.5     0.6708204
3           1     0.2236068          -0.5    -0.6708204
4           1     0.6708204           0.5     0.2236068

For even more detail, look at ?contr.poly (the function that generates these contrasts) and ?poly, or the source code of contr.poly() (although none of these is easy reading!)
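To check the arithmetic by hand, here is a small sketch that rebuilds the predictions as "intercept column plus contrast matrix, times the coefficients". The question's data aren't shown, so `d` is simulated here (hypothetical values), and the numbers will not match the output above:

```r
# Simulated stand-in for the question's data frame `d` (hypothetical values)
set.seed(42)
d <- data.frame(x = rep(1:4, each = 5))
d$y <- 2 + 1.3 * d$x + rnorm(nrow(d))
d$x <- ordered(d$x, labels = c("none", "some", "more", "a lot"))

m1 <- lm(y ~ x, d)

# One row per level: intercept column + polynomial contrasts (.L, .Q, .C)
X <- cbind(1, contrasts(d$x))

# Prediction for each level = row of X times the coefficient vector
manual <- drop(X %*% coef(m1))

# The same predictions via predict(), using plain character levels
auto <- predict(m1, newdata = data.frame(x = levels(d$x)))

all.equal(unname(manual), unname(auto))
# With one parameter per level, these are simply the group means of y
all.equal(unname(manual), as.vector(tapply(d$y, d$x, mean)))
```

The last line is a useful sanity check: whichever contrasts you choose, a model with only the factor as predictor reproduces the group means; the contrasts change only how the coefficients are parameterised.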

Change the order of a factor character vector

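Below is one possible solution using the mixedsort() function from the gtools package. Note that decreasing = TRUE is what yields ascending numbers here: with its default settings, mixedsort() appears to read the hyphen before each number as a minus sign (so "-10" is treated as -10).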

Reprex

  • Code
library(gtools)

factor(mixedsort(c("AS-GEN-SUM-10-Fall", "AS-GEN-SUM-3-Fall","AS-GEN-SUM-4-Fall","AS-GEN-SUM-5-Fall"), decreasing = TRUE))
  • Output
#> [1] AS-GEN-SUM-3-Fall  AS-GEN-SUM-4-Fall  AS-GEN-SUM-5-Fall  AS-GEN-SUM-10-Fall
#> 4 Levels: AS-GEN-SUM-10-Fall AS-GEN-SUM-3-Fall ... AS-GEN-SUM-5-Fall

Created on 2022-02-11 by the reprex package (v2.0.1)
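Note that in the output above only the vector is reordered. factor() still assigns its levels alphabetically, so "AS-GEN-SUM-10-Fall" remains the first level. If the goal is to reorder the levels themselves, pass the sorted values as levels (e.g. factor(x, levels = mixedsort(x, decreasing = TRUE)) with gtools). The base-R sketch below avoids the gtools dependency, under the assumption that each label contains exactly one run of digits:

```r
x <- c("AS-GEN-SUM-10-Fall", "AS-GEN-SUM-3-Fall",
       "AS-GEN-SUM-4-Fall", "AS-GEN-SUM-5-Fall")

# Assumption: each label contains exactly one number; strip everything else
num <- as.integer(gsub("[^0-9]", "", x))

# Use the numeric order for the levels; the data order stays as it was
f <- factor(x, levels = unique(x[order(num)]))
levels(f)
# "AS-GEN-SUM-3-Fall" "AS-GEN-SUM-4-Fall" "AS-GEN-SUM-5-Fall" "AS-GEN-SUM-10-Fall"
```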


