Labelling of ordered factor variable
Like many other people, I think you might be misunderstanding the meaning of an "ordered" factor in R. All factors in R are ordered, in a sense: the estimates etc. are typically printed, plotted, etc. in the order of the levels vector. Specifying that a factor is of type ordered has two major effects:
- it allows you to evaluate inequalities on the levels of the factor (e.g. you can filter(age > "b"))
- the contrasts are set by default to orthogonal polynomial contrasts, which is where the L (linear) and Q (quadratic) labels come from: see e.g. this CrossValidated answer for more details.
If you want this variable treated in the same way as a regular factor (so that the estimates are made for differences of groups from the baseline level, i.e. treatment contrasts), you can:
- convert back to an unordered factor (e.g. factor(age, ordered = FALSE))
- specify that you want to use treatment contrasts in your model (in base R you would specify contrasts = list(age = "contr.treatment"))
- set options(contrasts = c(unordered = "contr.treatment", ordered = "contr.treatment")) (the default for ordered is "contr.poly")
If you have an unordered ("regular") factor and the levels are not in the order you want, you can reset the level order by specifying the levels explicitly, e.g.
mutate(across(age, ~ factor(.x,
    levels = c("0-10 years", "11-20 years", "21-30 years", "30-40 years"))))
By default, R sorts factor levels alphabetically, which is sometimes not what you want (but I can't think of a case where the order would be 'random' ...).
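To illustrate why explicit levels are often needed, here is a base R sketch with hypothetical age-group labels where alphabetical sorting goes wrong ("10-14" sorts before "5-9"). It also shows a pitfall worth checking for: a value not spelled exactly as in levels silently becomes NA.

```r
# Hypothetical age-group labels; text sorting puts "10-14" before "5-9"
x <- c("5-9 years", "10-14 years", "15-19 years", "5-9 years")
levels(factor(x))   # "10-14 years" "15-19 years" "5-9 years"

# Listing the levels explicitly fixes the order
f <- factor(x, levels = c("5-9 years", "10-14 years", "15-19 years"))
levels(f)           # "5-9 years" "10-14 years" "15-19 years"

# Caveat: a value missing from `levels` silently becomes NA
g <- factor(c("5-9 years", "20-24 years"), levels = levels(f))
is.na(g)            # FALSE TRUE
```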
How does R determine the default level ordering of a factor variable when importing data?
The default ordering is alphabetical for character data: the levels are the unique values of the vector, sorted in increasing order.
set.seed(24)
v1 <- factor(sample(letters[1:10], 50, replace = TRUE))
levels(v1)
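A slightly more detailed sketch of that rule. According to ?factor, the default levels are the unique values sorted into increasing order of x itself, so numbers sort numerically while strings sort as text; passing levels = unique(x) keeps the order of first appearance instead. The vectors below are made up for illustration:

```r
# Character input: unique strings sorted as text
levels(factor(c("10", "2", "3")))      # "10" "2" "3"

# Numeric input: sorted by value before conversion to character
levels(factor(c(10, 2, 3)))            # "2" "3" "10"

# Keep the order of first appearance by passing levels explicitly
x <- c("banana", "apple", "cherry", "apple")
levels(factor(x))                      # "apple" "banana" "cherry"
levels(factor(x, levels = unique(x)))  # "banana" "apple" "cherry"
```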
Reorder factor levels by pattern
You can create your desired factor levels programmatically.
lvls <- do.call(paste, c(tidyr::expand_grid(
c('Female', 'Male'), c('18_34', '35_49'), c('HS', 'CG')), sep = '-'))
lvls
#[1] "Female-18_34-HS" "Female-18_34-CG" "Female-35_49-HS" "Female-35_49-CG"
#[5] "Male-18_34-HS" "Male-18_34-CG" "Male-35_49-HS" "Male-35_49-CG"
You can then use lvls as the levels argument in the factor() call.
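If you would rather avoid the tidyr dependency, a base R alternative (a substitute technique, not the approach above) is expand.grid(). Note that expand.grid() varies its first argument fastest, so the arguments go in reverse order to reproduce the same level sequence. The column names edu, age, sex and the example data x are hypothetical:

```r
# expand.grid() varies its FIRST argument fastest, so list the
# components in reverse: education, then age, then sex
g <- expand.grid(edu = c("HS", "CG"), age = c("18_34", "35_49"),
                 sex = c("Female", "Male"), stringsAsFactors = FALSE)
lvls <- paste(g$sex, g$age, g$edu, sep = "-")
lvls[1:3]      # "Female-18_34-HS" "Female-18_34-CG" "Female-35_49-HS"

# Use the generated vector as the level order (hypothetical data)
x <- c("Male-35_49-CG", "Female-18_34-HS", "Male-18_34-HS")
f <- factor(x, levels = lvls)
as.integer(f)  # 8 1 5
```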
How do I make predictions using an ordered factor coefficient in R?
Just give R a data frame with x values drawn from the levels of the factor ("none", "some", etc.), and it will do the rest.
I changed your setup slightly to convert x to an ordered() factor within the data frame (this will carry through all of the computations).
d$x = ordered(d$x, labels=c("none", "some", "more", "a lot"))
m1 <- lm(y~x, d) ## save fitted object
Coefs <- coef(m1)
Now we can predict():
predict(m1, newdata = data.frame(x=c("none","more")))
## 1 2
## 2.993959 6.997342
(We didn't have to say explicitly that the new x was ordered().)
If you want to dig a little deeper into the computations, you can look at the model matrix:
model.matrix(~unique(d$x))
##   (Intercept) unique(d$x).L unique(d$x).Q unique(d$x).C
## 1           1    -0.6708204           0.5    -0.2236068
## 2           1    -0.2236068          -0.5     0.6708204
## 3           1     0.2236068          -0.5    -0.6708204
## 4           1     0.6708204           0.5     0.2236068
For each level of the factor, these are the values R multiplies the coefficients by to generate the prediction (e.g. for level "none", 1*b0 + (-0.671)*b1 + 0.5*b2 + (-0.224)*b3).
For even more detail, look at ?contr.poly and ?poly, or the source code of poly() (although none of these is easy reading!).
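To check the "multiply the coefficients by the model-matrix row" description, here is a self-contained sketch. Since the question's d is not shown, the data below are simulated and purely hypothetical (x coded 1 to 4, then relabelled as in the answer). contr.poly(4) is the polynomial contrast matrix R uses for a 4-level ordered factor, and its rows match the table above:

```r
# Hypothetical stand-in for the question's d: x coded 1-4
set.seed(1)
d <- data.frame(x = rep(1:4, each = 5))
d$y <- 2 + 1.3 * d$x + rnorm(nrow(d))
d$x <- ordered(d$x, labels = c("none", "some", "more", "a lot"))
m1 <- lm(y ~ x, d)

# Polynomial contrasts for 4 levels: columns .L, .Q, .C
cp <- contr.poly(4)

# Prediction by hand: 1 for the intercept plus the contrast row,
# multiplied by the coefficients (b0, b1, b2, b3)
X_new <- cbind(1, cp[c(1, 3), ])      # rows for "none" and "more"
manual <- drop(X_new %*% coef(m1))

auto <- predict(m1, newdata = data.frame(x = c("none", "more")))
all.equal(unname(manual), unname(auto))  # TRUE
```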
Change the order of a factor character vector
Please find below one possible solution using the mixedsort() function from the gtools package.
Reprex
- Code
library(gtools)
factor(mixedsort(c("AS-GEN-SUM-10-Fall", "AS-GEN-SUM-3-Fall","AS-GEN-SUM-4-Fall","AS-GEN-SUM-5-Fall"), decreasing = TRUE))
- Output
#> [1] AS-GEN-SUM-3-Fall AS-GEN-SUM-4-Fall AS-GEN-SUM-5-Fall AS-GEN-SUM-10-Fall
#> 4 Levels: AS-GEN-SUM-10-Fall AS-GEN-SUM-3-Fall ... AS-GEN-SUM-5-Fall
Created on 2022-02-11 by the reprex package (v2.0.1)
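In the output above the values come out in numeric order, but the Levels line is still alphabetical (AS-GEN-SUM-10-Fall first), because factor() sorts the levels of whatever vector it receives. If the goal is the level order itself, one base R sketch (a substitute for mixedsort(), not part of the answer above) is to extract the number and pass the resulting order to levels =. It assumes each label contains exactly one run of digits, as in this example:

```r
x <- c("AS-GEN-SUM-10-Fall", "AS-GEN-SUM-3-Fall",
       "AS-GEN-SUM-4-Fall", "AS-GEN-SUM-5-Fall")

# Assumption: each label contains exactly one run of digits
num <- as.integer(gsub("[^0-9]", "", x))

# Numerically sorted, de-duplicated labels become the level order
f <- factor(x, levels = unique(x[order(num)]))
levels(f)
# "AS-GEN-SUM-3-Fall" "AS-GEN-SUM-4-Fall" "AS-GEN-SUM-5-Fall" "AS-GEN-SUM-10-Fall"
```

unique() guards against duplicate labels, which factor() would otherwise reject as duplicated levels.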