R How to convert a numeric into factor with predefined labels
You can do something like this:
labs = letters[3:7]
vec = rep(1:5,2)
factorVec <- factor(x = vec, levels = sort(unique(vec)), labels = labs)
I have sorted unique(vec) so that the results are consistent: unique() returns values in the order of their first occurrence, so specifying the order explicitly makes the code more robust.
Also, I think specifying both the levels and the labels makes the code more readable.
EDIT
If you look at the documentation with ?factor, you will find:
levels
an optional vector of the values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x))
So factor() already does some sorting itself. In my opinion, though, you should still supply the levels explicitly to make the code more readable.
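To see why the explicit sort matters, here is a small base R sketch with made-up data (the vector values are an assumption for illustration). unique() keeps first-occurrence order, so pairing its output directly with a label vector can attach labels to the wrong values, while sort(unique()) matches the default ordering factor() uses.

```r
# Made-up data: 3 appears before 1 and 2
vec <- c(3, 1, 2, 3, 1)
lab <- c("c", "d", "e")

unique(vec)        # 3 1 2, i.e. order of first occurrence
sort(unique(vec))  # 1 2 3

# factor() sorts the unique values by default, so its levels are "1" "2" "3"
levels(factor(vec))

# Sorted levels: 1 -> "c", 2 -> "d", 3 -> "e"
f_sorted <- factor(vec, levels = sort(unique(vec)), labels = lab)
as.character(f_sorted)    # "e" "c" "d" "e" "c"

# Unsorted levels: 3 -> "c", 1 -> "d", 2 -> "e" (probably not what was intended)
f_unsorted <- factor(vec, levels = unique(vec), labels = lab)
as.character(f_unsorted)  # "c" "d" "e" "c" "d"
```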
Convert numeric columns to factors with different labels using key
If you can get the labels and levels appropriately associated with the e column, which is your link to the columns of the dataset, you can do this via purrr::pmap_df.
Here's how that would look. Most of the work is in getting the labels and levels into list columns, which I do via tibble (loaded with dplyr).
Start with your second dataset, y, which is an important part of this.
e = c(1,2,2,3)
f = names(x)
y = data.frame(e,f)
e f
1 1 a
2 2 b
3 2 c
4 3 d
Make sure the levels and labels are available and can be associated with your e vector. If they are in a long format, you could get them into a list-column format via tidyr::nest. I found this to be the most time-consuming step, in terms of getting the information written out.
library(dplyr)
levels.labels = tibble(e = c(1, 2, 3),
levels = list(1:5, 0:1, 1:5),
labels = list(c("Less than 25%",
"25-50%",
"51-75%",
"76-90%",
"More than 90%"),
c("Yes","No"),
c("l","m","n","o","p")))
If you need to write your levels and labels out within R, you might want to try tribble from the tibble package (at the time of writing, it was only available in the development version).
library(tibble)
levels.labels = tribble(~e, ~levels, ~labels,
1, 1:5, c("Less than 25%",
"25-50%",
"51-75%",
"76-90%",
"More than 90%"),
2, 0:1, c("Yes","No"),
3, 1:5, c("l","m","n","o","p"))
Merge the levels and labels into your y dataset based on e. The rows of the result match the columns of x one to one.
key = left_join(y, levels.labels, by = "e")
e f levels labels
1 1 a 1, 2, 3, 4, 5 Less than 25%, 25-50%, 51-75%, 76-90%, More than 90%
2 2 b 0, 1 Yes, No
3 2 c 0, 1 Yes, No
4 3 d 1, 2, 3, 4, 5 l, m, n, o, p
To factor each column, put the x dataset, the levels, and the labels into a named list. The names of the elements correspond to the names of the factor() arguments they should be passed to. This lets you use pmap_df from purrr to factor each column of x with the known levels and labels.
library(purrr)
pmap_df(list(x = x, levels = key$levels, labels = key$labels), factor)
# A tibble: 5 x 4
a b c d
<fctr> <fctr> <fctr> <fctr>
1 Less than 25% Yes No m
2 25-50% No Yes n
3 51-75% Yes No o
4 76-90% No Yes p
5 More than 90% Yes No n
In pmap functions, the elements of the list must all be the same length. Here the first element, x, has 4 columns, and the other two (key$levels and key$labels) are lists of length 4.
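The question's x is not shown, so the following base R sketch uses a hypothetical x whose values were chosen to reproduce the output above. It swaps purrr::pmap_df for base Map(), which likewise walks the columns of x in parallel with the per-column levels and labels; indexing the per-e lists by y$e plays the role of the left_join.

```r
# Hypothetical data standing in for the question's x
x <- data.frame(a = 1:5,
                b = c(0, 1, 0, 1, 0),
                c = c(1, 0, 1, 0, 1),
                d = c(2, 3, 4, 5, 3))

# Link between the columns of x and the label sets (same as y above)
y <- data.frame(e = c(1, 2, 2, 3), f = names(x))

# One entry per value of e
levels_by_e <- list(1:5, 0:1, 1:5)
labels_by_e <- list(c("Less than 25%", "25-50%", "51-75%", "76-90%", "More than 90%"),
                    c("Yes", "No"),
                    c("l", "m", "n", "o", "p"))

# Expand to one entry per column of x (what the left_join does above)
col_levels <- levels_by_e[y$e]
col_labels <- labels_by_e[y$e]

# Factor each column with its own levels and labels; factored[] <- keeps the data frame
factored <- x
factored[] <- Map(function(col, lev, lab) factor(col, levels = lev, labels = lab),
                  x, col_levels, col_labels)
str(factored)
```

As with pmap, the three inputs to Map() must line up with the columns of x, which is why the sketch derives them from y$e instead of typing them out per column.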
r, stuck on converting numeric to factor with labels
You almost had it: it's labels, not levels.
factor(
rep(c(1:5),each=3,times=4),
levels = 1:5,
labels = c('yes','no','sometimes','almost never', 'almost always'),
ordered = TRUE
)
Edit: as Roland pointed out, including both levels and labels is safer.
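Roland's point can be illustrated with hypothetical data in which some codes never occur. Without levels, factor() takes its levels from the observed values, so the five labels no longer fit; with levels = 1:5, every code keeps its intended label.

```r
# Hypothetical responses: codes 3 and 5 never occur
responses <- c(1, 2, 2, 4)
lab <- c("yes", "no", "sometimes", "almost never", "almost always")

# Without levels, factor() infers levels 1, 2, 4 and rejects 5 labels
without_levels <- tryCatch(factor(responses, labels = lab),
                           error = function(e) conditionMessage(e))
without_levels  # an error message, not a factor

# With explicit levels, each code maps to the intended label
with_levels <- factor(responses, levels = 1:5, labels = lab, ordered = TRUE)
as.character(with_levels)  # "yes" "no" "no" "almost never"
nlevels(with_levels)       # 5, including the unused levels
```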
change a numeric column to a factor and assign labels/levels to the data
You may try using the levels() function. For example, with dummy data containing three factor levels 1, 2 and 3:
dummy <- data.frame(
fac = rep(c(1,2,3),4)
)
dummy$fac <- as.factor(dummy$fac)
In base R, option 1:
levels(dummy$fac) <- c("Petrol", "Hybrid", "Diesel")
Option 2:
levels(dummy$fac) <- list("Petrol" = 1, "Hybrid" = 2, "Diesel" = 3)
Also, using the dplyr package:
dummy$fac <- dplyr::recode_factor(dummy$fac, "1" = "Petrol", "2" = "Hybrid", "3" = "Diesel")
Each of these, applied to a freshly created dummy, gives:
fac
1 Petrol
2 Hybrid
3 Diesel
4 Petrol
5 Hybrid
6 Diesel
7 Petrol
8 Hybrid
9 Diesel
10 Petrol
11 Hybrid
12 Diesel
And str(dummy$fac) shows:
Factor w/ 3 levels "Petrol","Hybrid",..: 1 2 3 1 2 3 1 2 3 1 ...
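As a quick base R check, the two levels<- forms and a direct factor() call with levels and labels (the approach from the other answers, added here for comparison) all give the same result. Each starts from a fresh copy of dummy, since after the first relabelling the old levels 1, 2 and 3 no longer exist.

```r
make_dummy <- function() {
  dummy <- data.frame(fac = rep(c(1, 2, 3), 4))
  dummy$fac <- as.factor(dummy$fac)
  dummy
}

# Option 1: replace the level names in order
d1 <- make_dummy()
levels(d1$fac) <- c("Petrol", "Hybrid", "Diesel")

# Option 2: list names are the new labels, values are the old levels
d2 <- make_dummy()
levels(d2$fac) <- list("Petrol" = 1, "Hybrid" = 2, "Diesel" = 3)

# Direct conversion from the numeric codes
d3 <- factor(rep(c(1, 2, 3), 4), levels = 1:3, labels = c("Petrol", "Hybrid", "Diesel"))

identical(as.character(d1$fac), as.character(d2$fac))
identical(as.character(d1$fac), as.character(d3))
```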
Convert Number to Factor using Labels in R
The error message points in the right direction: the labels vector has length 4, but your levels have length 101. You were almost there with the original code; just make the labels the correct length:
reps<-rep("Non-NYPD Jurisdiction",98)
NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0,1,2, 3:100), labels = c("Patrol", "Transit", "Housing", reps))
Edit with explanation:
Run this code for additional explanation.
#The key is that labels needs the same length as levels
#length of levels
levels <- c(0,1,2, 3:100)
print(length(levels))
#length of the original labels
labels = c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
print(length(labels))
#This is a problem: there are 101 levels but only 4 labels, so from the 5th level on there is no label to pair with and factor() throws an error.
#Therefore "Non-NYPD Jurisdiction" must be repeated once for each remaining level
#since length(3:100) is 98, that is how we know we need 98
reps<-rep("Non-NYPD Jurisdiction",98)
labels <- c("Patrol", "Transit", "Housing", reps)
print(length(labels))
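A variant that avoids counting by hand, sketched with hypothetical codes: compute the number of repeats from the levels vector. Since R 3.4.0, factor() also accepts duplicated labels and merges them into a single level, so the result has four levels rather than 101.

```r
# Hypothetical jurisdiction codes
codes <- c(0, 1, 2, 57, 3, 100)

lev <- 0:100  # same as c(0, 1, 2, 3:100)
lab <- c("Patrol", "Transit", "Housing",
         rep("Non-NYPD Jurisdiction", length(lev) - 3))
length(lab) == length(lev)  # one label per level

# Duplicated labels are merged into one level (R >= 3.4.0)
f <- factor(codes, levels = lev, labels = lab)
levels(f)        # "Patrol" "Transit" "Housing" "Non-NYPD Jurisdiction"
as.character(f)
```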
Using Apply to convert numeric columns to factors with labels
When using lapply without an anonymous function, the extra arguments for the applied function can be passed directly to lapply:
myscale[] <- lapply(myscale, factor, levels = 0:5,
labels = c("Unanswered", "1", "2", "3", "4", "5"))
With an anonymous function:
myscale[] <- lapply(myscale, function(x) factor(x, levels = 0:5,
labels =c("Unanswered", "1", "2", "3", "4", "5")))
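The above can also be done with mutate_each from dplyr (mutate_each() and funs() have since been deprecated; in current dplyr, mutate(across(everything(), ...)) is the equivalent):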
library(dplyr)
library(magrittr)
myscale %<>%
mutate_each(funs(factor(., levels = 0:5, labels = c("Unanswered", 1:5))))
Data used above:
set.seed(24)
myscale <- data.frame(scale_1 = sample(0:5, 20, replace=TRUE),
scale_2 = sample(0:5, 20, replace=TRUE))
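With the data above, a base R sketch of what the lapply version does. The myscale[] <- form matters: a plain lapply() call returns a list, whereas assigning into myscale[] keeps the data frame and only replaces its columns.

```r
set.seed(24)
myscale <- data.frame(scale_1 = sample(0:5, 20, replace = TRUE),
                      scale_2 = sample(0:5, 20, replace = TRUE))
raw <- myscale  # keep the numeric codes for comparison

# lapply() on its own returns a plain list, not a data frame
plain <- lapply(raw, factor, levels = 0:5)
class(plain)  # "list"

# Assigning into myscale[] keeps the data.frame structure
myscale[] <- lapply(myscale, factor, levels = 0:5,
                    labels = c("Unanswered", "1", "2", "3", "4", "5"))
is.data.frame(myscale)
sapply(myscale, is.factor)

# Every 0 in the raw data became "Unanswered"
all((raw$scale_1 == 0) == (myscale$scale_1 == "Unanswered"))
```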
Converting factor variable to numeric, and from numeric back to factor
Before coercing the factors to numeric, create a lookup table of numeric-factor label pairs. At the end of your workflow, merge the factor labels back into your data.
library(dplyr)
data(warpbreaks)
original <- warpbreaks
value_label_map <- warpbreaks %>%
select(wool, tension) %>%
mutate(wool_num = as.numeric(wool), tension_num = as.numeric(tension)) %>%
distinct()
warpbreaks <- warpbreaks %>%
mutate(wool = as.numeric(wool), tension = as.numeric(tension))
warpbreaks <- left_join(warpbreaks, value_label_map,
by = c("wool" = "wool_num", "tension" = "tension_num"))
identical(original$wool, warpbreaks$wool.y)
identical(original$tension, warpbreaks$tension.y)
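A lighter-weight alternative, swapped in here rather than taken from the answer above: as.numeric() on a factor returns its integer codes, and the factor's levels() are already the code-to-label lookup, so storing them is enough to rebuild the factors later. A base R sketch, assuming the numeric codes are left unchanged in between:

```r
data(warpbreaks)
original <- warpbreaks

# The lookup tables: integer code i corresponds to wool_levels[i]
wool_levels <- levels(warpbreaks$wool)
tension_levels <- levels(warpbreaks$tension)

# Work with numeric codes
warpbreaks$wool <- as.numeric(warpbreaks$wool)
warpbreaks$tension <- as.numeric(warpbreaks$tension)

# ... numeric workflow ...

# Restore the labels from the stored levels
warpbreaks$wool <- factor(warpbreaks$wool,
                          levels = seq_along(wool_levels), labels = wool_levels)
warpbreaks$tension <- factor(warpbreaks$tension,
                             levels = seq_along(tension_levels), labels = tension_levels)

identical(original$wool, warpbreaks$wool)
identical(original$tension, warpbreaks$tension)
```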
Changing 1 row in a dataframe from numeric to factor
You can pass labels to factor():
df$new_col <- factor(df$col, labels = c('gen', 'nongen'))
Or using ifelse:
df$new_col <- factor(ifelse(df$col == 0, 'gen', 'nongen'))
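A small check with a hypothetical 0/1 column shows the two approaches agree. They differ at the edges: the labels version relies on factor() sorting the distinct values (0 before 1) and needs exactly two distinct values, while the ifelse version treats every non-zero value as "nongen".

```r
df <- data.frame(col = c(0, 1, 1, 0, 1))  # hypothetical 0/1 column

by_labels <- factor(df$col, labels = c("gen", "nongen"))
by_ifelse <- factor(ifelse(df$col == 0, "gen", "nongen"))

as.character(by_labels)  # "gen" "nongen" "nongen" "gen" "nongen"
identical(as.character(by_labels), as.character(by_ifelse))

# With a third distinct value, the two-label version fails but ifelse still works
df2 <- data.frame(col = c(0, 1, 2))
tryCatch(factor(df2$col, labels = c("gen", "nongen")), error = function(e) "error")
factor(ifelse(df2$col == 0, "gen", "nongen"))
```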