R How to Convert a Numeric into Factor with Predefined Labels

You can do something like this:

labs <- letters[3:7]
vec <- rep(1:5, 2)
factorVec <- factor(x = vec, levels = sort(unique(vec)), labels = labs)

I sorted unique(vec) to make the result consistent: unique() returns values in order of first occurrence, so without the sort the level order would depend on how the data happen to be arranged. Specifying the order explicitly makes the code more robust.

Specifying both levels and labels also makes the code more readable, I think.
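
To see why the sort matters, here is a minimal sketch with a made-up vector whose values do not first appear in increasing order. The names vec2 and labs3 are illustrative and not from the question.

```r
# Hypothetical input: the values first appear in the order 3, 1, 2
vec2 <- c(3, 1, 2, 3, 1)
labs3 <- c("low", "mid", "high")

# unique() keeps first-occurrence order, so "low" is attached to 3
unsorted <- factor(vec2, levels = unique(vec2), labels = labs3)

# sorting first attaches "low" to 1, "mid" to 2 and "high" to 3
sorted <- factor(vec2, levels = sort(unique(vec2)), labels = labs3)

as.character(unsorted)  # "low" "mid" "high" "low" "mid"
as.character(sorted)    # "high" "low" "mid" "high" "low"
```

The same labels end up on different values, which is exactly the inconsistency the sort avoids.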

EDIT

If you look at the documentation with ?factor, you will find:

levels
an optional vector of the values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x))

So there is some sorting inside the factor function itself. Still, in my opinion it is better to state the levels explicitly, because it makes the code more readable.
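
A quick sketch of both points from the documentation, using a made-up vector x_demo (not from the question): the default levels come out sorted, and levels can be given as a smaller set, in which case the values left out become NA.

```r
# Hypothetical input, deliberately not in increasing order
x_demo <- c(3, 1, 2, 3)

# default levels: the unique values, sorted
f_default <- factor(x_demo)
levels(f_default)   # "1" "2" "3"

# levels given as a smaller set: 3 is not a level, so it becomes NA
f_subset <- factor(x_demo, levels = c(1, 2))
is.na(f_subset)     # TRUE FALSE FALSE TRUE
```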

Convert numeric columns to factors with different labels using key

If you can get the labels and levels appropriately associated with the e column, which is your link to the columns of the dataset, you can do this via purrr::pmap_df.

Here's how that would look. Most of the work is in getting the labels and levels as a list column, which I do via tibble (loaded with dplyr).

Start with your second dataset, y, which is an important part of this.

e = c(1,2,2,3)
f = names(x)

y = data.frame(e,f)

  e f
1 1 a
2 2 b
3 2 c
4 3 d

Make sure the levels and labels are available and can be associated with your e vector. If they are in a long format, you could get them into a list-column format via tidyr::nest. I found this to be the most time-consuming step in terms of getting this info written out.
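
If the levels and labels start out in a long format, the list-column layout can also be built without extra packages. The sketch below swaps in base R split() in place of tidyr::nest; the long table and the names long and nested are made up for illustration.

```r
# Hypothetical long-format lookup: one row per (e, level, label) combination
long <- data.frame(
  e     = c(1, 1, 2, 2),
  level = c(1, 2, 0, 1),
  label = c("low", "high", "Yes", "No"),
  stringsAsFactors = FALSE
)

# split() groups the levels and labels by e, giving one list element per e
levels_by_e <- split(long$level, long$e)
labels_by_e <- split(long$label, long$e)

# one row per e, with levels and labels stored as list-columns
nested <- data.frame(e = as.numeric(names(levels_by_e)))
nested$levels <- unname(levels_by_e)
nested$labels <- unname(labels_by_e)

nested$labels[[2]]  # "Yes" "No"
```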

library(dplyr)

levels.labels = tibble(e = c(1, 2, 3),
                       levels = list(1:5, 0:1, 1:5),
                       labels = list(c("Less than 25%",
                                       "25-50%",
                                       "51-75%",
                                       "76-90%",
                                       "More than 90%"),
                                     c("Yes", "No"),
                                     c("l", "m", "n", "o", "p")))

If you need to write your levels and labels out within R, you might want to try tribble from the tibble package (only in the development version when this was written, and part of the released package since).

library(tibble)
levels.labels = tribble(~e, ~levels, ~labels,
                        1,  1:5,     c("Less than 25%",
                                       "25-50%",
                                       "51-75%",
                                       "76-90%",
                                       "More than 90%"),
                        2,  0:1,     c("Yes", "No"),
                        3,  1:5,     c("l", "m", "n", "o", "p"))

Merge the levels and labels with your y dataset based on e. The rows of the result match the columns of x one to one.

key = left_join(y, levels.labels)

  e f        levels                                                labels
1 1 a 1, 2, 3, 4, 5 Less than 25%, 25-50%, 51-75%, 76-90%, More than 90%
2 2 b          0, 1                                               Yes, No
3 2 c          0, 1                                               Yes, No
4 3 d 1, 2, 3, 4, 5                                         l, m, n, o, p

To factor each column, put the x dataset, the levels, and the labels all into a named list. The names of each element correspond to the names of the arguments you need to use from factor. This allows you to easily use pmap_df from purrr to factor each column of x, using the known levels and labels information.

library(purrr)
pmap_df(list(x = x, levels = key$levels, labels = key$labels), factor)

# A tibble: 5 x 4
              a      b      c      d
         <fctr> <fctr> <fctr> <fctr>
1 Less than 25%    Yes     No      m
2        25-50%     No    Yes      n
3        51-75%    Yes     No      o
4        76-90%     No    Yes      p
5 More than 90%    Yes     No      n

In pmap functions, the elements of the list must all be the same length. In this case, the first element has 4 columns and the other two are list-columns of length 4.
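
The same column-by-column idea can be sketched in base R with Map(), which walks over several equal-length lists in parallel. This is a substitute for pmap_df, not what the answer above uses, and x_demo, lev and lab are made-up stand-ins for x, key$levels and key$labels.

```r
# Hypothetical stand-ins: two numeric columns plus one levels/labels entry per column
x_demo <- data.frame(a = c(1, 2, 3), b = c(0, 1, 0))
lev <- list(1:3, 0:1)
lab <- list(c("low", "mid", "high"), c("Yes", "No"))

# Map() passes the i-th column, i-th levels and i-th labels to factor() together
factored <- as.data.frame(
  Map(function(col, l, lb) factor(col, levels = l, labels = lb),
      x_demo, lev, lab)
)

as.character(factored$b)  # "Yes" "No" "Yes"
```

As with pmap, the three inputs must have the same length: two columns, two sets of levels, two sets of labels.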

r, stuck on converting numeric to factor with labels

You almost had it: the argument you need is labels, not levels.

factor(
  rep(c(1:5), each = 3, times = 4),
  levels = 1:5,
  labels = c('yes', 'no', 'sometimes', 'almost never', 'almost always'),
  ordered = TRUE
)

Edit: as Roland pointed out, including both levels and labels is safer.
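
A small sketch of why giving both is safer, using made-up data (answers) in which code 1 never occurs. With labels alone, the labels are matched to whatever values happen to be present.

```r
# Hypothetical responses: nobody answered 1
answers <- c(2, 3, 3)

# labels only: the observed values 2 and 3 silently take the first two labels
wrong <- factor(answers, labels = c("yes", "no"))

# levels and labels together: each code keeps its intended label
right <- factor(answers, levels = 1:3, labels = c("yes", "no", "sometimes"))

as.character(wrong)  # "yes" "no" "no"
as.character(right)  # "no" "sometimes" "sometimes"
```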

change a numeric column to a factor and assign labels/levels to the data

You can try the levels function. For example, take dummy data with a factor that has three levels, 1, 2 and 3:

dummy <- data.frame(
  fac = rep(c(1, 2, 3), 4)
)
dummy$fac <- as.factor(dummy$fac)

In base R

R-1

levels(dummy$fac) <- c("Petrol", "Hybrid", "Diesel")

R-2

levels(dummy$fac) <- list("Petrol" = 1, "Hybrid" = 2, "Diesel" = 3)

Also, using the dplyr package,

dplyr

dummy$fac <- dplyr::recode_factor(dummy$fac, "1" = "Petrol", "2" = "Hybrid", "3" = "Diesel")

Each of these gives

      fac
1  Petrol
2  Hybrid
3  Diesel
4  Petrol
5  Hybrid
6  Diesel
7  Petrol
8  Hybrid
9  Diesel
10 Petrol
11 Hybrid
12 Diesel

And str(dummy$fac) gives

 Factor w/ 3 levels "Petrol","Hybrid",..: 1 2 3 1 2 3 1 2 3 1 ...
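
If the column is still numeric, the conversion and the relabelling can also be done in one call to factor(). A minimal sketch, with fuel_codes as a made-up stand-in for the numeric column:

```r
# Hypothetical numeric codes, same pattern as the dummy data
fuel_codes <- rep(c(1, 2, 3), 4)
fuel_labels <- c("Petrol", "Hybrid", "Diesel")

# one step: convert and label at the same time
one_step <- factor(fuel_codes, levels = c(1, 2, 3), labels = fuel_labels)

# two steps: convert first, then overwrite the levels
two_step <- as.factor(fuel_codes)
levels(two_step) <- fuel_labels

# both routes give the same values and the same level order
all(as.character(one_step) == as.character(two_step))
identical(levels(one_step), levels(two_step))
```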

Convert Number to Factor using Labels in R

The error message gives some direction. The problem is that the labels vector has length 4 but your levels have length 101. I think you are almost there with the original code. Just extend the labels to the correct length:

reps <- rep("Non-NYPD Jurisdiction", 98)
NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE,
                                             levels = c(0, 1, 2, 3:100),
                                             labels = c("Patrol", "Transit", "Housing", reps))

Edit with explanation:

Run this code for additional explanation.

# The key is that labels needs the same length as levels

# length of levels
levels <- c(0, 1, 2, 3:100)
print(length(levels))
# length of the original labels
labels <- c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
print(length(labels))
# This is a problem because the levels after the fourth have no label: labels[5] is NA.
# Therefore "Non-NYPD Jurisdiction" has to be repeated for each remaining level.
# length(3:100) is 98, which is how we know we need 98 copies.
reps <- rep("Non-NYPD Jurisdiction", 98)
labels <- c("Patrol", "Transit", "Housing", reps)
print(length(labels))
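
A short sketch of what the repeated label does to the result, on a made-up vector of codes (the real NYC_2019_Arrests data is not used here). It assumes R 3.4.0 or later, where factor() accepts duplicated labels and merges them into a single level; the count of repeats is computed from the levels instead of being typed in as 98.

```r
# Hypothetical jurisdiction codes
codes <- c(0, 1, 2, 3, 97)

all_levels <- 0:100
labs <- c("Patrol", "Transit", "Housing",
          rep("Non-NYPD Jurisdiction", length(all_levels) - 3))

# duplicated labels are merged, so 3:100 all map to one level
jurisdiction <- factor(codes, levels = all_levels, labels = labs)

nlevels(jurisdiction)       # 4
as.character(jurisdiction)
```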

Using Apply to convert numeric columns to factors with labels

When lapply is used without an anonymous function, the extra arguments of the function being applied can be passed straight to lapply.

myscale[] <- lapply(myscale, factor, levels = 0:5,
                    labels = c("Unanswered", "1", "2", "3", "4", "5"))

With an anonymous function,

myscale[] <- lapply(myscale, function(x) factor(x, levels = 0:5,
                    labels = c("Unanswered", "1", "2", "3", "4", "5")))
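
The same pattern works on a subset of columns when the data frame also holds columns that should stay as they are. A minimal sketch, with survey and its id column made up for illustration:

```r
# Hypothetical data: an id column that must stay numeric plus two scale columns
survey <- data.frame(id = 1:4,
                     scale_1 = c(0, 3, 5, 1),
                     scale_2 = c(2, 0, 4, 4))

# pick the columns to convert by name pattern
scale_cols <- grep("^scale_", names(survey), value = TRUE)

# assign back into the same columns so the rest of the data frame is untouched
survey[scale_cols] <- lapply(survey[scale_cols], factor, levels = 0:5,
                             labels = c("Unanswered", "1", "2", "3", "4", "5"))

str(survey)
```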

The above can also be done with mutate_each from dplyr (note that mutate_each and funs are deprecated in current dplyr releases)

library(dplyr)
library(magrittr)
myscale %<>%
  mutate_each(funs(factor(., levels = 0:5, labels = c("Unanswered", 1:5))))
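
On a recent dplyr (1.0.0 or later), the equivalent would usually be written with across(). This is a sketch under that version assumption, and myscale_demo is a small made-up data frame so that the block runs on its own.

```r
library(dplyr)

# Hypothetical data with the same shape as myscale
myscale_demo <- data.frame(scale_1 = c(0, 3, 5),
                           scale_2 = c(2, 0, 4))

# across(everything(), ...) applies the same factor() call to every column
myscale_demo <- myscale_demo %>%
  mutate(across(everything(),
                ~ factor(.x, levels = 0:5, labels = c("Unanswered", 1:5))))

as.character(myscale_demo$scale_2)  # "2" "Unanswered" "4"
```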

data

set.seed(24)
myscale <- data.frame(scale_1 = sample(0:5, 20, replace=TRUE),
scale_2 = sample(0:5, 20, replace=TRUE))

Converting factor variable to numeric, and from numeric back to factor

Before coercing the factors to numeric, create a lookup table of numeric-factor label pairs. At the end of your workflow, merge the factor labels back into your data.

library(dplyr)
data(warpbreaks)
original <- warpbreaks

value_label_map <- warpbreaks %>%
  select(wool, tension) %>%
  mutate(wool_num = as.numeric(wool), tension_num = as.numeric(tension)) %>%
  distinct()

warpbreaks <- warpbreaks %>%
  mutate(wool = as.numeric(wool), tension = as.numeric(tension))

warpbreaks <- left_join(warpbreaks, value_label_map,
                        by = c("wool" = "wool_num", "tension" = "tension_num"))

identical(original$wool, warpbreaks$wool.y)
identical(original$tension, warpbreaks$tension.y)
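
For a single column, the same round trip can be sketched in base R by keeping the level names as the lookup. This is an alternative to the join above, not a replacement for it, and wb and tension_levels are illustrative names.

```r
# Work on a copy of the built-in warpbreaks data
wb <- datasets::warpbreaks

# the lookup is just the level names, in their original order
tension_levels <- levels(wb$tension)

# factor -> integer codes (1, 2, 3 in level order)
wb$tension <- as.integer(wb$tension)

# ... numeric work on wb$tension would go here ...

# integer codes -> factor, using the saved level names as labels
wb$tension <- factor(wb$tension,
                     levels = seq_along(tension_levels),
                     labels = tension_levels)

# the restored column matches the original values and level order
all(as.character(wb$tension) == as.character(datasets::warpbreaks$tension))
```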

Changing 1 row in a dataframe from numeric to factor

You can assign labels to the factor:

df$new_col <- factor(df$col, labels = c('gen', 'nongen'))

Or use ifelse:

df$new_col <- factor(ifelse(df$col == 0, 'gen', 'nongen'))
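
Both versions assume that col only holds 0 and 1. Here is a sketch that also states the levels, so the mapping still holds when one of the codes is missing from the data. The data frame df_demo is made up for illustration.

```r
# Hypothetical column in which the code 0 never occurs
df_demo <- data.frame(col = c(1, 1, 1))

# with explicit levels, 1 is still mapped to 'nongen' and 'gen' stays a level
df_demo$new_col <- factor(df_demo$col, levels = c(0, 1),
                          labels = c('gen', 'nongen'))

as.character(df_demo$new_col)  # "nongen" "nongen" "nongen"
levels(df_demo$new_col)        # "gen" "nongen"
```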

