Create Categorical Variable in R Based on Range

Create category based on range in R

Why didn't cut work? Did you not assign to a new column or something?

> data = data.frame(x = c(3, 4, 6, 12))
> data$group = cut(data$x, c(0, 5, 10, 15))
> data
   x   group
1  3   (0,5]
2  4   (0,5]
3  6  (5,10]
4 12 (10,15]

What you've created there is a factor object in a column of your data frame. The text displayed is the levels of the factor, and you can change them by assignment:

> levels(data$group) = c("0-5", "6-10", ">10")
> data
   x group
1  3   0-5
2  4   0-5
3  6  6-10
4 12   >10

Read some basic R docs on factors and you'll get it.
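
If you already know the labels you want, the relabelling step can be folded into the call itself through cut's labels argument. The following is a minimal sketch that reuses the same values and break points as above. By default cut uses intervals that are open on the left and closed on the right, and any value outside the breaks becomes NA.

```r
# Same toy data and break points as in the answer above.
data <- data.frame(x = c(3, 4, 6, 12))

# Supply the labels directly instead of reassigning levels() afterwards.
data$group <- cut(data$x,
                  breaks = c(0, 5, 10, 15),
                  labels = c("0-5", "6-10", ">10"))

as.character(data$group)  # "0-5" "0-5" "6-10" ">10"
```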

Create categorical variable in R based on range

Ian's answer (cut) is the most common way to do this, as far as I know.

I prefer to use shingle, from the lattice package.

The argument that specifies the binning intervals seems a little more intuitive to me.

You use shingle like so:

library(lattice)

# mock some data
data = sample(0:40, 200, replace = TRUE)

# each vector is one interval: c(lower, upper)
a = c(0, 5); b = c(5, 9); c = c(9, 19); d = c(19, 33); e = c(33, 41)

my_bins = matrix(rbind(a, b, c, d, e), ncol = 2)

# returns: (the binning intervals I've set)
     [,1] [,2]
[1,]    0    5
[2,]    5    9
[3,]    9   19
[4,]   19   33
[5,]   33   41

shx = shingle(data, intervals=my_bins)

# 'shx' at the interactive prompt will give you a nice frequency table:
# Intervals:
  min max count
1   0   5    23
2   5   9    17
3   9  19    56
4  19  33    76
5  33  41    46
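
Note that the counts in that table add up to 218 rather than 200. Shingle intervals are closed at both ends, so a value that sits on a shared boundary (5, 9, 19 or 33 here) is counted in both neighbouring intervals. If you want each value to land in exactly one bin, cut with the same break points is one option. This is a minimal sketch on a small hand-made vector (an assumption for illustration, not the random data above):

```r
# Hand-made values, several of them sitting on the interval boundaries.
x <- c(0, 5, 5, 9, 19, 33, 40)
breaks <- c(0, 5, 9, 19, 33, 41)

# include.lowest = TRUE keeps the value 0 in the first bin: [0,5].
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

counts <- table(bins)
print(counts)

# Every value is counted exactly once.
sum(counts) == length(x)  # TRUE
```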

Create categorical variable based on time or index range/interval in R

Try

paste0("ID", ceiling(1:nrow(df) / 1440))

or

paste0("ID", (1:nrow(df)-1) %/% 1440 + 1)
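
To see what these two expressions do, here is a small sketch on a made-up data frame of 3000 rows (the name df and its value column are placeholders). Rows 1 to 1440 get ID1, rows 1441 to 2880 get ID2, and the remaining 120 rows get ID3.

```r
# Placeholder data: 3000 rows, i.e. two full blocks of 1440 plus a remainder.
df <- data.frame(value = seq_len(3000))

# Version 1: divide the row number by the block size and round up.
id_a <- paste0("ID", ceiling(seq_len(nrow(df)) / 1440))

# Version 2: integer division on the zero-based row number, then shift by one.
id_b <- paste0("ID", (seq_len(nrow(df)) - 1) %/% 1440 + 1)

identical(id_a, id_b)  # TRUE
table(id_a)            # ID1: 1440, ID2: 1440, ID3: 120
```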

Creating a new column of categorical variables based on date range

You have a few syntax issues in your ifelse statement.

Since you are using dplyr, you can simplify this with the case_when and between functions.

library(dplyr)

dat %>%
  mutate(new_var = case_when(
    between(date, as.Date("1954-03-13"), as.Date("1958-12-07")) ~ "test1",
    between(date, as.Date("1958-09-14"), as.Date("1964-03-07")) ~ "test2"
  ))

#   record_id       date new_var
# 1    111111 1956-10-28   test1
# 2    222222 1956-10-28   test1
# 3    333333 1956-10-29   test1
# 4    444444 1956-10-29   test1
# 5    555555 1956-10-30   test1
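
Two details of case_when are worth keeping in mind with these ranges. The two date windows overlap (1958-09-14 to 1958-12-07), and case_when uses the first condition that matches, so dates in the overlap become "test1". Dates that match no condition become NA. This is a minimal sketch with three made-up rows, not the asker's data:

```r
library(dplyr)

# Made-up rows: one inside the first window only, one inside the overlap,
# and one outside both windows.
dat <- data.frame(
  record_id = c(111111, 222222, 333333),
  date = as.Date(c("1956-10-28", "1958-10-01", "1970-01-01"))
)

out <- dat %>%
  mutate(new_var = case_when(
    between(date, as.Date("1954-03-13"), as.Date("1958-12-07")) ~ "test1",
    between(date, as.Date("1958-09-14"), as.Date("1964-03-07")) ~ "test2"
  ))

out$new_var  # "test1" "test1" NA
```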

How to create and insert a column with categorical variables for specific ranges

You can use base::strsplit.

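Here, I split the sample column at each _. The fourth element of each split result is our index. I pass the extraction function `[` together with the position 4 to sapply to get the fourth element for each row. Using sapply rather than lapply keeps Index as a plain character vector instead of a list column.

s1$Index <- sapply(strsplit(s1$sample, split = "_"), `[`, 4)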

# >                sample Index
# > 1 Br_LV_0040324_BC1_1   BC1
# > 2 Br_LV_0040324_BC1_2   BC1
# > 3 Br_LV_0040324_BC1_3   BC1
# > 4 Br_LV_0040324_BC1_4   BC1
# > 5 Br_LV_0040324_LBR_1   LBR
# > 6 Br_LV_0040324_LBR_2   LBR

We can also use regex:

s1$Index <- sub("(?:[^\\_]*\\_){3}([^_]*)([^.*]*)$", "\\1", s1$sample)

Data:

s1 <- read.table(text="sample
Br_LV_0040324_BC1_1
Br_LV_0040324_BC1_2
Br_LV_0040324_BC1_3
Br_LV_0040324_BC1_4
Br_LV_0040324_LBR_1
Br_LV_0040324_LBR_2", header = TRUE, stringsAsFactors = FALSE)
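
Since the goal is a categorical variable, you may also want the extracted code stored as a factor. Here is a minimal, self-contained sketch, assuming every sample name has at least four underscore-separated parts. It uses vapply so the result is guaranteed to be a character vector.

```r
# A few sample names in the same format as the data above.
samples <- c("Br_LV_0040324_BC1_1", "Br_LV_0040324_BC1_2", "Br_LV_0040324_LBR_1")

# Split on the literal underscore and keep the fourth piece of each name.
parts <- strsplit(samples, split = "_", fixed = TRUE)
index <- vapply(parts, function(p) p[4], character(1))

# Store the result as a categorical variable.
index_factor <- factor(index)

index                 # "BC1" "BC1" "LBR"
levels(index_factor)  # "BC1" "LBR"
```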

Create a categorical variable

Using factor in base R:

Data:

# set random seed
set.seed(1L)
# without any NA
x1 <- sample(x = 1:10, size = 20, replace=TRUE)
# with NA
x2 <- sample(x = c(1:10, NA), size = 20, replace=TRUE)

Code:

# without any NA
as.character(factor(x1, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))

# with NA
as.character(factor(x2, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))
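
The factor call above maps the scores 0 to 4 to "fail", 5 to 7 to "good" and 8 to 10 to "excellent". The same mapping can be written as ranges with cut, which also handles non-integer scores. This is a sketch on a fixed vector (chosen for illustration, not the random x1 and x2 above) so the result is easy to check by eye:

```r
# Fixed scores covering each boundary, plus a missing value.
x <- c(0, 4, 5, 7, 8, 10, NA)

# Intervals are (-Inf,4], (4,7], (7,10]; anything above 10 would become NA.
grade <- cut(x,
             breaks = c(-Inf, 4, 7, 10),
             labels = c("fail", "good", "excellent"))

as.character(grade)
# "fail" "fail" "good" "good" "excellent" "excellent" NA
```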

How to create a new categorical variable based on the location of first zero in a column in a long format data using R?

You can make use of case_when to pass multiple conditions for each id.

library(dplyr)

df %>%
  group_by(id) %>%
  # time of the first zero in competency (NA if the id has no zero)
  mutate(first_zero_loc = time[match(0, competency)],
         yeel = case_when(all(competency == 1) ~ "unqualified",
                          all(competency == 0) ~ "undefined",
                          between(first_zero_loc, 1, 5) ~ "entry level",
                          between(first_zero_loc, 6, 8) ~ "intermediate level",
                          between(first_zero_loc, 9, 12) ~ "competency level"
         )) %>%
  ungroup() %>%
  select(-first_zero_loc)
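
To try this out without the original data, here is a small sketch on a made-up long-format data frame. The column names id, time and competency follow the code above, but the values are invented. It covers four cases: a first zero at time 3, no zero at all, all zeros, and a first zero at time 7.

```r
library(dplyr)

# Invented long-format data: one row per id and time point.
df <- data.frame(
  id = rep(1:4, times = c(4, 3, 3, 7)),
  time = c(1:4, 1:3, 1:3, 1:7),
  competency = c(1, 1, 0, 0,            # id 1: first zero at time 3
                 1, 1, 1,               # id 2: never zero
                 0, 0, 0,               # id 3: always zero
                 1, 1, 1, 1, 1, 1, 0)   # id 4: first zero at time 7
)

result <- df %>%
  group_by(id) %>%
  mutate(first_zero_loc = time[match(0, competency)],
         yeel = case_when(all(competency == 1) ~ "unqualified",
                          all(competency == 0) ~ "undefined",
                          between(first_zero_loc, 1, 5) ~ "entry level",
                          between(first_zero_loc, 6, 8) ~ "intermediate level",
                          between(first_zero_loc, 9, 12) ~ "competency level")) %>%
  ungroup() %>%
  select(-first_zero_loc)

# One row per id to make the assigned category easy to read.
levels_by_id <- distinct(result, id, yeel)
levels_by_id$yeel
# "entry level" "unqualified" "undefined" "intermediate level"
```

The order of the conditions matters: id 3 has its first zero at time 1, but it is labelled "undefined" because the all-zero check comes before the range checks.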

