Create Categorical Variable in R Based on Range

Create category based on range in R

Why didn't cut work? Did you not assign to a new column or something?

> data=data.frame(x=c(3,4,6,12))
> data$group = cut(data$x,c(0,5,10,15))
> data
   x   group
1  3   (0,5]
2  4   (0,5]
3  6  (5,10]
4 12 (10,15]

What you've created there is a factor in a column of your data frame. The text displayed shows the levels of the factor, which you can change by assignment:

levels(data$group) = c("0-5","6-10",">10")
data
   x group
1  3   0-5
2  4   0-5
3  6  6-10
4 12   >10

Read some basic R docs on factors and you'll get it.
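
The relabelling can also be done inside the cut call through its labels argument. This is a minimal sketch on the same toy data; the Inf upper break and the value 0 used for the edge case are illustrative additions, not part of the original question.

```r
data <- data.frame(x = c(3, 4, 6, 12))

# Intervals are right-closed by default: (0,5], (5,10], (10,Inf]
data$group <- cut(data$x,
                  breaks = c(0, 5, 10, Inf),
                  labels = c("0-5", "6-10", ">10"))

# A value equal to the lowest break lies outside (0,5] and becomes NA
# unless include.lowest = TRUE is set
edge_default  <- cut(0, breaks = c(0, 5, 10, Inf))
edge_included <- cut(0, breaks = c(0, 5, 10, Inf), include.lowest = TRUE)
```

If your data can contain the lowest break value itself, include.lowest = TRUE is probably what you want.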

Create categorical variable in R based on range

Ian's answer (cut) is the most common way to do this, as far as I know.

I prefer to use shingle, from the lattice package.

The argument that specifies the binning intervals seems a little more intuitive to me.

You use shingle like so:

library(lattice)

# mock some data
data = sample(0:40, 200, replace=TRUE)

a = c(0, 5);b = c(5,9);c = c(9, 19);d = c(19, 33);e = c(33, 41)

my_bins = matrix(rbind(a, b, c, d, e), ncol=2)

# returns: (the binning intervals I've set)
#      [,1] [,2]
# [1,]    0    5
# [2,]    5    9
# [3,]    9   19
# [4,]   19   33
# [5,]   33   41

shx = shingle(data, intervals=my_bins)

# 'shx' at the interactive prompt will give you a nice frequency table:
# Intervals:
#   min max count
# 1   0   5    23
# 2   5   9    17
# 3   9  19    56
# 4  19  33    76
# 5  33  41    46
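
The counts in the table above add up to more than 200, which suggests that values sitting exactly on a shared boundary (5, 9, 19, 33) are counted in both neighbouring intervals. If you need each value in exactly one bin, a rough base R equivalent is cut with the same boundaries. The sketch below swaps in cut for shingle; the seed and the choice of left-closed intervals are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the mock data reproducible
values <- sample(0:40, 200, replace = TRUE)

# Same boundaries as my_bins, expressed as a single vector of breaks
breaks <- c(0, 5, 9, 19, 33, 41)

# right = FALSE gives [0,5), [5,9), ..., [33,41), so every value from 0 to 40
# falls into exactly one bin
bins <- cut(values, breaks = breaks, right = FALSE)

# Frequency table of the non-overlapping bins
bin_counts <- table(bins)
```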

Create categorical variable based on time or index range/interval in R

Try either of the following. Both assign a new ID to each consecutive block of 1440 rows:

paste0("ID", ceiling(1:nrow(df) / 1440))

paste0("ID", (1:nrow(df)-1) %/% 1440 + 1)
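
To see that the two expressions produce the same labels, here is a small sketch with a hypothetical 7-row data frame and a block size of 3 instead of 1440 (both values are chosen only to keep the output short).

```r
df <- data.frame(value = 1:7)  # hypothetical data
block <- 3                     # stand-in for 1440

rows <- seq_len(nrow(df))

# Variant 1: round the row index up to the next multiple of the block size
id_a <- paste0("ID", ceiling(rows / block))

# Variant 2: integer-divide the zero-based row index, then shift back to 1-based
id_b <- paste0("ID", (rows - 1) %/% block + 1)

df$group <- factor(id_a)
```

Rows 1 to 3 get ID1, rows 4 to 6 get ID2, and the leftover row 7 gets ID3.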

Creating a new column of categorical variables based on date range

You have a few syntax issues in your ifelse statement.

Since you are using dplyr, you can simplify this with the case_when and between functions. Note that the two date ranges overlap; case_when uses the first condition that matches.

library(dplyr)

dat %>%
  mutate(new_var = case_when(
           between(date, as.Date("1954-03-13"), as.Date("1958-12-07"))~"test1",
           between(date, as.Date("1958-09-14"), as.Date("1964-03-07"))~ "test2")
         )

#  record_id       date new_var
#1    111111 1956-10-28   test1
#2    222222 1956-10-28   test1
#3    333333 1956-10-29   test1
#4    444444 1956-10-29   test1
#5    555555 1956-10-30   test1
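
If you would rather stay in base R, nested ifelse calls can express the same logic. This is only a sketch: the data frame is rebuilt from the output shown above, and in_range is a hypothetical helper defined here, not a base R function.

```r
dat <- data.frame(
  record_id = c(111111, 222222, 333333, 444444, 555555),
  date = as.Date(c("1956-10-28", "1956-10-28", "1956-10-29",
                   "1956-10-29", "1956-10-30"))
)

# Hypothetical helper: TRUE where x lies within [lo, hi], both ends inclusive
in_range <- function(x, lo, hi) x >= as.Date(lo) & x <= as.Date(hi)

# As with case_when, the first matching range wins; dates in neither range get NA
dat$new_var <- ifelse(in_range(dat$date, "1954-03-13", "1958-12-07"), "test1",
               ifelse(in_range(dat$date, "1958-09-14", "1964-03-07"), "test2",
                      NA_character_))
```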

How to create and insert a column with categorical variables for specific ranges

You can use base::strsplit.

Here, I split the sample column at each _. The fourth element of each resulting character vector is our index. I pass `[` with the argument 4 to sapply to extract that fourth element for every row; sapply returns a character vector, whereas lapply would store a list in the column.

s1$Index <- sapply(strsplit(s1$sample, split = "_"), `[`, 4)

# >                sample Index
# > 1 Br_LV_0040324_BC1_1   BC1
# > 2 Br_LV_0040324_BC1_2   BC1
# > 3 Br_LV_0040324_BC1_3   BC1
# > 4 Br_LV_0040324_BC1_4   BC1
# > 5 Br_LV_0040324_LBR_1   LBR
# > 6 Br_LV_0040324_LBR_2   LBR
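
If the new column is meant to be a true categorical variable, you may want to convert it to a factor. The sketch below uses two of the sample names from this answer; vapply is swapped in here because it fails loudly if any split does not yield exactly one string.

```r
samples <- c("Br_LV_0040324_BC1_1", "Br_LV_0040324_LBR_2")

# fixed = TRUE treats "_" as a literal separator rather than a regex
parts <- strsplit(samples, "_", fixed = TRUE)

# Take the 4th piece of each name; character(1) declares the expected result type
index <- vapply(parts, `[`, character(1), 4)

# Store as a factor so the groups have explicit levels
index <- factor(index)
```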

We can also use regex:

s1$Index <- sub("(?:[^\\_]*\\_){3}([^_]*)([^.*]*)$", "\\1", s1$sample)

Data:

s1 <- read.table(text="sample
Br_LV_0040324_BC1_1
Br_LV_0040324_BC1_2
Br_LV_0040324_BC1_3
Br_LV_0040324_BC1_4
Br_LV_0040324_LBR_1
Br_LV_0040324_LBR_2", header = TRUE, stringsAsFactors = FALSE)

Create a categorical variable

Using factor in base R.

Data:

# set random seed
set.seed(1L)
# without any NA
x1 <- sample(x = 1:10, size = 20, replace=TRUE)
# with NA
x2 <- sample(x = c(1:10, NA), size = 20, replace=TRUE)

Code:

# without any NA
as.character(factor(x1, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))

# with NA    
as.character(factor(x2, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))
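
The same grouping (0 to 4 is "fail", 5 to 7 is "good", 8 to 10 is "excellent") can also be written with cut, which avoids repeating each label once per level. This is a sketch of an alternative, not the method used above; the input vector is made up to hit every boundary.

```r
# Made-up scores covering each boundary, plus one missing value
x <- c(0, 4, 5, 7, 8, 10, NA)

# Right-closed intervals: (-Inf,4], (4,7], (7,10]
grade <- cut(x,
             breaks = c(-Inf, 4, 7, 10),
             labels = c("fail", "good", "excellent"))

grade_chr <- as.character(grade)
```

NA stays NA, and unlike the factor approach this also handles non-integer scores.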

How to create a new categorical variable based on the location of first zero in a column in a long format data using R?

You can make use of case_when to pass multiple conditions for each id.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(first_zero_loc = time[match(0, competency)],
         yeel = case_when(all(competency == 1) ~ "unqualified",
                          all(competency == 0) ~ "undefined",
                          between(first_zero_loc, 1, 5) ~ "entry level",
                          between(first_zero_loc, 6, 8) ~ "intermediate level",
                          between(first_zero_loc, 9, 12) ~ "competency level")) %>%
  ungroup() %>%
  select(-first_zero_loc)
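
The key step is match(0, competency), which returns the position of the first zero, or NA when there is none. Here is a base R sketch of just that step on a hypothetical two-id data frame; the column names follow the answer, but the values are invented.

```r
# Hypothetical long-format data: id 1 has its first zero at time 3, id 2 has none
df <- data.frame(
  id         = rep(1:2, each = 4),
  time       = rep(1:4, times = 2),
  competency = c(1, 1, 0, 0,  1, 1, 1, 1)
)

# For each id, look up the time at which competency is first 0 (NA if never)
first_zero <- sapply(split(df, df$id),
                     function(g) g$time[match(0, g$competency)])

# Bin the location into the range-based levels; NA stays NA
level <- cut(first_zero,
             breaks = c(0, 5, 8, 12),
             labels = c("entry level", "intermediate level", "competency level"))
```

The NA for id 2 is why the case_when above checks all(competency == 1) before the between conditions.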
