Create category based on range in R
Why didn't cut work? Did you forget to assign the result to a new column?
> data=data.frame(x=c(3,4,6,12))
> data$group = cut(data$x,c(0,5,10,15))
> data
   x   group
1  3   (0,5]
2  4   (0,5]
3  6  (5,10]
4 12 (10,15]
What you've created there is a factor object in a column of your data frame. The text displayed comes from the levels of the factor, and you can change them by assignment:
levels(data$group) = c("0-5","6-10",">10")
data
   x group
1  3   0-5
2  4   0-5
3  6  6-10
4 12   >10
Read some basic R docs on factors and you'll get it.
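As a minimal sketch of the same idea, the labels can also be supplied when the factor is created, through the labels argument of cut, instead of reassigning levels afterwards. The vector names group and group_left are made up for illustration; the second call shows how right = FALSE moves a boundary value into the next bin.

```r
x <- c(3, 4, 6, 12)

# labels = names the bins at creation time; default intervals are (lo, hi]
group <- cut(x, breaks = c(0, 5, 10, 15), labels = c("0-5", "6-10", ">10"))
print(as.character(group))

# right = FALSE switches to [lo, hi) intervals, so 5 lands in the second bin
group_left <- cut(c(5, 10), breaks = c(0, 5, 10, 15), right = FALSE)
print(as.character(group_left))
```

Values outside the outermost breaks come back as NA, so it is worth checking that the breaks cover the whole range of the data.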
Create categorical variable in R based on range
Ian's answer (cut) is the most common way to do this, as far as I know.
I prefer to use shingle, from the lattice package; the argument that specifies the binning intervals seems a little more intuitive to me.
You use shingle like so:
library(lattice)
# mock some data
data = sample(0:40, 200, replace=TRUE)
# each row of the matrix is one (min, max) binning interval
my_bins = rbind(c(0, 5), c(5, 9), c(9, 19), c(19, 33), c(33, 41))
# my_bins now holds the binning intervals I've set:
     [,1] [,2]
[1,]    0    5
[2,]    5    9
[3,]    9   19
[4,]   19   33
[5,]   33   41
shx = shingle(data, intervals=my_bins)
# typing 'shx' at the interactive prompt will give you a nice frequency table:
# Intervals:
  min max count
1   0   5    23
2   5   9    17
3   9  19    56
4  19  33    76
5  33  41    46
Create categorical variable based on time or index range/interval in R
Try
paste0("ID", ceiling(1:nrow(df) / 1440))
or
paste0("ID", (1:nrow(df)-1) %/% 1440 + 1)
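To see that both expressions produce the same labels, here is a small sketch with stand-in values: n plays the role of nrow(df) and block plays the role of 1440. Both names are hypothetical and chosen only to keep the output short.

```r
n <- 7       # stand-in for nrow(df)
block <- 3   # stand-in for the block size of 1440 rows

# Rows 1..3 get ID1, rows 4..6 get ID2, row 7 starts ID3
ids_ceiling <- paste0("ID", ceiling(seq_len(n) / block))
ids_intdiv  <- paste0("ID", (seq_len(n) - 1) %/% block + 1)

print(ids_ceiling)
print(identical(ids_ceiling, ids_intdiv))
```

seq_len(n) is used here instead of 1:n because it behaves sensibly when n is 0.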
Creating a new column of categorical variables based on date range
Your ifelse statement has a few syntax issues.
Since you are already using dplyr, you can simplify this with the case_when and between functions.
library(dplyr)
dat %>%
  mutate(new_var = case_when(
    # the two date ranges overlap; case_when() uses the first condition that matches
    between(date, as.Date("1954-03-13"), as.Date("1958-12-07")) ~ "test1",
    between(date, as.Date("1958-09-14"), as.Date("1964-03-07")) ~ "test2"
  ))
# record_id date new_var
#1 111111 1956-10-28 test1
#2 222222 1956-10-28 test1
#3 333333 1956-10-29 test1
#4 444444 1956-10-29 test1
#5 555555 1956-10-30 test1
How to create and insert a column with categorical variables for specific ranges
You can use base::strsplit.
Here, I split the sample column at each _. The fourth element of each resulting character vector is our index. I pass `[` and 4 to sapply to get the 4th element for each row; sapply returns a plain character vector, whereas lapply would store a list in the column.
s1$Index <- sapply(strsplit(s1$sample, split = "_"), `[`, 4)
# > sample Index
# > 1 Br_LV_0040324_BC1_1 BC1
# > 2 Br_LV_0040324_BC1_2 BC1
# > 3 Br_LV_0040324_BC1_3 BC1
# > 4 Br_LV_0040324_BC1_4 BC1
# > 5 Br_LV_0040324_LBR_1 LBR
# > 6 Br_LV_0040324_LBR_2 LBR
We can also use a regex:
s1$Index <- sub("(?:[^\\_]*\\_){3}([^_]*)([^.*]*)$", "\\1", s1$sample)
Data:
s1 <- read.table(text="sample
Br_LV_0040324_BC1_1
Br_LV_0040324_BC1_2
Br_LV_0040324_BC1_3
Br_LV_0040324_BC1_4
Br_LV_0040324_LBR_1
Br_LV_0040324_LBR_2", header = TRUE, stringsAsFactors = FALSE)
Create a categorical variable using factor in base R
Data:
# set random seed
set.seed(1L)
# without any NA
x1 <- sample(x = 1:10, size = 20, replace=TRUE)
# with NA
x2 <- sample(x = c(1:10, NA), size = 20, replace=TRUE)
Code:
# without any NA
as.character(factor(x1, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))
# with NA
as.character(factor(x2, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))
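For comparison, a sketch of the same mapping written with cut, assuming the intended bands are 0 to 4 = fail, 5 to 7 = good and 8 to 10 = excellent (which is what the repeated labels above encode). The names scores, by_factor and by_cut are illustrative only. Note that repeating labels in factor needs R 3.4.0 or later.

```r
# Every possible score once, plus a missing value
scores <- c(0:10, NA)

# Approach from above: one label per level, with repeats
by_factor <- as.character(
  factor(scores, levels = 0:10,
         labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)))
)

# Same bands expressed as breaks: (-Inf, 4], (4, 7], (7, 10]
by_cut <- as.character(
  cut(scores, breaks = c(-Inf, 4, 7, 10), labels = c("fail", "good", "excellent"))
)

print(identical(by_factor, by_cut))
```

The cut version also handles non-integer scores, while the factor version returns NA for any value that is not exactly one of the listed levels.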
How to create a new categorical variable based on the location of first zero in a column in a long format data using R?
You can make use of case_when to pass multiple conditions for each id.
library(dplyr)
df %>%
  group_by(id) %>%
  # time of the first 0 in competency within each id (NA if there is none)
  mutate(first_zero_loc = time[match(0, competency)],
         yeel = case_when(all(competency == 1) ~ "unqualified",
                          all(competency == 0) ~ "undefined",
                          between(first_zero_loc, 1, 5) ~ "entry level",
                          between(first_zero_loc, 6, 8) ~ "intermediate level",
                          between(first_zero_loc, 9, 12) ~ "competency level")) %>%
  ungroup() %>%
  select(-first_zero_loc)
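The key step is time[match(0, competency)], which picks the time of the first zero within each id. A base R sketch of just that step, on a small made-up data frame (the data and the name first_zero are assumptions, not the asker's data):

```r
# Hypothetical long-format data: two ids, four time points each
df <- data.frame(
  id = rep(c("a", "b"), each = 4),
  time = rep(1:4, times = 2),
  competency = c(1, 1, 0, 0,   1, 1, 1, 1)
)

# match(0, x) gives the position of the first 0 in x, or NA when x has no 0
first_zero <- sapply(split(df, df$id),
                     function(g) g$time[match(0, g$competency)])
print(first_zero)
```

An id with no zero at all yields NA here, which is why the case_when above tests all(competency == 1) before the between conditions.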