Convert Continuous Numeric Values to Discrete Categories Defined by Intervals

Unless there is a specific reason to avoid cut, it is the right tool here: it does exactly what you want.

# Some example data
rota2 <- data.frame(age_mnth = 1:170)
# Your way of doing things to compare against
rota2$age_gr<-ifelse(rota2$age_mnth<6,rr2<-"0-5 mnths",
ifelse(rota2$age_mnth>5&rota2$age_mnth<12,rr2<-"6-11 mnths",
ifelse(rota2$age_mnth>11&rota2$age_mnth<24,rr2<-"12-23 mnths",
ifelse(rota2$age_mnth>23&rota2$age_mnth<60,rr2<-"24-59 mnths",
ifelse(rota2$age_mnth>59&rota2$age_mnth<167,rr2<-"5-14 yrs",
rr2<-"adult")))))

# Using cut
rota2$age_grcut <- cut(rota2$age_mnth,
breaks = c(-Inf, 6, 12, 24, 60, 167, Inf),
labels = c("0-5 mnths", "6-11 mnths", "12-23 mnths", "24-59 mnths", "5-14 yrs", "adult"),
right = FALSE)
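To see where the interval edges fall, here is a minimal sketch that runs the same breaks and labels over ages on either side of each cut point. The names boundary and boundary_check are made up for this illustration.

```r
# Assumption: same breaks and labels as in the cut() call above.
breaks <- c(-Inf, 6, 12, 24, 60, 167, Inf)
labels <- c("0-5 mnths", "6-11 mnths", "12-23 mnths",
            "24-59 mnths", "5-14 yrs", "adult")

# Ages (in months) on either side of each cut point.
boundary <- c(5, 6, 11, 12, 23, 24, 59, 60, 166, 167)

# right = FALSE makes every interval closed on the left: [6, 12), [12, 24), ...
groups <- cut(boundary, breaks = breaks, labels = labels, right = FALSE)
boundary_check <- data.frame(age_mnth = boundary,
                             age_gr = as.character(groups),
                             stringsAsFactors = FALSE)
print(boundary_check)
```

With right = FALSE, an age of exactly 6 months falls in "6-11 mnths" rather than "0-5 mnths", which matches the nested ifelse version.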

Convert continuous numeric values to discrete categories defined by intervals. The code runs without error, but not all categories are created

1. Create reproducible example data

  Raw_data <- data.frame(Days =  -362:1081)

2. Solution using dplyr and cut:

library(dplyr)
Raw_data_with_groups <- Raw_data %>%
  mutate(Days_bin = cut(Days,
                        # -Inf and Inf keep the smallest and largest values inside a bin
                        breaks = c(-Inf, 0, 5, 30, 60, Inf),
                        labels = c("<=0", "0-5", "5-30", "30-60", ">60")))
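One pitfall with cut is worth checking. With the defaults (right = TRUE, include.lowest = FALSE), a value equal to the lowest break falls outside every interval and becomes NA. The sketch below uses made-up Days values to compare finite outer breaks with open-ended ones. The names finite_breaks and open_breaks are only for illustration.

```r
# Made-up values, including the extremes of the example range -362:1081.
Days <- c(-362, -1, 0, 3, 20, 31, 61, 1081)

# Finite outer breaks: the minimum equals the lowest break, and the
# default (right = TRUE, include.lowest = FALSE) excludes it from (-362, 0].
finite_breaks <- cut(Days, breaks = c(min(Days), 0, 5, 30, 60, max(Days)))

# Open-ended breaks: every value lands in some interval.
open_breaks <- cut(Days, breaks = c(-Inf, 0, 5, 30, 60, Inf),
                   labels = c("<=0", "0-5", "5-30", "30-60", ">60"))

print(data.frame(Days, finite_breaks, open_breaks))
```

Passing include.lowest = TRUE is another way to keep the minimum when finite breaks are preferred.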

Edit:

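Or, if you really want to use the ifelse construct: replace && with &, or even better, leave the lower-bound checks out altogether, since each nested ifelse is only reached when the earlier conditions were FALSE: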

Raw_data <- data.frame(Days = c(-1, 0, 3, 20, 31, 61))
df <- mutate(Raw_data,
             Days_Bin = ifelse(Days <= 0, "0 or early",
                        ifelse(Days <= 5, "<=5",
                        ifelse(Days <= 30, "<=30",
                        ifelse(Days <= 60, "<=60", ">60")))))

Returns:

  Days   Days_Bin
1   -1 0 or early
2    0 0 or early
3    3        <=5
4   20       <=30
5   31       <=60
6   61        >60

Convert continuous numbers into fixed intervals in R

You could iterate over each value and check whether its absolute difference from any of the interval values c(0, 5, 10, 15) is less than 2.5, with values above 17.5 wrapping around to 0:

ivl <- c(0, 5, 10, 15)
sapply(x, function(y) ifelse(y > 17.5, 0, ivl[abs(y - ivl) < 2.5]))

Another option is to use comparators. You can use base R's ifelse for this, but dplyr::case_when is a little cleaner:

dplyr::case_when(x < 2.5  | x > 17.5 ~ 0,
x > 2.5 & x < 7.5 ~ 5,
x > 7.5 & x < 12.5 ~ 10,
x > 12.5 & x < 17.5 ~ 15)

In both cases you end up with the following vector:

  [1]  0  0 10 15 15  0 10  0  5 10  5  5  5  0  0 10 15 10 15  0 10  5  5 10  5 15  0  0 15  0
 [31] 15  0 10 10  0 15  5  5  5 10 10  0  0 15  5  5  5  5 10 10 15 10 10 15  5  0 10  0  5  0
 [61]  0 10 10  0 10  0  5  0 15  5 15 15 10  5 15  0  5  5  0  5  5  0 15  5  5 10  5 15  0 10
 [91] 10  0 15 15  0 15  0 10  5  5
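The case_when comparisons use strict inequalities, so a value exactly equal to 2.5, 7.5, 12.5 or 17.5 matches no branch and comes back as NA. If boundary values can occur, a possible alternative is findInterval, which assigns each boundary to the interval on its right. This is a sketch with a hypothetical input vector; the names cuts, idx and snapped are illustrative.

```r
# Hypothetical input; the original x is not shown.
x <- c(1, 2.5, 4, 9.9, 13, 17.5, 19)

# Midpoints between the target values 0, 5, 10, 15, plus the wrap-around point 17.5.
cuts <- c(2.5, 7.5, 12.5, 17.5)

# findInterval() returns 0..4: the number of cut points that are <= x.
idx <- findInterval(x, cuts)

# Map each index to its target value; the last slot wraps back to 0.
snapped <- c(0, 5, 10, 15, 0)[idx + 1]
print(snapped)
```

Here 2.5 maps to 5 and 17.5 maps to 0. Whether boundaries should round up or down is a design choice the original question does not settle.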

Converting a continuous variable into discrete values in R, where the ranges are alphanumeric

You can use the case_when function from the dplyr package. df2 is the final output.

library(dplyr)

df2 <- df %>%
  mutate(salary = case_when(
    salary < 10000 ~ "<10K",
    salary >= 10000 & salary < 20000 ~ "10K-20K",
    salary >= 20000 & salary < 30000 ~ "20K-30K",
    salary >= 30000 ~ ">30K",
    TRUE ~ NA_character_  # missing salaries stay NA instead of the string "NA"
  ))
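The same banding can also be sketched with base R's cut, which avoids repeating each bound twice. The salary values below are hypothetical, since the original df is not shown, and salary_band is an illustrative name.

```r
# Hypothetical salaries; the original df is not shown.
salary <- c(5000, 10000, 19999, 20000, 30000, NA)

salary_band <- cut(salary,
                   breaks = c(-Inf, 10000, 20000, 30000, Inf),
                   labels = c("<10K", "10K-20K", "20K-30K", ">30K"),
                   right = FALSE)  # intervals closed on the left, e.g. [10000, 20000)

print(data.frame(salary, salary_band))
```

Unlike case_when, cut returns a factor whose levels are ordered as in labels, which tends to be convenient for tables and plots. Missing salaries stay NA.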

How to convert continuous values into discrete values by equivalent partitioning in pandas

You can use pd.cut with the parameter right=False:

pd.cut(df.a, bins=3, labels=np.arange(3), right=False)

0    0
1    0
2    0
3    1
4    1
5    2
Name: a, dtype: category
Categories (3, int64): [0 < 1 < 2]

How the binning is done:

pd.cut(df.a, bins=3, right=False)

0      [1.1, 2.1)
1      [1.1, 2.1)
2      [1.1, 2.1)
3      [2.1, 3.1)
4      [2.1, 3.1)
5    [3.1, 4.103)
Name: a, dtype: category
Categories (3, interval[float64]): [[1.1, 2.1) < [2.1, 3.1) < [3.1, 4.103)]

How to add a column to a dataframe in R?

Here is another option using mutate and cut:

library(dplyr)
df_TD %>%
  dplyr::mutate(dosiscatg = cut(dose, breaks = c(0, 0.5, 1.0, 2.0), labels = c("D0.5", "D1", "D2")))

    len supp dose dosiscatg
1   4.2   VC  0.5      D0.5
2  11.5   VC  0.5      D0.5
3   7.3   VC  0.5      D0.5
4   5.8   VC  0.5      D0.5
5   6.4   VC  0.5      D0.5
6  10.0   VC  0.5      D0.5
7  11.2   VC  0.5      D0.5
8  11.2   VC  0.5      D0.5
9   5.2   VC  0.5      D0.5
10  7.0   VC  0.5      D0.5
11 16.5   VC  1.0        D1
12 16.5   VC  1.0        D1
13 15.2   VC  1.0        D1
14 17.3   VC  1.0        D1
15 22.5   VC  1.0        D1
16 17.3   VC  1.0        D1
17 13.6   VC  1.0        D1
18 14.5   VC  1.0        D1
19 18.8   VC  1.0        D1
20 15.5   VC  1.0        D1
21 23.6   VC  2.0        D2
22 18.5   VC  2.0        D2
23 33.9   VC  2.0        D2
24 25.5   VC  2.0        D2
25 26.4   VC  2.0        D2
26 32.5   VC  2.0        D2
27 26.7   VC  2.0        D2
28 21.5   VC  2.0        D2
29 23.3   VC  2.0        D2
30 29.5   VC  2.0        D2
31 15.2   OJ  0.5      D0.5
32 21.5   OJ  0.5      D0.5
33 17.6   OJ  0.5      D0.5
34  9.7   OJ  0.5      D0.5
35 14.5   OJ  0.5      D0.5
36 10.0   OJ  0.5      D0.5
37  8.2   OJ  0.5      D0.5
38  9.4   OJ  0.5      D0.5
39 16.5   OJ  0.5      D0.5
40  9.7   OJ  0.5      D0.5
41 19.7   OJ  1.0        D1
42 23.3   OJ  1.0        D1
43 23.6   OJ  1.0        D1
44 26.4   OJ  1.0        D1
45 20.0   OJ  1.0        D1
46 25.2   OJ  1.0        D1
47 25.8   OJ  1.0        D1
48 21.2   OJ  1.0        D1
49 14.5   OJ  1.0        D1
50 27.3   OJ  1.0        D1
51 25.5   OJ  2.0        D2
52 26.4   OJ  2.0        D2
53 22.4   OJ  2.0        D2
54 24.5   OJ  2.0        D2
55 24.8   OJ  2.0        D2
56 30.9   OJ  2.0        D2
57 26.4   OJ  2.0        D2
58 27.3   OJ  2.0        D2
59 29.4   OJ  2.0        D2
60 23.0   OJ  2.0        D2
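For comparison, the same column can be added without dplyr. This is a minimal sketch that assumes df_TD is a copy of R's built-in ToothGrowth data set, which the printed rows suggest but the answer does not state. The name dose_table is illustrative.

```r
# Assumption: df_TD is a copy of R's built-in ToothGrowth data set.
df_TD <- ToothGrowth

# cut() defaults to right = TRUE, so the intervals are (0, 0.5], (0.5, 1], (1, 2].
df_TD$dosiscatg <- cut(df_TD$dose,
                       breaks = c(0, 0.5, 1.0, 2.0),
                       labels = c("D0.5", "D1", "D2"))

# Cross-tabulate dose against the new factor to confirm the mapping.
dose_table <- table(df_TD$dose, df_TD$dosiscatg)
print(dose_table)
```

Each dose level should land in exactly one category, so the cross-tabulation has non-zero counts only on its diagonal.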

Find categorical indicator vector based on continuous thresholds

An easy option is findInterval:

categories2 <- findInterval(y, t)
all.equal(categories, categories2)
#[1] TRUE
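Since y and t come from the question and are not shown here, the following sketch uses hypothetical values to show what findInterval returns. The names thresholds and category_index are made up for the example.

```r
# Hypothetical data; the original y and t are not shown.
y <- c(0.2, 1.5, 3.7, 5, 2)
thresholds <- c(1, 2, 4)  # must be sorted in increasing order

# Each result is the number of thresholds that are <= the value,
# so 0 means "below the first threshold".
category_index <- findInterval(y, thresholds)
print(category_index)
```

A value equal to a threshold (2 in this example) is counted as having reached it, because the intervals are closed on the left by default.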

