Convert continuous numeric values to discrete categories defined by intervals
If there is a reason you don't want to use cut then I don't understand why. cut will work fine for what you want to do:
# Some example data
rota2 <- data.frame(age_mnth = 1:170)
# Your way of doing things to compare against
rota2$age_gr<-ifelse(rota2$age_mnth<6,rr2<-"0-5 mnths",
ifelse(rota2$age_mnth>5&rota2$age_mnth<12,rr2<-"6-11 mnths",
ifelse(rota2$age_mnth>11&rota2$age_mnth<24,rr2<-"12-23 mnths",
ifelse(rota2$age_mnth>23&rota2$age_mnth<60,rr2<-"24-59 mnths",
ifelse(rota2$age_mnth>59&rota2$age_mnth<167,rr2<-"5-14 yrs",
rr2<-"adult")))))
# Using cut
rota2$age_grcut <- cut(rota2$age_mnth,
breaks = c(-Inf, 6, 12, 24, 60, 167, Inf),
labels = c("0-5 mnths", "6-11 mnths", "12-23 mnths", "24-59 mnths", "5-14 yrs", "adult"),
right = FALSE)
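The reason right = FALSE matters here is easiest to see on the boundary ages. The following is a small sketch, not part of the original answer; the vector edge_ages and the two result names are made up for illustration. With right = FALSE each interval is closed on the left, so 6 months lands in "6-11 mnths"; with the default right = TRUE it would land in "0-5 mnths".

```r
# Hypothetical boundary ages (in months) to probe the interval edges
edge_ages <- c(0, 5, 6, 11, 12, 23, 24, 59, 60, 166, 167, 170)
age_breaks <- c(-Inf, 6, 12, 24, 60, 167, Inf)
age_labels <- c("0-5 mnths", "6-11 mnths", "12-23 mnths",
                "24-59 mnths", "5-14 yrs", "adult")

# Left-closed intervals [a, b): matches the nested ifelse logic
left_closed <- as.character(cut(edge_ages, breaks = age_breaks,
                                labels = age_labels, right = FALSE))

# Default right-closed intervals (a, b]: boundary values shift down a group
right_closed <- as.character(cut(edge_ages, breaks = age_breaks,
                                 labels = age_labels))

data.frame(edge_ages, left_closed, right_closed)
```

Rows where the two columns disagree (6, 12, 24, 60 and 167) are exactly the break values.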
Convert continuous numeric values to discrete categories defined by intervals. Code runs without error but not all categories are created
1. Create reproducible example data
Raw_data <- data.frame(Days = -362:1081)
2. Solution using dplyr:
library(dplyr)
# Open-ended outer breaks (-Inf, Inf) so that the minimum value of Days
# is not left outside the first interval and turned into NA
Raw_data_with_groups <- Raw_data %>%
  mutate(Days_bin = cut(Days,
                        breaks = c(-Inf, 0, 5, 30, 60, Inf),
                        labels = c("<=0", "0-5", "5-30", "30-60", ">60")))
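A caveat with finite outer breaks in cut: with the defaults right = TRUE and include.lowest = FALSE, a value equal to the lowest break falls outside every interval and becomes NA. A minimal sketch with toy values (the vector days and the result names are made up for illustration):

```r
# Toy data whose minimum and maximum are used as the outer breaks
days <- c(-3, 0, 4, 30, 61, 100)
day_breaks <- c(min(days), 0, 5, 30, 60, max(days))
day_labels <- c("<=0", "0-5", "5-30", "30-60", ">60")

# Default: intervals are (a, b], so the minimum itself is not covered
open_low <- cut(days, breaks = day_breaks, labels = day_labels)

# include.lowest = TRUE closes the first interval on the left as well
closed_low <- cut(days, breaks = day_breaks, labels = day_labels,
                  include.lowest = TRUE)

data.frame(days, open_low, closed_low)
```

Only the first row differs: open_low is NA for -3, while closed_low assigns it to "<=0". Using -Inf and Inf as the outer breaks avoids the issue without needing include.lowest.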
Edit:
Or, if you really want to use the ifelse construct: replace && with &, or even better, leave the lower-bound test out altogether:
Raw_data <- data.frame(Days = c(-1, 0, 3, 20, 31, 61))
# Each ifelse only needs the upper bound: earlier branches already
# caught the smaller values
df <- mutate(Raw_data,
             Days_Bin = ifelse(Days <= 0, "0 or early",
                        ifelse(Days <= 5, "<=5",
                        ifelse(Days <= 30, "<=30",
                        ifelse(Days <= 60, "<=60", ">60")))))
Returns:
Days Days_Bin
1 -1 0 or early
2 0 0 or early
3 3 <=5
4 20 <=30
5 31 <=60
6 61 >60
Convert continuous numbers into fixed intervals in R
You could iterate over each value and check whether the absolute difference from any of the interval values c(0, 5, 10, 15) is less than 2.5:
ivl <- c(0, 5, 10, 15)
sapply(x, function(y) ifelse(y > 17.5, 0, ivl[abs(y - ivl) < 2.5]))
Another option is to use comparison operators. You can use base R's ifelse for this, but dplyr::case_when is a little cleaner:
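# >= on the lower bounds so that a value exactly on a boundary
# (2.5, 7.5, 12.5, 17.5) is not left unmatched and returned as NA
dplyr::case_when(x < 2.5 | x >= 17.5 ~ 0,
                 x >= 2.5 & x < 7.5 ~ 5,
                 x >= 7.5 & x < 12.5 ~ 10,
                 x >= 12.5 & x < 17.5 ~ 15)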
In both cases you end up with the following vector:
[1] 0 0 10 15 15 0 10 0 5 10 5 5 5 0 0 10 15 10 15 0 10 5 5 10 5 15 0 0 15 0
[31] 15 0 10 10 0 15 5 5 5 10 10 0 0 15 5 5 5 5 10 10 15 10 10 15 5 0 10 0 5 0
[61] 0 10 10 0 10 0 5 0 15 5 15 15 10 5 15 0 5 5 0 5 5 0 15 5 5 10 5 15 0 10
[91] 10 0 15 15 0 15 0 10 5 5
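If the intervals are evenly spaced, plain arithmetic may be enough. This is a swapped-in alternative rather than either method above, and it assumes the values lie between 0 and 20 with a spacing of 5; the vector vals is made up for illustration. Rounding to the nearest multiple of 5 and wrapping 20 back to 0 gives the same grouping away from the exact boundaries (R's round follows the IEC 60559 rule for exact halves, so values sitting precisely on 2.5, 7.5 and so on may be assigned differently).

```r
# Hypothetical values in the range 0 to 20
vals <- c(1.2, 3.9, 9.1, 13.0, 16.8, 19.4)

# Round to the nearest multiple of 5, then wrap 20 back to 0.
# The parentheses matter: %% binds tighter than * in R.
nearest <- (round(vals / 5) * 5) %% 20
nearest
# [1]  0  5 10 15 15  0
```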
Converting a continuous variable into discrete values (alphanumeric) in R. The ranges are alphanumeric
You can use the case_when function from the dplyr package. df2 is the final output.
library(dplyr)
df2 <- df %>%
mutate(salary = case_when(
salary < 10000 ~ "<10K",
salary >= 10000 & salary < 20000 ~ "10K-20K",
salary >= 20000 & salary < 30000 ~ "20K-30K",
salary >= 30000 ~ ">30K",
# A real missing value rather than the literal string "NA"
TRUE ~ NA_character_
))
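The same banding can also be sketched with base R's cut, which avoids repeating each bound twice. The sample salaries and the names salary and band below are made up for illustration; right = FALSE makes each interval closed on the left, matching the >= / < pairs above, and missing salaries stay NA.

```r
# Hypothetical salaries, including boundary values and a missing one
salary <- c(5000, 10000, 19999.99, 20000, 30000, NA)

band <- cut(salary,
            breaks = c(-Inf, 10000, 20000, 30000, Inf),
            labels = c("<10K", "10K-20K", "20K-30K", ">30K"),
            right = FALSE)   # intervals are [a, b)

as.character(band)
# [1] "<10K"    "10K-20K" "10K-20K" "20K-30K" ">30K"    NA
```

Note that the ">30K" group actually includes 30000 itself, in both this sketch and the case_when version.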
How to convert continuous values into discrete values by equivalent partitioning in pandas
You can use pd.cut with the parameter right=False:
pd.cut(df.a, bins=3, labels=np.arange(3), right=False)
0 0
1 0
2 0
3 1
4 1
5 2
Name: a, dtype: category
Categories (3, int64): [0 < 1 < 2]
How the binning is done:
pd.cut(df.a, bins=3, right=False)
0 [1.1, 2.1)
1 [1.1, 2.1)
2 [1.1, 2.1)
3 [2.1, 3.1)
4 [2.1, 3.1)
5 [3.1, 4.103)
Name: a, dtype: category
Categories (3, interval[float64]): [[1.1, 2.1) < [2.1, 3.1) < [3.1, 4.103)]
How to add a column to a dataframe in R?
Here is another option using mutate and cut:
library(dplyr)
df_TD %>%
dplyr::mutate(dosiscatg = cut(dose, breaks = c(0, 0.5, 1.0,2.0), labels = c("D0.5", "D1", "D2")))
len supp dose dosiscatg
1 4.2 VC 0.5 D0.5
2 11.5 VC 0.5 D0.5
3 7.3 VC 0.5 D0.5
4 5.8 VC 0.5 D0.5
5 6.4 VC 0.5 D0.5
6 10.0 VC 0.5 D0.5
7 11.2 VC 0.5 D0.5
8 11.2 VC 0.5 D0.5
9 5.2 VC 0.5 D0.5
10 7.0 VC 0.5 D0.5
11 16.5 VC 1.0 D1
12 16.5 VC 1.0 D1
13 15.2 VC 1.0 D1
14 17.3 VC 1.0 D1
15 22.5 VC 1.0 D1
16 17.3 VC 1.0 D1
17 13.6 VC 1.0 D1
18 14.5 VC 1.0 D1
19 18.8 VC 1.0 D1
20 15.5 VC 1.0 D1
21 23.6 VC 2.0 D2
22 18.5 VC 2.0 D2
23 33.9 VC 2.0 D2
24 25.5 VC 2.0 D2
25 26.4 VC 2.0 D2
26 32.5 VC 2.0 D2
27 26.7 VC 2.0 D2
28 21.5 VC 2.0 D2
29 23.3 VC 2.0 D2
30 29.5 VC 2.0 D2
31 15.2 OJ 0.5 D0.5
32 21.5 OJ 0.5 D0.5
33 17.6 OJ 0.5 D0.5
34 9.7 OJ 0.5 D0.5
35 14.5 OJ 0.5 D0.5
36 10.0 OJ 0.5 D0.5
37 8.2 OJ 0.5 D0.5
38 9.4 OJ 0.5 D0.5
39 16.5 OJ 0.5 D0.5
40 9.7 OJ 0.5 D0.5
41 19.7 OJ 1.0 D1
42 23.3 OJ 1.0 D1
43 23.6 OJ 1.0 D1
44 26.4 OJ 1.0 D1
45 20.0 OJ 1.0 D1
46 25.2 OJ 1.0 D1
47 25.8 OJ 1.0 D1
48 21.2 OJ 1.0 D1
49 14.5 OJ 1.0 D1
50 27.3 OJ 1.0 D1
51 25.5 OJ 2.0 D2
52 26.4 OJ 2.0 D2
53 22.4 OJ 2.0 D2
54 24.5 OJ 2.0 D2
55 24.8 OJ 2.0 D2
56 30.9 OJ 2.0 D2
57 26.4 OJ 2.0 D2
58 27.3 OJ 2.0 D2
59 29.4 OJ 2.0 D2
60 23.0 OJ 2.0 D2
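df_TD is not defined in this answer. The printed rows match R's built-in ToothGrowth data set, so the following base R sketch assumes df_TD is a copy of it. It also shows why the breaks work: cut uses right-closed intervals by default, so 0.5 falls in (0, 0.5], 1 in (0.5, 1] and 2 in (1, 2].

```r
# Assumption: df_TD is a copy of the built-in ToothGrowth data
df_TD <- ToothGrowth

df_TD$dosiscatg <- cut(df_TD$dose,
                       breaks = c(0, 0.5, 1.0, 2.0),
                       labels = c("D0.5", "D1", "D2"))

# Cross-tabulate the original dose against the new category
dose_check <- table(df_TD$dose, df_TD$dosiscatg)
dose_check
```

Each dose level should map to exactly one category, with no NA values.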
Find categorical indicator vector based on continuous thresholds
An easy option is findInterval:
categories2 <- findInterval(y, t)
all.equal(categories, categories2)
#[1] TRUE
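Since y, t and categories come from the question and are not shown here, a self-contained sketch of what findInterval returns may help. The values and thresholds below are made up for illustration. For each value it gives the number of thresholds that are less than or equal to it, so 0 means "below the first threshold".

```r
# Hypothetical values and sorted thresholds
values <- c(0.5, 1, 2.5, 7)
thresholds <- c(1, 3, 5)

# Index of the interval each value falls into (left-closed by default)
idx <- findInterval(values, thresholds)
idx
# [1] 0 1 1 3
```

The thresholds must be sorted in non-decreasing order, and the result is an integer vector rather than a factor.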