Categorize numeric variable into group/bins/breaks
I would use findInterval() here.
First, make up some sample data:
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43
Use findInterval() to categorize your "ages" vector. Each value gets the index of the last break that it is greater than or equal to:
findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3
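If readable group names are wanted instead of bare indices, the result of findInterval() can be used to index a label vector. This is only a sketch: the label text is made up for illustration, and the ages are the first six values of the sample above.

```r
ages <- c(27, 31, 37, 47, 26, 46)          # first six values of the sample above
breaks <- c(20, 30, 40)                    # lower bound of each bin
bin_labels <- c("20-29", "30-39", "40+")   # hypothetical labels, one per bin

idx <- findInterval(ages, breaks)          # bin index for each age
age_group <- bin_labels[idx]               # look up the label for each index
age_group
# [1] "20-29" "30-39" "30-39" "40+"   "20-29" "40+"
```

Values below the first break get index 0, which this lookup would silently drop, so it is worth checking that min(ages) is at least the first break before relying on it.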
Alternatively, as recommended in the comments, cut() is also useful here:
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE)                  # factor with interval labels such as [20,30)
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE, labels = FALSE)  # integer bin codes instead of a factor
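To make the difference between the two cut() calls concrete, here is a small sketch on the first six sample ages. It also shows one thing to watch for with right = FALSE: a value equal to the last break falls outside every interval.

```r
ages <- c(27, 31, 37, 47, 26, 46)   # first six values of the sample above
brks <- c(20, 30, 40, 50)

grp <- cut(ages, breaks = brks, right = FALSE)
levels(grp)
# [1] "[20,30)" "[30,40)" "[40,50)"

codes <- cut(ages, breaks = brks, right = FALSE, labels = FALSE)
codes
# [1] 1 2 2 3 1 3

# With right = FALSE the last interval is [40,50), so exactly 50 is not covered
edge <- cut(50, breaks = brks, right = FALSE)
is.na(edge)
# [1] TRUE
```

If the top boundary should be included, adding include.lowest = TRUE closes the last interval when right = FALSE.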
Categorize numeric variable with mutate
library(dplyr)
set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))
df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))
giving (row 8 holds the minimum of a, which the left-open first interval excludes, so it becomes <NA>):
                 a          b
1  (-0.586,-0.316]  1.2240818
2   (-0.316,0.094]  0.3598138
3      (0.68,1.72]  0.4007715
4   (-0.316,0.094]  0.1106827
5     (0.094,0.68] -0.5558411
6      (0.68,1.72]  1.7869131
7     (0.094,0.68]  0.4978505
8             <NA> -1.9666172
9   (-1.27,-0.586]  0.7013559
10 (-0.586,-0.316] -0.4727914
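The <NA> in row 8 appears because cut() builds intervals that are open on the left by default, so the smallest value, which equals the first quantile break, is left out. A minimal base R sketch of the usual remedy, include.lowest = TRUE, using the same seed and column as above:

```r
set.seed(123)
a <- rnorm(10)
brks <- quantile(a, probs = seq(0, 1, 0.2))   # quintile break points

# Default: the minimum equals the first break and is dropped
n_na_default <- sum(is.na(cut(a, breaks = brks)))
n_na_default
# [1] 1

# include.lowest = TRUE closes the first interval on the left
n_na_fixed <- sum(is.na(cut(a, breaks = brks, include.lowest = TRUE)))
n_na_fixed
# [1] 0
```

The same argument can be added to the cut() call inside mutate().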
R categorize numeric value using case_when
We could use the cut function:
library(dplyr)
labels <- c("1 km", "10 km", "20 km", "50 km")
data %>%
  mutate(within_km = cut(distance_km,
                         breaks = c(0, 1, 10, 20, 50),
                         labels = labels))
  id    distance_km within_km
  <chr>       <dbl> <fct>
1 1             0.5 1 km
2 2             1.5 10 km
3 3            10.5 20 km
4 4            43   50 km
5 5            20.7 50 km
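Since the question asks about case_when, here is a rough equivalent of the cut() call written that way. It is a sketch, not a drop-in replacement: the sample data is reconstructed from the output above, the result is a character column rather than a factor, and values at or below 0 would be labelled "1 km" here whereas cut() would return NA for them.

```r
library(dplyr)

# Sample data reconstructed from the output shown above
data <- tibble(
  id = as.character(1:5),
  distance_km = c(0.5, 1.5, 10.5, 43, 20.7)
)

# Conditions are checked in order; the first TRUE one decides the label
result <- data %>%
  mutate(within_km = case_when(
    distance_km <= 1  ~ "1 km",
    distance_km <= 10 ~ "10 km",
    distance_km <= 20 ~ "20 km",
    distance_km <= 50 ~ "50 km"
  ))
result
```

Distances above 50 match none of the conditions and come out as NA, which mirrors what cut() does with values beyond the last break.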
Splitting a continuous variable into equal sized groups
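Try this:
split(das, cut(das$anim, 3))
If you want to split based on the value of wt, then
library(Hmisc) # provides cut2()
split(das, cut2(das$wt, g=3))
Either way, you can do this by combining cut, cut2 and split. Note that cut(x, 3) makes three intervals of equal width, whereas cut2(x, g=3) makes three quantile groups with roughly equal numbers of rows.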
UPDATED
If you want a group index as an additional column, then
das$group <- cut(das$anim, 3)
If the column should be an integer index like 1, 2, ..., then
das$group <- as.numeric(cut(das$anim, 3))
UPDATED AGAIN
Try this to group by wt instead:
> das$wt2 <- as.numeric(cut2(das$wt, g=3))
> das
   anim    wt wt2
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   3
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
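If installing Hmisc is not an option, a similar grouping can probably be had in base R by cutting at quantiles. The sketch below copies the anim and wt values from the output above and assumes the quantile breaks are unique (heavily tied data can make cut() fail with duplicated breaks).

```r
# Sample data copied from the output above
das <- data.frame(
  anim = 1:15,
  wt = c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
         301, 354, 369, 205, 199, 394, 231.3)
)

# Tertile break points; include.lowest = TRUE keeps the minimum weight
brks <- quantile(das$wt, probs = seq(0, 1, length.out = 4))
das$wt2 <- cut(das$wt, breaks = brks, include.lowest = TRUE, labels = FALSE)

table(das$wt2)   # five animals in each group

# Equal-width intervals, by contrast, give unequal group sizes for these weights
table(cut(das$wt, 3, labels = FALSE))
```

For this data the group indices agree with the wt2 column shown above.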
Create 4 categories variables
I may be misunderstanding something, but you appear to have overlapping categories: Total >= 2 is Basic, but Total < 3 is Good. You may want to confirm the bounds for your groupings. Once that's sorted, you were actually pretty close to a working solution. You can nest ifelse statements, keeping in mind that the conditions are checked in order: if a condition evaluates to TRUE "early" in the chain, the output for TRUE at that point is returned; otherwise, evaluation moves on to the next ifelse.
Note that I've used 1, 2, and 3 as the 'breaks' for the categories, so the logic reads: "If it's less than 1, it's Limited. If it's less than 2, it's Basic. If it's less than 3, it's Good. Otherwise, it's Full."
set.seed(123)
df <- data.frame(total = runif(n = 15, min = 0, max = 4))
df
df$level <- ifelse(df$total < 1, "Limited",
            ifelse(df$total < 2, "Basic",
            ifelse(df$total < 3, "Good", "Full")))
> df
       total   level
1  0.5691772 Limited
2  2.1971386    Good
3  3.8163650    Full
4  2.3419334    Good
5  1.6180411   Basic
6  2.5915739    Good
7  1.2792825   Basic
8  1.2308800   Basic
9  0.8790705 Limited
10 1.4779555   Basic
11 3.9368768    Full
12 0.6168092 Limited
13 0.3641760 Limited
14 0.5676276 Limited
15 2.7600284    Good
With just four categories, an ifelse block is probably fine. If I were using many more bounds, I'd likely use a different approach.
Edit: like thelatemail's, which is far cleaner.
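One such alternative, which may or may not be what the other answer does, is a single cut() call. The sketch below uses made-up totals, including values that sit exactly on a boundary, and sets right = FALSE so that the intervals match the "<" comparisons in the nested ifelse.

```r
total <- c(0.5, 1, 1.7, 2.99, 3, 3.9)   # hypothetical scores, including boundary values

level <- cut(total,
             breaks = c(-Inf, 1, 2, 3, Inf),
             labels = c("Limited", "Basic", "Good", "Full"),
             right = FALSE)   # intervals are [lower, upper), matching the `<` tests

as.character(level)
# [1] "Limited" "Basic"   "Basic"   "Good"    "Full"    "Full"

# The nested ifelse from above, for comparison
nested <- ifelse(total < 1, "Limited",
          ifelse(total < 2, "Basic",
          ifelse(total < 3, "Good", "Full")))
```

Adding a category then only means adding one break and one label, and the result is a factor whose levels are already in order.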