How to Create Binned Factor Variables from a Continuous Variable, with Custom Breaks

How do I create binned factor variables from a continuous variable, with custom breaks?

Use cut:

data.frame(dataset, bin=cut(dataset, c(1,4,9,17,23), include.lowest=TRUE))
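To see what that call produces, here is a minimal sketch. The values in `dataset` are made up for illustration, since the original answer does not show the data.

```r
# Hypothetical sample values; the original answer does not show `dataset`.
dataset <- c(1, 3, 4, 8, 12, 17, 20, 23)

# cut() builds right-closed intervals (a, b] by default. include.lowest = TRUE
# closes the first interval on the left, so the value 1 is binned instead of
# becoming NA.
bin <- cut(dataset, breaks = c(1, 4, 9, 17, 23), include.lowest = TRUE)

data.frame(dataset, bin)
table(bin)
```

Values that fall outside the outermost breaks (below 1 or above 23 here) come back as NA, so it is worth checking the range of the data before choosing the breaks.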

Categorize numeric variable into groups/bins/breaks

I would use findInterval() here:

First, make up some sample data

set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43

Use findInterval() to categorize your "ages" vector.

findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3

Alternatively, as recommended in the comments, cut() is also useful here:

cut(ages, breaks=c(20, 30, 40, 50), right = FALSE)
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE, labels = FALSE)
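As a quick check that the two approaches agree on this data, the sketch below rebuilds the same `ages` vector and compares the integer codes. It only uses the calls already shown above; the names `by_find` and `by_cut` are introduced here for illustration.

```r
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))

by_find <- findInterval(ages, c(20, 30, 40))
by_cut  <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)

# Both calls use left-closed intervals [20,30), [30,40), [40,50),
# so the codes should match for every value in this sample.
all(by_find == by_cut)

# With labels kept, cut() returns a factor whose levels name the intervals.
levels(cut(ages, breaks = c(20, 30, 40, 50), right = FALSE))
```

The two functions differ at the edges: findInterval() returns 0 for values below the first break and keeps assigning the last code to anything at or above the last break, while cut() returns NA for values outside the supplied breaks.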

Splitting a continuous variable into equal sized groups

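Try this, which splits the range of anim into three equal-width intervals:

split(das, cut(das$anim, 3))

If you want to split based on the value of wt, with roughly the same number of rows in each group, then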

library(Hmisc) # cut2
split(das, cut2(das$wt, g=3))

In short, you can do this by combining cut, cut2 and split.

UPDATED

If you want a group index as an additional column, then

das$group <- cut(das$anim, 3)

If the column should hold an integer index such as 1, 2, ..., then

das$group <- as.numeric(cut(das$anim, 3))

UPDATED AGAIN

Try this:

> das$wt2 <- as.numeric(cut2(das$wt, g=3))
> das
   anim    wt wt2
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   3
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
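If Hmisc is not available, a similar grouping can be sketched in base R by cutting at quantiles. This is not what cut2() does internally, only an approximation that happens to give the same groups for this data; `brks` and `wt_group` are names introduced here for illustration.

```r
# Data copied from the printed output above.
das <- data.frame(
  anim = 1:15,
  wt = c(181.0, 179.0, 180.5, 201.0, 201.5, 245.0, 246.4, 189.3,
         301.0, 354.0, 369.0, 205.0, 199.0, 394.0, 231.3)
)

# cut(x, 3) makes three equal-WIDTH intervals, so group sizes can be uneven.
table(cut(das$wt, 3))

# Breaks at the 0, 1/3, 2/3 and 1 quantiles give groups of roughly equal COUNT.
brks <- quantile(das$wt, probs = seq(0, 1, length.out = 4))
das$wt_group <- cut(das$wt, breaks = brks, include.lowest = TRUE, labels = FALSE)
table(das$wt_group)
```

With heavily tied data the quantile breaks can coincide, in which case cut() stops with an error about non-unique breaks; wrapping the breaks in unique() avoids the error at the cost of fewer groups.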

Syntax for ifelse() Function in R Please

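You can use between() and case_when() from the dplyr package:

library(dplyr)

old_images <- old_images %>% mutate(Quarter = case_when(
between(difference, 1, 279) ~ 1,
between(difference, 280, 558) ~ 2,
between(difference, 559, 837) ~ 3,
# The fourth condition as originally posted, between(difference, 838, 115),
# can never be TRUE because its lower bound exceeds its upper bound.
# The fallback below assigns 4 to every remaining row, including NA and
# values below 1.
TRUE ~ 4
))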
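The same lower bounds can also be passed to findInterval() in base R, which avoids writing one condition per bin. This is a sketch under assumptions: the sample values are hypothetical, because `old_images` and `difference` are not shown in the original.

```r
# Hypothetical values; the original does not show `old_images`.
old_images <- data.frame(difference = c(1, 279, 280, 558, 559, 837, 838, 2000))

# findInterval() returns i when breaks[i] <= x < breaks[i + 1], so whole
# numbers 1..279 map to 1, 280..558 to 2, 559..837 to 3, and 838 or more to 4.
old_images$Quarter <- findInterval(old_images$difference, c(1, 280, 559, 838))
old_images
```

This does not behave identically to the case_when() version at the edges: values below 1 get 0 and NA stays NA here, whereas the `TRUE ~ 4` fallback sends both to 4.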

Specifying bin range values for continuous data in R

cut is the way to go for this problem. If you do not like the output with the brackets, you can use some data manipulation to get it to look the way you'd like.

bins <- seq(0, 15000, by=250)
Amount2 <- as.numeric(gsub("\\$|,", "", df$Amount))
labels <- gsub("(?<!^)(\\d{3})$", ",\\1", bins, perl=TRUE)
rangelabels <- paste(head(labels,-1), tail(labels,-1), sep="-")
df$Bin <- cut(Amount2, breaks=bins, labels=rangelabels)

We first create a sequence of break points from 0 to 15,000 in steps of 250. Next we strip the dollar signs and commas from the Amount column, convert the result to numeric, and save it as Amount2. We then format the break points for use as labels by inserting a comma before the last three digits of any value that has more than three digits.

The variable rangelabels joins each pair of adjacent break points with a hyphen, giving one label per bin. The main function comes next: cut(Amount2, bins, rangelabels). The first argument, Amount2, is the numeric vector being cut. The second argument, bins, supplies the breaks for the intervals. The last argument, rangelabels, is the vector of labels for the resulting bins, which gives:

df
  TranID     Amount           Bin
1    135    $249.22         0-250
2    138  $1,022.01   1,000-1,250
3    155 $10,350.11 10,250-10,500
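For a version that runs end to end, the sketch below rebuilds the three sample rows from the output above. It also swaps the regex for format() with big.mark, which is an alternative way to add the thousands separators and not the approach used in the answer.

```r
# Sample rows taken from the output shown above.
df <- data.frame(TranID = c(135, 138, 155),
                 Amount = c("$249.22", "$1,022.01", "$10,350.11"))

bins <- seq(0, 15000, by = 250)
Amount2 <- as.numeric(gsub("\\$|,", "", df$Amount))

# format() with big.mark adds thousands separators; trim = TRUE drops the
# padding that format() would otherwise add to equalize widths.
labels <- format(bins, big.mark = ",", trim = TRUE, scientific = FALSE)
rangelabels <- paste(head(labels, -1), tail(labels, -1), sep = "-")

df$Bin <- cut(Amount2, breaks = bins, labels = rangelabels)
df
```

Because cut() uses right-closed intervals by default, an amount of exactly 250 lands in "0-250", and an amount of exactly 0 becomes NA unless include.lowest = TRUE is added.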

