How do I create binned factor variables from a continuous variable, with custom breaks?
Use cut:
data.frame(dataset, bin=cut(dataset, c(1,4,9,17,23), include.lowest=TRUE))
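To see what the one-liner above produces, here is a minimal sketch. The values in dataset are made up for illustration, and it assumes dataset is a plain numeric vector, which the original does not state.

```r
# Hypothetical sample values; `dataset` is assumed to be a numeric vector.
dataset <- c(1, 3, 4, 8, 9, 16, 17, 23)

binned <- data.frame(
  dataset,
  bin = cut(dataset, c(1, 4, 9, 17, 23), include.lowest = TRUE)
)

# Intervals are closed on the right by default; include.lowest = TRUE
# also closes the first interval on the left, so the value 1 is not NA.
levels(binned$bin)
table(binned$bin)
```

Without include.lowest = TRUE, the smallest break (1 here) would fall outside every interval and come back as NA.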
Categorize numeric variable into group/ bins/ breaks
I would use findInterval() here.
First, make up some sample data:
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43
Use findInterval() to categorize your "ages" vector:
findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3
Alternatively, as recommended in the comments, cut() is also useful here:
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE)
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE, labels = FALSE)
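The two approaches give the same integer codes as long as every value lies inside the breaks. A short sketch, using the first ten values of the ages vector above; the out-of-range value 55 is a hypothetical addition to show where they differ.

```r
ages <- c(27, 31, 37, 47, 26, 46, 48, 39, 38, 21)  # first ten sample values

idx_fi  <- findInterval(ages, c(20, 30, 40))
idx_cut <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)

# Both return the interval number for each age.
all(idx_fi == idx_cut)

# Hypothetical out-of-range value: findInterval() keeps counting past the
# last break, while cut() returns NA for anything outside its breaks.
findInterval(55, c(20, 30, 40))
cut(55, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)
```

So findInterval() needs no upper bound, whereas cut() makes out-of-range values visible as NA.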
Splitting a continuous variable into equal sized groups
Try this:
split(das, cut(das$anim, 3))
If you want to split based on the value of wt, then:
library(Hmisc) # cut2
split(das, cut2(das$wt, g=3))
Either way, you can do this by combining cut, cut2 and split.
UPDATED
If you want a group index as an additional column, then:
das$group <- cut(das$anim, 3)
If the column should be an index like 1, 2, ..., then:
das$group <- as.numeric(cut(das$anim, 3))
UPDATED AGAIN
Try this:
> das$wt2 <- as.numeric(cut2(das$wt, g=3))
> das
   anim    wt wt2
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   3
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
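If you would rather not depend on Hmisc, a similar equal-sized grouping can be sketched in base R by cutting at the quantiles. This is a substitute for cut2(..., g=3), not what cut2 does internally, and the two can differ when there are ties. The das data frame is rebuilt here from the printed output above.

```r
# Rebuild the example data from the printed output above.
das <- data.frame(
  anim = 1:15,
  wt = c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
         301, 354, 369, 205, 199, 394, 231.3)
)

# Break points at the 0, 1/3, 2/3 and 1 quantiles of wt.
breaks <- quantile(das$wt, probs = seq(0, 1, length.out = 4))

# labels = FALSE returns the group index directly;
# include.lowest = TRUE keeps the minimum value in group 1.
das$wt2 <- cut(das$wt, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(das$wt2)
split(das, das$wt2)
```

On this data each of the three groups ends up with five rows.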
Syntax for ifelse() Function in R Please
You can use between and case_when from the dplyr package:
library(dplyr)
old_images <- old_images %>% mutate(Quarter = case_when(
between(difference, 1, 279) ~ 1,
between(difference, 280, 558) ~ 2,
between(difference, 559, 837) ~ 3,
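between(difference, 838, 1116) ~ 4,  # upper bound 1116 is an assumption (279-wide bins); the original 115 is below 838 and can never match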
TRUE ~ 4
))
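Since the ranges are contiguous, the same binning can probably be written without listing each range, using only the lower bounds. A minimal base R sketch; the difference values are hypothetical. Unlike the case_when version, values below 1 get 0 here rather than falling through to 4.

```r
# Hypothetical values sitting on and between the range boundaries.
difference <- c(1, 279, 280, 558, 559, 837, 838, 1000)

# Lower bound of each quarter, taken from the case_when() call above.
lower_bounds <- c(1, 280, 559, 838)

# findInterval() counts how many lower bounds each value has reached,
# which is exactly the quarter number.
quarter <- findInterval(difference, lower_bounds)
quarter
```

With a data frame this would be something like old_images$Quarter <- findInterval(old_images$difference, lower_bounds).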
Specifying bin range values for continuous data in R
cut is the way to go for this problem. If you do not like the output with the brackets, you can use some data manipulation to get it to look the way you'd like.
bins <- seq(0, 15000, by=250)
Amount2 <- as.numeric(gsub("\\$|,", "", df$Amount))
labels <- gsub("(?<!^)(\\d{3})$", ",\\1", bins, perl=TRUE)
rangelabels <- paste(head(labels,-1), tail(labels,-1), sep="-")
df$Bin <- cut(Amount2, bins, rangelabels)
We first create a sequence from 0 to 15,000 by 250. Next we clean the Amount column by removing the dollar signs and commas, convert it to numeric, and save it to the variable Amount2. We then format the output labels by inserting a comma before the last three digits of each break that has more than three digits. We will use those labels in the final Bin column.
The variable rangelabels joins each pair of adjacent break points with a hyphen. The main function comes next: cut(Amount2, bins, rangelabels). The first argument, Amount2, is the numeric vector being cut. The second argument, bins, supplies the breaks for the intervals. The last argument, rangelabels, is the vector of labels for the resulting bins, giving:
df
TranID Amount Bin
1 135 $249.22 0-250
2 138 $1,022.01 1,000-1,250
3 155 $10,350.11 10,250-10,500
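For a self-contained run, here is a sketch that rebuilds df from the three rows printed above. It swaps the regular expression for formatC(..., big.mark = ","), which is my substitution rather than part of the original answer; the names amount_num, labs and range_labels are illustrative.

```r
# Rebuild the example rows from the printed output above.
df <- data.frame(
  TranID = c(135, 138, 155),
  Amount = c("$249.22", "$1,022.01", "$10,350.11")
)

bins <- seq(0, 15000, by = 250)

# Strip "$" and "," so the amounts can be converted to numbers.
amount_num <- as.numeric(gsub("[$,]", "", df$Amount))

# Format the break points with thousands separators, e.g. 1000 -> "1,000".
labs <- formatC(bins, format = "d", big.mark = ",")

# One label per interval: lower break, hyphen, upper break.
range_labels <- paste(head(labs, -1), tail(labs, -1), sep = "-")

df$Bin <- cut(amount_num, breaks = bins, labels = range_labels)
df
```

Note that cut() needs exactly one label fewer than there are breaks, which is what the head/tail pairing guarantees.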