Binning a numeric variable
How about cut():
binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), '10+'))
Which yields:
# [1] 0 1 3 4 2 4 2 5 10+ 10+ 10+ 2 10+ 2 10+ 3 4 2
# Levels: 0 1 2 3 4 5 6 7 8 9 10+
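As a quick check of how those breaks behave, here is a minimal sketch with a made-up vector (the x below is an assumption, not the data from the question). Because cut() uses right-closed intervals by default, the breaks -1, 0, 1, ..., 9, Inf produce the bins (-1,0], (0,1], ..., (9,Inf], so for integer input each value lands in the bin named after it and everything above 9 falls into '10+'.

```r
# Hypothetical integer counts; not the original poster's data
x <- c(0, 1, 3, 9, 10, 25)

# 12 break points define 11 right-closed intervals, matched by 11 labels
binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), "10+"))

as.character(binned.x)  # bin label assigned to each value
nlevels(binned.x)       # number of bins, including the open-ended "10+"
```

Values at or below -1 (and NA) would come back as NA, since they fall outside every interval.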
Automatically creating bins for a numeric variable in R
Your description sounds like you want to plot a histogram of var. This is easy to do in ggplot using geom_histogram. The key is that ggplot expects a data frame, so you first have to put your variable in one, which you can do inside the ggplot() call:
ggplot(data.frame(var), aes(var)) + geom_histogram(color='black', alpha=0.2)
This gives you a histogram of var (figure not shown). The default is to use 30 bins, but you can specify either the number of bins via bins= or the width of the bins via binwidth=:
ggplot(data.frame(var), aes(var)) + geom_histogram(bins=10, color='black', alpha=0.2)
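For the binwidth= route, a sketch along these lines may help. The data frame df and its uniform values are assumptions made up for illustration, and boundary = 0 is an extra choice here so that bin edges fall on multiples of the bin width; layer_data() is used only to inspect the bins ggplot computed.

```r
library(ggplot2)

# Hypothetical data standing in for `var`
set.seed(1)
df <- data.frame(var = runif(500, min = 0, max = 365))

# Fix the bin width instead of the number of bins;
# boundary = 0 aligns bin edges to 0, 30, 60, ...
p <- ggplot(df, aes(var)) +
  geom_histogram(binwidth = 30, boundary = 0, color = "black", alpha = 0.2)

# Inspect the computed bins: lower edge, upper edge, and count per bin
bin_table <- layer_data(p)[, c("xmin", "xmax", "count")]
bin_table
```

Printing p draws the plot; bin_table holds the same numbers the bars are drawn from.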
If you want to plot the basic bar geom, then geom_histogram() works just fine. If you use stat_bin() instead, it performs the same binning, but you can then apply a different geom if you want to:
ggplot(data.frame(var), aes(var)) +
stat_bin(geom='area', bins=10, alpha=0.2, color='black')
If you just want the numbers from "binning" a variable like this, one of the simplest ways is cut(), which is part of base R (no dplyr needed).
Using cut() is pretty simple. You specify the vector and a breaks= argument. Breaks can be given as a vector of the points where you want to "cut" (or "bin") your data, or you can just set breaks=10 and cut() will divide the range of the data into 10 equal-width bins. The result is a factor whose levels correspond to the range of each bin. In the case of var with breaks=10, you get the following:
> var_cut <- cut(var, breaks = 10)
> levels(var_cut)
[1] "(-0.365,36.5]" "(36.5,73]" "(73,110]" "(110,146]" "(146,182]" "(182,219]" "(219,256]"
[8] "(256,292]" "(292,328]" "(328,365]"
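To get from the factor to actual bin counts, something like the following sketch could be used. The vector v is simulated stand-in data (an assumption, not the original var), and the explicit break points in the second call are likewise chosen just for illustration.

```r
# Hypothetical stand-in for `var`
set.seed(42)
v <- runif(200, min = 0, max = 365)

# Ten equal-width bins spanning the range of the data
v_cut <- cut(v, breaks = 10)
bin_counts <- table(v_cut)   # observations per bin
bin_counts

# Explicit break points instead of a bin count;
# include.lowest = TRUE keeps a value exactly equal to 0 in the first bin
v_cut100 <- cut(v, breaks = seq(0, 400, by = 100), include.lowest = TRUE)
levels(v_cut100)
```

With a single number for breaks=, cut() widens the range slightly so that the minimum and maximum both land in a bin; with explicit breaks, anything outside them becomes NA.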
Categorize numeric variable into group/ bins/ breaks
I would use findInterval() here:
First, make up some sample data
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43
Use findInterval() to categorize your "ages" vector.
findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3
Alternatively, as recommended in the comments, cut() is also useful here:
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE)
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE, labels = FALSE)
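The two approaches can be compared directly. The sketch below hard-codes the ages vector printed above (so it does not depend on the random number generator) and checks that findInterval() and cut(..., labels = FALSE) assign the same group codes for this data.

```r
# The sample ages shown above, written out literally
ages <- c(27, 31, 37, 47, 26, 46, 48, 39, 38, 21,
          26, 25, 40, 31, 43, 34, 41, 49, 31, 43)

# Integer group codes from both functions
codes_fi  <- findInterval(ages, c(20, 30, 40))
codes_cut <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)
all(codes_fi == codes_cut)   # TRUE for this data

# Without labels = FALSE, cut() returns a factor with interval labels
age_group <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE)
table(age_group)
```

The two differ outside the break range: findInterval() would return 0 for ages below 20 and 3 for ages of 50 or more, whereas cut() with these breaks returns NA for both.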
Efficiently Binning Data into specified bins with dplyr
fuzzyjoin implements dplyr range/interval joins:
library(fuzzyjoin)
interval_left_join(
FJX_bins,
test_spectra,
by = c('Wavelength' = 'Lambda_Start', 'Wavelength' = 'Lambda_End')
)
# A tibble: 52 x 5
Wavelength Sigma Bin_Number Lambda_Start Lambda_End
<int> <dbl> <int> <dbl> <dbl>
1 289 3.98e-20 1 289 298.
2 290 3.89e-20 1 289 298.
3 291 3.77e-20 1 289 298.
4 292 3.64e-20 1 289 298.
5 293 3.54e-20 1 289 298.
6 294 3.39e-20 1 289 298.
7 295 3.25e-20 1 289 298.
8 296 3.09e-20 1 289 298.
9 297 2.93e-20 1 289 298.
10 298 2.80e-20 1 289 298.
# … with 42 more rows
How do I bin a variable across a number of observations for each specimen?
You can use cut to divide the data into categories, complete the sequence, and get the data in wide format using pivot_wider.
library(dplyr)
library(tidyr)
df %>%
count(Industry, Logo, Hue = cut(Hue, breaks, labels)) %>%
complete(Industry, Hue = labels, fill = list(n = 0)) %>%
fill(Logo) %>%
arrange(match(Hue, labels)) %>%
pivot_wider(names_from = Hue, values_from = n)
# Industry Logo `[0-45)` `[45-90)` `[90-135)` `[135-180)` `[180-225)` `[225-270)` `[270-315)` `[315-360)`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Fossil Petrox 3 0 0 0 2 0 0 0
#2 Renewable Windo 1 0 0 0 0 0 1 1
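The pipeline relies on breaks and labels objects that are not shown. Judging by the column names in the output, they might be built roughly as follows; this is a guess at their definition, not the original code, and the hue vector is made up. Note that the sketch passes right = FALSE so the bins really are closed on the left as the "[0-45)" labels suggest, whereas the call above uses cut()'s default of right-closed intervals.

```r
# Assumed definitions: eight 45-degree hue bins from 0 to 360
breaks <- seq(0, 360, by = 45)
labels <- paste0("[", head(breaks, -1), "-", tail(breaks, -1), ")")
labels

# Hypothetical hue values to check the binning
hue <- c(0, 44, 45, 200, 359)
hue_bin <- cut(hue, breaks = breaks, labels = labels, right = FALSE)
as.character(hue_bin)
```

With right = FALSE a hue of exactly 360 would be NA unless include.lowest = TRUE is also set.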