Binning a Numeric Variable

Binning a numeric variable

How about cut():

binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), '10+'))

Which yields:

 # [1] 0   1   3   4   2   4   2   5   10+ 10+ 10+ 2   10+ 2   10+ 3   4   2  
 # Levels: 0 1 2 3 4 5 6 7 8 9 10+
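
To make the interval logic visible, here is a small self-contained sketch. The vector x is made up for illustration (the original x is not shown) and is assumed to hold non-negative integer counts. Because cut() builds right-closed intervals by default, the breaks -1:9 give (-1,0], (0,1], ..., (8,9], so each integer from 0 to 9 gets its own bin and everything above 9 falls into (9,Inf], labelled '10+'.

```r
# Hypothetical input; the original x is not shown in the text.
x <- c(0, 1, 3, 12, 9, 10)

# 12 break points define 11 right-closed intervals, matching the 11 labels.
binned.x <- cut(x,
                breaks = c(-1:9, Inf),
                labels = c(as.character(0:9), "10+"))

as.character(binned.x)
# [1] "0"   "1"   "3"   "10+" "9"   "10+"

# Counts per bin, including the empty ones
table(binned.x)
```

Note that this labelling only reads naturally for integer data: a value such as 0.5 would land in (0,1] and be labelled "1".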

Automatically creating bins for a numeric variable in R

It sounds like you want to plot a histogram of var, which is easy to do in ggplot2 with geom_histogram(). The key is that ggplot() expects a data frame, so you have to wrap your variable in one first, which you can do inside the ggplot() call itself:

ggplot(data.frame(var), aes(var)) + geom_histogram(color='black', alpha=0.2)

This draws a histogram of var.

The default is to use 30 bins, but you can specify either the number of bins via bins= or the width of each bin via binwidth=:

ggplot(data.frame(var), aes(var)) + geom_histogram(bins=10, color='black', alpha=0.2)
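
The binwidth= alternative is mentioned but not shown, so here is a minimal sketch of it. The data are hypothetical (a made-up var spanning 0 to 365), and the width of 30 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Hypothetical data standing in for the original var.
set.seed(42)
var <- runif(200, min = 0, max = 365)

# Fix the width of each bin instead of the number of bins.
p <- ggplot(data.frame(var), aes(var)) +
  geom_histogram(binwidth = 30, color = 'black', alpha = 0.2)
p
```

With binwidth= the number of bars depends on the range of the data, whereas bins= fixes the number of bars and lets the width vary.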


If you only want the standard bars, geom_histogram() works just fine. If you use stat_bin() instead, it performs the same binning, but you can then draw the result with a different geom:

ggplot(data.frame(var), aes(var)) +
  stat_bin(geom='area', bins=10, alpha=0.2, color='black')


If you just want the numbers from "binning" a variable rather than a plot, one of the simplest ways is to use cut(), which is part of base R (no extra package is needed).

Using cut() is pretty simple: you pass the vector and a breaks= argument. breaks can be a vector of the points where you want to "cut" (or "bin") your data, or a single number such as breaks=10, in which case cut() splits the range of the data into 10 equal-width bins. The result is a factor whose levels describe the interval covered by each bin. In the case of var with breaks=10, you get the following:

> var_cut <- cut(var, breaks = 10)
> levels(var_cut)
 [1] "(-0.365,36.5]" "(36.5,73]"     "(73,110]"      "(110,146]"     "(146,182]"     "(182,219]"     "(219,256]"    
 [8] "(256,292]"     "(292,328]"     "(328,365]"
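
As a rough sketch of the two ways of specifying breaks, the snippet below bins a made-up vector both ways and counts the observations per bin. The values of var and the explicit cut points are assumptions chosen for illustration; only cut(), seq() and table() from base R are used.

```r
# Hypothetical data; the original var is not shown in the text.
var <- c(0, 15, 40, 80, 120, 200, 250, 300, 364, 365)

# A single number: cut() picks 10 equal-width bins over the range of var.
var_cut <- cut(var, breaks = 10)
table(var_cut)

# Explicit cut points: bins are (0,100], (100,200], (200,300], (300,400].
# include.lowest = TRUE closes the first bin on the left so that 0 is kept.
var_cut2 <- cut(var, breaks = seq(0, 400, by = 100), include.lowest = TRUE)
table(var_cut2)
```

When breaks is a single number, cut() widens the range slightly so that the minimum and maximum are not dropped; that is why the first level in the output above starts just below 0. With explicit cut points, any value outside the outermost breaks becomes NA.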

Categorize numeric variable into group/ bins/ breaks

I would use findInterval() here:

First, make up some sample data:

set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43

Use findInterval() to categorize your "ages" vector.

findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3

Alternatively, as recommended in the comments, cut() is also useful here:

cut(ages, breaks=c(20, 30, 40, 50), right = FALSE)
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE, labels = FALSE)
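
A short sketch comparing the two approaches on the same ages vector (written out literally so the snippet runs on its own). The label strings are an arbitrary choice, not part of the original answer.

```r
# The ages vector from the example above.
ages <- c(27, 31, 37, 47, 26, 46, 48, 39, 38, 21,
          26, 25, 40, 31, 43, 34, 41, 49, 31, 43)

# Integer bin codes from both approaches.
codes_fi  <- findInterval(ages, c(20, 30, 40))
codes_cut <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)
all(codes_fi == codes_cut)  # TRUE for these data

# Turn the codes into readable labels (the label text is an assumption).
age_group <- c("20-29", "30-39", "40-49")[codes_fi]
table(age_group)
```

The two functions differ outside the stated range: findInterval() returns 0 for values below 20 and puts everything from 40 upwards (including 50 and above) in bin 3, while this cut() call returns NA for anything outside [20, 50).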

Efficiently Binning Data into specified bins with dplyr

The fuzzyjoin package implements range/interval joins in the style of dplyr joins:

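library(fuzzyjoin)  # interval joins also require the IRanges package (Bioconductor)

interval_left_join(
    test_spectra,   # x: supplies Wavelength (and Sigma)
    FJX_bins,       # y: supplies Lambda_Start and Lambda_End
    by = c('Wavelength' = 'Lambda_Start', 'Wavelength' = 'Lambda_End')
)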

# A tibble: 52 x 5
   Wavelength    Sigma Bin_Number Lambda_Start Lambda_End
        <int>    <dbl>      <int>        <dbl>      <dbl>
 1        289 3.98e-20          1          289       298.
 2        290 3.89e-20          1          289       298.
 3        291 3.77e-20          1          289       298.
 4        292 3.64e-20          1          289       298.
 5        293 3.54e-20          1          289       298.
 6        294 3.39e-20          1          289       298.
 7        295 3.25e-20          1          289       298.
 8        296 3.09e-20          1          289       298.
 9        297 2.93e-20          1          289       298.
10        298 2.80e-20          1          289       298.
# … with 42 more rows
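
If installing fuzzyjoin is not an option, a similar lookup can be sketched in base R with findInterval(). This is a swapped-in alternative, not what fuzzyjoin does internally, and it assumes the bins are sorted by Lambda_Start and do not overlap. The two small tables below are hypothetical stand-ins with made-up values.

```r
# Hypothetical stand-ins for the two tables; the values are made up.
FJX_bins <- data.frame(
  Bin_Number   = 1:3,
  Lambda_Start = c(289, 298, 307),
  Lambda_End   = c(298, 307, 312)
)
test_spectra <- data.frame(
  Wavelength = c(289L, 295L, 300L, 310L, 320L),
  Sigma      = c(4.0e-20, 3.3e-20, 2.5e-20, 1.2e-20, 0.5e-20)
)

# For each wavelength, find the index of the last Lambda_Start that is
# <= the wavelength (0 means it lies below every bin).
idx <- findInterval(test_spectra$Wavelength, FJX_bins$Lambda_Start)
idx[idx == 0] <- NA

# Discard matches that fall past the end of the matched bin.
idx[!is.na(idx) & test_spectra$Wavelength > FJX_bins$Lambda_End[idx]] <- NA

test_spectra$Bin_Number <- FJX_bins$Bin_Number[idx]
test_spectra
```

Unlike an inclusive interval join, this assigns a wavelength that sits exactly on a shared boundary to the later bin only, so check the boundary behaviour against your own bin definitions.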

How do I bin a variable across a number of observations for each specimen?

You can use cut() to divide the data into categories, complete() to fill in the bins that have no observations, and pivot_wider() to get the data into wide format.

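library(dplyr)
library(tidyr)

# `breaks` and `labels` are assumed to be defined already (they are not shown here)
df %>%
  # bin Hue and count observations per Industry/Logo/bin
  count(Industry, Logo, Hue = cut(Hue, breaks, labels)) %>%
  # add the bins that have no observations, with a count of 0
  complete(Industry, Hue = labels, fill = list(n = 0)) %>%
  # rows added by complete() have no Logo; fill it downwards from the previous row
  fill(Logo) %>%
  # order rows by bin so the wide columns come out in bin order
  arrange(match(Hue, labels)) %>%
  pivot_wider(names_from = Hue, values_from = n)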

#   Industry  Logo   `[0-45)` `[45-90)` `[90-135)` `[135-180)` `[180-225)` `[225-270)` `[270-315)` `[315-360)`
#  <chr>     <chr>     <dbl>     <dbl>      <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#1 Fossil    Petrox        3         0          0           0           2           0           0           0
#2 Renewable Windo         1         0          0           0           0           0           1           1
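
The pipeline above relies on breaks and labels objects that are not shown. One plausible way to build them, inferred from the column names in the output, is sketched below; the 45-degree step, the label format and the use of right = FALSE are assumptions, and the hue values are made up.

```r
# Assumed definitions, inferred from the column names in the output above.
breaks <- seq(0, 360, by = 45)
labels <- paste0("[", head(breaks, -1), "-", tail(breaks, -1), ")")
labels

# Hypothetical hue values to check the binning.
hue <- c(10, 45, 200, 300)

# right = FALSE makes the bins left-closed, which is what labels such as
# "[0-45)" suggest; the original call uses cut()'s default (right-closed).
binned <- cut(hue, breaks = breaks, labels = labels, right = FALSE)
as.character(binned)
# [1] "[0-45)"    "[45-90)"   "[180-225)" "[270-315)"
```

With right = FALSE a hue of exactly 360 falls outside the last bin and becomes NA unless include.lowest = TRUE is also set.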
