How to Do Range Grouping on a Column Using Dplyr

How to do range grouping on a column using dplyr?

We can use cut() to do the grouping. We create the 'gr' column inside group_by(), use summarise() to count the number of elements in each group (n()), and order the output with arrange() based on 'gr'.

library(dplyr)
DT %>%
  group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05))) %>%
  summarise(n = n()) %>%
  arrange(as.numeric(gr))
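To see what cut() contributes here, this base R sketch bins a few hypothetical values (standing in for the OP's column B, which is not shown) into the same 0.05-wide right-closed intervals and counts them with table(). Unlike group_by() + summarise(n = n()), table() also reports empty intervals, so the sketch keeps only the non-zero counts.

```r
# Hypothetical values standing in for column B of the OP's data
B <- c(0.01, 0.05, 0.07, 0.12, 0.73)

# Right-closed bins (0,0.05], (0.05,0.1], ..., (0.95,1]
gr <- cut(B, breaks = seq(0, 1, by = 0.05))

# Count per bin; table() keeps empty levels, so drop the zero counts
counts <- table(gr)
counts <- counts[counts > 0]
print(counts)
```

A value lying exactly on a break, such as 0.05, is counted in the interval it closes, (0,0.05].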

Since the original object is a data.table, this can also be done with data.table syntax (incorporating @Frank's suggestion to use keyby):

library(data.table)
DT[, .N, keyby = .(gr = cut(B, breaks = seq(0, 1, by = 0.05)))]

EDIT:

Based on the update to the OP's post, we could subtract a small number from the seq() breaks, use right = FALSE, and keep the original right-closed labels:

lvls <- levels(cut(DT$B, seq(0, 1, by = 0.05)))
DT %>%
  group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05) - .Machine$double.eps,
                    right = FALSE, labels = lvls)) %>%
  summarise(n = n()) %>%
  arrange(as.numeric(gr))
#          gr n
#1   (0,0.05] 2
#2 (0.05,0.1] 2
#3 (0.1,0.15] 3
#4 (0.15,0.2] 2
#5 (0.7,0.75] 1
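The effect of the shifted breaks is easiest to see on a single boundary value. This base R sketch uses a hypothetical, shortened set of breaks and the value 0.1, and compares the default right-closed cut(), a left-closed cut(), and the shifted-breaks version with reused labels.

```r
# Hypothetical, shortened break points for illustration
br <- c(0, 0.05, 0.1, 0.15)
x <- 0.1  # a value lying exactly on a break

# Default: intervals are right-closed, (a, b]
right_closed <- cut(x, br)

# right = FALSE: intervals are left-closed, [a, b)
left_closed <- cut(x, br, right = FALSE)

# Shift the breaks down by a tiny amount, use left-closed intervals,
# but keep the right-closed labels
lvls <- levels(cut(x, br))
shifted <- cut(x, br - .Machine$double.eps, right = FALSE, labels = lvls)

print(c(as.character(right_closed), as.character(left_closed), as.character(shifted)))
```

So a value sitting exactly on a break moves up one bin compared with the plain cut() call, while the printed labels stay right-closed.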

R and dplyr: group by value ranges

You can use cut() to create a grouping variable and then sum count within each group:

library(dplyr)

df %>%
  group_by(grp = cut(value, c(-Inf, 2, 4, Inf))) %>%
  summarise(count = sum(count))

# A tibble: 3 x 2
  grp      count
  <fct>    <int>
1 (-Inf,2]    30
2 (2,4]       70
3 (4, Inf]   110
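The same grouping can be sketched in base R with tapply(). The data frame below is hypothetical (the OP's df is not shown here); it only assumes a numeric value column to bin and a count column to add up within each range.

```r
# Hypothetical data: a value to bin and a count to add up
df <- data.frame(value = 1:6, count = c(5, 5, 10, 10, 20, 20))

# Bin value into (-Inf,2], (2,4], (4,Inf] and sum count within each bin
totals <- tapply(df$count, cut(df$value, c(-Inf, 2, 4, Inf)), sum)
print(totals)
```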

Group value in range r

Here is a full solution, including your sample data:

df <- data.frame(name = c("r", "h", "s", "l", "e", "m"), value = c(35, 20, 16, 40, 23, 40))
# get categories
df$groups <- cut(df$value, breaks = c(0, 21, 30, Inf))

# calculate group counts:
table(df$groups)

If Inf is a little too extreme, you can use max(df$value) instead.
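Using the OP's sample data, this base R sketch checks that replacing Inf with max(df$value) gives the same counts: because cut() intervals are right-closed by default, the maximum value still falls inside the last interval.

```r
df <- data.frame(name = c("r", "h", "s", "l", "e", "m"),
                 value = c(35, 20, 16, 40, 23, 40))

# Open-ended top break
with_inf <- table(cut(df$value, breaks = c(0, 21, 30, Inf)))

# Top break at the observed maximum; (30,40] still contains 40
with_max <- table(cut(df$value, breaks = c(0, 21, 30, max(df$value))))

print(with_inf)
print(with_max)
```

One caveat: if new data later contains a value larger than that maximum, it falls outside every interval and becomes NA, which the Inf version avoids.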

Apply a summarise condition to a range of columns when using dplyr group_by?

Version 1.0.0 of dplyr (still in development when this answer was written) adds the across() function, which does what you want.

Basic usage

across() has two primary arguments:

The first argument, .cols, selects the columns you want to operate on.
It uses tidy selection (like select()) so you can pick variables by
position, name, and type.

The second argument, .fns, is a function or list of functions to apply to
each column. This can also be a purrr style formula (or list of formulas)
like ~ .x / 2. (This argument is optional, and you can omit it if you just want
to get the underlying data; you'll see that technique used in
vignette("rowwise").)

### Requires dplyr >= 1.0.0 (before its CRAN release, this meant installing the development version from GitHub)
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

Control how the output columns are named with the .names argument, which takes a glue specification:

iris %>% 
  group_by(Species) %>% 
  summarise(
    across(c(Sepal.Width:Petal.Width), ~ mean(.x, na.rm = TRUE), .names = "mean_{col}"),
    across(c(Sepal.Length), ~ max(.x, na.rm = TRUE), .names = "max_{col}")
  )
#> # A tibble: 3 x 5
#>   Species    mean_Sepal.Width mean_Petal.Leng~ mean_Petal.Width max_Sepal.Length
#> * <fct>                 <dbl>            <dbl>            <dbl>            <dbl>
#> 1 setosa                 3.43             1.46            0.246              5.8
#> 2 versicolor             2.77             4.26            1.33               7  
#> 3 virginica              2.97             5.55            2.03               7.9
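For comparison, here is a base R sketch of the same summary (an alternative, not how across() works internally): aggregate() with a formula, one call per function, then merge() to join the results and paste0() to mimic the .names glue spec. iris has no missing values, so na.rm is not needed; note that the formula interface of aggregate() drops rows containing NA by default.

```r
# Per-species means of three columns, as in the first across() call
means <- aggregate(cbind(Sepal.Width, Petal.Length, Petal.Width) ~ Species,
                   data = iris, FUN = mean)

# Per-species maximum of Sepal.Length, as in the second across() call
maxes <- aggregate(Sepal.Length ~ Species, data = iris, FUN = max)

# Join on Species and rename columns to mimic "mean_{col}" / "max_{col}"
res <- merge(means, maxes, by = "Species")
names(res)[-1] <- c(paste0("mean_", names(means)[-1]),
                    paste0("max_", names(maxes)[-1]))
print(res)
```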

Using multiple functions

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max  = ~ max(., na.rm = TRUE)
)

iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), my_func, .names = "{fn}.{col}"))
#> # A tibble: 3 x 9
#>   Species    mean.Sepal.Length max.Sepal.Length mean.Sepal.Width max.Sepal.Width
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01              5.8             3.43             4.4
#> 2 versicolor              5.94              7               2.77             3.4
#> 3 virginica               6.59              7.9             2.97             3.8
#>   mean.Petal.Length max.Petal.Length mean.Petal.Width max.Petal.Width
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1              1.46              1.9            0.246             0.6
#> 2              4.26              5.1            1.33              1.8
#> 3              5.55              6.9            2.03              2.5

Created on 2020-03-06 by the reprex package (v0.3.0)
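To make the "every function on every numeric column" idea concrete, this base R sketch applies a named list of functions (mirroring my_func) to each numeric column of each species and names the results "{fn}.{col}". The helper summarise_group() is hypothetical, written only for this illustration.

```r
# A named list of functions, mirroring my_func above
fns <- list(mean = function(x) mean(x, na.rm = TRUE),
            max  = function(x) max(x, na.rm = TRUE))

# Hypothetical helper: apply every function to every column of one group,
# naming the results "{fn}.{col}" and ordering them column by column
summarise_group <- function(d) {
  vals <- unlist(lapply(d, function(col) vapply(fns, function(f) f(col), numeric(1))))
  names(vals) <- paste(rep(names(fns), times = ncol(d)),
                       rep(names(d), each = length(fns)), sep = ".")
  vals
}

# Keep numeric columns only, like where(is.numeric)
num_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]
res <- do.call(rbind, lapply(split(iris[num_cols], iris$Species), summarise_group))
print(res)
```

The result is a matrix with one row per species, so it lacks the Species column that summarise() adds; the row names carry that information instead.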

Using dplyr to select a range based on a grouping variable in a separate data.frame

Here is an option using Map():

res1 <- do.call(rbind, Map(function(x, y, z)
  data.frame(foo[x:y, ], ID = as.character(z), stringsAsFactors = FALSE),
  findInterval(bar$xMin, foo$x),
  findInterval(bar$xMax, foo$x),
  bar$ID))
all.equal(res1, res)
#[1] TRUE

Or using data.table

library(data.table)
setDT(foo)[bar, on = .(x >= xMin, x <= xMax)]

Or using the tidyverse

library(dplyr)
library(purrr)
library(tidyr)
bar %>%
  transmute(ID, col1 = map2(findInterval(xMin, foo$x),
                            findInterval(xMax, foo$x),
                            ~ foo %>% slice(.x:.y))) %>%
  unnest(c(col1))
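Both the Map() and tidyverse options above turn xMin and xMax into row positions with findInterval(), which assumes foo$x is sorted and returns the index of the last element less than or equal to each value. This base R sketch with a hypothetical x shows that this reproduces an inclusive range when xMin and xMax occur in foo$x, but that a start value falling between two x values pulls in one extra row. A plain comparison, like the conditions in the data.table non-equi join, does not have that edge case.

```r
# Hypothetical sorted values standing in for foo$x
x <- c(1, 2, 3, 4, 5, 6)

i_start <- findInterval(2, x)    # 2: x[2] == 2 is the last x <= 2
i_end   <- findInterval(4, x)    # 4: x[4] == 4 is the last x <= 4
i_off   <- findInterval(2.5, x)  # 2: x[2] == 2 is also the last x <= 2.5

# Index-based slice starting from a value that is not in x
via_index <- x[i_off:i_end]       # includes 2, which is below 2.5

# Direct comparison keeps only values inside [2.5, 4]
via_compare <- x[x >= 2.5 & x <= 4]
```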

Create a column in R to compare values within a group and flag as greater than (1), less than (0) or equal (2)

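library(dplyr)
# Within each Round, rank() - 1 gives 0 for the lower and 1 for the higher score;
# if all scores in the round are equal, every flag is replaced with 2
df %>%
  group_by(Round) %>%
  mutate(Flag1 = replace(rank(Score) - 1, length(unique(Score)) == 1, 2))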

  Round Team  Score  Flag Flag1
  <int> <chr> <int> <int> <dbl>
1     1 Team1     4     0     0
2     1 Team2     8     1     1
3     2 Team1     9     1     1
4     2 Team2     2     0     0
5     3 Team1     6     2     2
6     3 Team2     6     2     2
7     4 Team1    14     1     1
8     4 Team2     9     0     0
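A base R sketch of the same rule with ave(), using the Round/Team/Score values shown in the output above. Like the dplyr version, it assumes two teams per round; with more teams, rank() - 1 would give larger values (and fractional ones for partial ties).

```r
# Scores per round, taken from the example output above
df <- data.frame(Round = rep(1:4, each = 2),
                 Team  = rep(c("Team1", "Team2"), times = 4),
                 Score = c(4, 8, 9, 2, 6, 6, 14, 9))

# Within each Round: 2 if all scores are equal, otherwise rank - 1
df$Flag1 <- ave(df$Score, df$Round, FUN = function(s) {
  if (length(unique(s)) == 1) rep(2, length(s)) else rank(s) - 1
})
print(df)
```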

R create new column based on data range at a certain time point

Instead of nested if_else() calls, we could use case_when(), which lets us specify multiple conditions, then group by 'Patient' and use fill() to replace the NA elements of 'Value_status' with the previous non-NA value:

library(dplyr)
library(tidyr)
tb %>%
  mutate(Value_status = case_when(Time == 1 & Value < 50 ~ "low",
                                  Time == 1 & Value >= 50 ~ "high")) %>%
  group_by(Patient) %>%
  fill(Value_status) %>%
  ungroup()

Output:

# A tibble: 15 x 5
   RowID Patient  Time Value Value_status
   <chr> <chr>   <dbl> <dbl> <chr>       
 1 A1    001         1  NA   <NA>        
 2 A2    001         2  10   <NA>        
 3 A3    001         3  23   <NA>        
 4 A4    002         1 100   high        
 5 A5    002         2  30   high        
 6 A6    035         1  10   low         
 7 A7    035         2  15   low         
 8 A8    035         3  NA   low         
 9 A9    035         4  60   low         
10 A10   035         5  56.7 low         
11 A11   100         1  30   low         
12 A12   100         2  51   low         
13 A13   105         1   3   low         
14 A14   105         2  13   low         
15 A15   105         3  77   low
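For comparison, a base R sketch that computes each patient's status from the Time == 1 row and looks it up for every row, using the Patient/Time/Value columns shown in the output above. It assumes exactly one Time == 1 row per patient. Unlike fill(), which only copies values downward, the lookup also labels rows that come before the Time == 1 row; with the data above (sorted by Patient and Time) both give the same Value_status.

```r
# Data as shown in the output above (Value_status column left out)
tb <- data.frame(
  Patient = c("001", "001", "001", "002", "002", "035", "035", "035", "035", "035",
              "100", "100", "105", "105", "105"),
  Time  = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3),
  Value = c(NA, 10, 23, 100, 30, 10, 15, NA, 60, 56.7, 30, 51, 3, 13, 77)
)

# Status at Time == 1 (an NA value stays NA), one entry per patient
baseline <- tb[tb$Time == 1, ]
status <- ifelse(baseline$Value < 50, "low", "high")
names(status) <- baseline$Patient

# Look up each row's patient
tb$Value_status <- unname(status[tb$Patient])
print(tb)
```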

How to find the range of dates for each group in a dataframe

I would suggest an approach using the first() and last() functions from the dplyr package:

library(dplyr)
#Data
data <- data.frame(group = rep(letters[1:3], c(4,5,4)),
                   Date = as.Date(c("2010-08-09", "2010-09-11", "2010-09-12", "2010-09-18",
                                    "2014-03-15","2014-03-16","2014-03-20","2014-03-21","2014-03-25",
                                    "2016-05-02","2016-08-02","2016-08-03","2016-09-21")))
#Code
data %>% group_by(group) %>% mutate(FirsDate=first(Date),LastDate=last(Date))

Output:

# A tibble: 13 x 4
# Groups:   group [3]
   group Date       FirsDate   LastDate  
   <fct> <date>     <date>     <date>    
 1 a     2010-08-09 2010-08-09 2010-09-18
 2 a     2010-09-11 2010-08-09 2010-09-18
 3 a     2010-09-12 2010-08-09 2010-09-18
 4 a     2010-09-18 2010-08-09 2010-09-18
 5 b     2014-03-15 2014-03-15 2014-03-25
 6 b     2014-03-16 2014-03-15 2014-03-25
 7 b     2014-03-20 2014-03-15 2014-03-25
 8 b     2014-03-21 2014-03-15 2014-03-25
 9 b     2014-03-25 2014-03-15 2014-03-25
10 c     2016-05-02 2016-05-02 2016-09-21
11 c     2016-08-02 2016-05-02 2016-09-21
12 c     2016-08-03 2016-05-02 2016-09-21
13 c     2016-09-21 2016-05-02 2016-09-21

If you only want one row per group with these values, you can use summarise():

#Code2
data %>% group_by(group) %>% summarise(FirsDate=first(Date),LastDate=last(Date))

Output:

# A tibble: 3 x 3
  group FirsDate   LastDate  
  <fct> <date>     <date>    
1 a     2010-08-09 2010-09-18
2 b     2014-03-15 2014-03-25
3 c     2016-05-02 2016-09-21

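Update: if the dates are not sorted within each group, use min() and max(), which do not depend on row order: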

#Code
data2 %>% group_by(group) %>% summarise(FirsDate=min(Date),LastDate=max(Date))

Output:

# A tibble: 3 x 3
  group FirsDate   LastDate  
  <fct> <date>     <date>    
1 a     2010-08-09 2010-09-18
2 b     2014-03-15 2014-03-25
3 c     2016-05-02 2016-09-21
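To see why the row order matters, this base R sketch reverses the rows of the example data from the first answer and compares position-based picks (what first() and last() do) with value-based ones (min() and max()).

```r
# Example data from above, with the rows reversed
data <- data.frame(group = rep(letters[1:3], c(4, 5, 4)),
                   Date = as.Date(c("2010-08-09", "2010-09-11", "2010-09-12", "2010-09-18",
                                    "2014-03-15", "2014-03-16", "2014-03-20", "2014-03-21", "2014-03-25",
                                    "2016-05-02", "2016-08-02", "2016-08-03", "2016-09-21")))
unsorted <- data[nrow(data):1, ]

# Position-based: first and last element of each group as stored
by_position <- lapply(split(unsorted$Date, unsorted$group),
                      function(d) c(first = d[1], last = d[length(d)]))

# Value-based: earliest and latest date of each group
by_value <- lapply(split(unsorted$Date, unsorted$group),
                   function(d) c(min = min(d), max = max(d)))
```

For group a, the position-based "first" date is now 2010-09-18, while min() still returns 2010-08-09.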

R output BOTH maximum and minimum value by group in dataframe

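You can use range() to get the minimum and maximum values, and call it inside summarise() to get two rows for each Name. (In dplyr 1.1.0 and later, returning more than one row per group from summarise() is deprecated; reframe() is the replacement.)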

library(dplyr)

df %>%
  group_by(Name) %>%
  summarise(Value = range(Value), .groups = "drop")

#  Name  Value
#  <chr> <int>
#1 A        27
#2 A        57
#3 B        20
#4 B        89
#5 C        58
#6 C        97

If you have a large dataset, data.table might be faster:

library(data.table)
setDT(df)[, .(Value = range(Value)), Name]
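A base R sketch of the same reshaping, on hypothetical data: split() the values by Name, apply range() to each piece, and stack the results so that each Name gets one row for its minimum and one for its maximum.

```r
# Hypothetical data: several values per Name
df <- data.frame(Name = c("A", "A", "A", "B", "B"),
                 Value = c(3, 9, 5, 1, 7))

# range() returns c(min, max) for each group
rng <- lapply(split(df$Value, df$Name), range)

# Stack into long format: two rows per Name
out <- data.frame(Name = rep(names(rng), lengths(rng)),
                  Value = unlist(rng, use.names = FALSE))
print(out)
```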
