Dplyr: How to Apply Do() on Result of Group_By

dplyr: How to apply do() on result of group_by?

Let us define eaten like this:

eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

1) Then try this (the %.% operator comes from early dplyr releases; later versions use %>% instead):

eaten %.% group_by(person) %.% do(function(x) combn(x$foods, m = 2))

giving:

[[1]]
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

[[2]]
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana"

2) To get something close to what @Hadley describes in the comments without waiting for a future version of dplyr, try this, where do2 is found here:

library(gsubfn)
eaten %.% group_by(person) %.% fn$do2(~ combn(.$foods, m = 2))

giving:

$Grace
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

$Rob
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana"

Note: The last line of the question, which gives the code from the help file, also fails for me. This variation of it works for me: do(jan, lm, formula = ArrDelay ~ date).
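On a current dplyr release, a minimal sketch of approach 1 could use group_map() instead of do(). The person and foods vectors here are an assumption, reconstructed from the output printed above; the original answer does not show them.

```r
library(dplyr)

# Assumed sample data, reconstructed from the output shown above
person <- rep(c("Grace", "Rob"), each = 3)
foods <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

grouped <- group_by(eaten, person)

# group_map() calls the function once per group and returns a list;
# .x holds the rows of the current group without the grouping column
pairs <- group_map(grouped, ~ combn(.x$foods, m = 2))

# Label each list element with its group key, as in the do2 output
names(pairs) <- group_keys(grouped)$person
pairs
```

Each list element is a 2-row matrix whose columns are the food pairs for one person.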

R - use group_by() and mutate() in dplyr to apply function that returns a vector the length of groups

How about making use of nest instead:

foo %>%
    group_by(fac) %>%
    nest() %>%
    mutate(mahal = map(data, ~mahalanobis(
        .x,
        center = colMeans(.x, na.rm = TRUE),
        cov = cov(.x, use = "pairwise.complete.obs")))) %>%
    unnest()
## A tibble: 10 x 4
#   fac   mahal      x       y
#   <fct> <dbl>  <dbl>   <dbl>
# 1 A     1.02   -6.26  15.1
# 2 A     0.120   1.84   3.90
# 3 A     2.81   -8.36  -6.21
# 4 A     2.84   16.0  -22.1
# 5 A     1.21    3.30  11.2
# 6 B     2.15   -8.20  -0.449
# 7 B     2.86    4.87  -0.162
# 8 B     1.23    7.38   9.44
# 9 B     0.675   5.76   8.21
#10 B     1.08   -3.05   5.94

Here you avoid an explicit "x", "y" filter of the form temp <- x[, c("x", "y")], because nest collects the relevant columns for each group after grouping by fac. Applying mahalanobis is then straightforward.
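With tidyr 1.0.0 or later, unnest() expects the columns to expand to be named explicitly. A self-contained sketch of the same approach, assuming foo holds the x, y and fac columns seen in the output above, might look like this:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Assumed sample data matching the x, y and fac columns printed above
set.seed(1)
foo <- data.frame(
  x = rnorm(10, 0, 10),
  y = rnorm(10, 0, 10),
  fac = rep(c("A", "B"), each = 5)
)

res <- foo %>%
  group_by(fac) %>%
  nest() %>%                       # one row per fac; x and y move into `data`
  mutate(mahal = map(data, ~ mahalanobis(
    .x,
    center = colMeans(.x, na.rm = TRUE),
    cov = cov(.x, use = "pairwise.complete.obs")
  ))) %>%
  unnest(cols = c(data, mahal)) %>%  # expand both list columns side by side
  ungroup()

res
```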

Update

To respond to your comment, here is a purrr option. Since it's easy to lose track of what's going on, let's go step by step:

Generate sample data with one additional column.

set.seed(1)
foo <- data.frame(
    x = rnorm(10, 0, 10),
    y = rnorm(10, 0, 10),
    z = rnorm(10, 0, 10),
    fac = c(rep("A", 5), rep("B", 5)))

We now store the columns which define the subset of the data to be used for the calculation of the Mahalanobis distance in a list:
```
cols <- list(cols1 = c("x", "y"), cols2 = c("y", "z"))
```
So we will calculate the Mahalanobis distance (per fac) for the subset of data in columns x+y and then separately for y+z. The names of cols will be used as the column names of the two distance vectors.

Now for the actual purrr command:

imap_dfc(cols, ~nest(foo %>% group_by(fac), .x, .key = !!.y) %>% select(!!.y)) %>%
    mutate_all(function(lst) map(lst, ~mahalanobis(
        .x,
        center = colMeans(.x, na.rm = TRUE),
        cov = cov(.x, use = "pairwise.complete.obs")))) %>%
    unnest() %>%
    bind_cols(foo, .)
#           x           y           z fac     cols1     cols2
#1  -6.264538  15.1178117   9.1897737   A 1.0197542 1.3608052
#2   1.836433   3.8984324   7.8213630   A 0.1199607 1.1141352
#3  -8.356286  -6.2124058   0.7456498   A 2.8059562 1.5099574
#4  15.952808 -22.1469989 -19.8935170   A 2.8401953 3.0675228
#5   3.295078  11.2493092   6.1982575   A 1.2141337 0.9475794
#6  -8.204684  -0.4493361  -0.5612874   B 2.1517055 1.2284793
#7   4.874291  -0.1619026  -1.5579551   B 2.8626501 1.1724828
#8   7.383247   9.4383621 -14.7075238   B 1.2271316 2.5723023
#9   5.757814   8.2122120  -4.7815006   B 0.6746788 0.6939081
#10 -3.053884   5.9390132   4.1794156   B 1.0838341 2.3328276

In short, we

loop over entries in cols,
nest data in foo per fac based on columns defined in cols,
apply mahalanobis on the nested and grouped data generating as many distance columns with nested data as we have entries in cols (i.e. subsets), and
finally unnest the distance data and column-bind it to the original foo data.
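
The steps above can also be sketched without nesting, assuming dplyr 1.1.0 or later for pick(). The helper group_mahal is a hypothetical name introduced for this illustration and is not part of the original answer.

```r
library(dplyr)

set.seed(1)
foo <- data.frame(
  x = rnorm(10, 0, 10),
  y = rnorm(10, 0, 10),
  z = rnorm(10, 0, 10),
  fac = rep(c("A", "B"), each = 5)
)

cols <- list(cols1 = c("x", "y"), cols2 = c("y", "z"))

# Hypothetical helper: distance of each row from the centre of the rows it is given
group_mahal <- function(df) {
  m <- as.matrix(df)
  mahalanobis(
    m,
    center = colMeans(m, na.rm = TRUE),
    cov = cov(m, use = "pairwise.complete.obs")
  )
}

res <- foo
for (nm in names(cols)) {
  # pick() selects the listed columns within the current fac group,
  # and mutate() keeps the original row order
  res <- res %>%
    group_by(fac) %>%
    mutate(!!nm := group_mahal(pick(all_of(cols[[nm]])))) %>%
    ungroup()
}

res
```

Because mutate() never reorders rows, this variant does not rely on foo already being sorted by fac.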

using dplyr::group_by in a function within apply

You should apply over colnames(dat), so that the function receives each column name rather than the column's values, to get the correct groupings:

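library(dplyr)

dat <- mtcars[c(2:4, 11)]

# Count rows per value of the column whose name is passed as the string x
grp <- function(x) {
  dat %>%
    group_by(!!as.name(x)) %>%
    summarise(n = n()) %>%
    mutate(pc = scales::percent(n / sum(n))) %>%
    arrange(desc(n)) %>%
    head()
}

# Iterate over the column names, one summary table per column
lapply(colnames(dat), grp)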

How to use group_by with mean and sum in dplyr?

If I understood correctly, this might help you:

#Libraries

library(tidyverse)
library(lubridate)

#Data

df <-
 tibble::tribble(
    ~Year, ~School.Name, ~Student.Score1, ~Student.Score2,
    2019L,      "ISD 1",             1L,             NA,
    2020L,      "ISD 4",             4L,             2L,
    2020L,      "ISD 3",             NA,             3L,
    2018L,      "ISD 1",             4L,             NA,
    2019L,      "ISD 4",             2L,             5L,
    2020L,      "ISD 4",             3L,             2L,
    2019L,      "ISD 3",             NA,             1L,
    2018L,      "ISD 1",             2L,             4L
    )

#How to

df %>% 
  group_by(Year,School.Name) %>% 
  summarise(
    n = n(),
    across(.cols = contains(".Score"), .fns = function(x) mean(x, na.rm = TRUE))
  )

# A tibble: 6 x 5
# Groups:   Year [3]
   Year School.Name     n Student.Score1 Student.Score2
  <int> <chr>       <int>          <dbl>          <dbl>
1  2018 ISD 1           2            3                4
2  2019 ISD 1           1            1              NaN
3  2019 ISD 3           1          NaN                1
4  2019 ISD 4           1            2                5
5  2020 ISD 3           1          NaN                3
6  2020 ISD 4           2            3.5              2

How to use dplyr::group_by in a function

You can use group_by_at with the column indices, for example:

countString <- function(things) {
  index <- which(colnames(theTibble) %in% things)
  theTibble %>% 
       group_by_at(index) %>% 
       count()
}

countString(c("animal", "sex"))

## A tibble: 4 x 3
## Groups:   animal, sex [4]
#  animal sex        nn
#  <chr>  <chr>   <int>
#1 cat    f           2
#2 dog    f           1
#3 dog    m           2
#4 fish   unknown     1
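
In dplyr 1.0.0 and later, the scoped verbs such as group_by_at are superseded by across(). A minimal sketch of the same idea follows; the_tibble is hypothetical sample data chosen to reproduce the counts above, and count_by is a name made up for this illustration.

```r
library(dplyr)

# Hypothetical sample data, chosen to reproduce the counts shown above
the_tibble <- tibble(
  animal = c("cat", "cat", "dog", "dog", "dog", "fish"),
  sex    = c("f", "f", "f", "m", "m", "unknown")
)

# Count rows for every combination of the columns named in `things`
count_by <- function(data, things) {
  data %>% count(across(all_of(things)))
}

res <- count_by(the_tibble, c("animal", "sex"))
res
```

Passing the data frame as an argument, rather than reading a global, keeps the function reusable, and all_of() raises an error if a requested column does not exist.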

applying a function to the output of dplyr's group_by

Well, you have a parenthesis problem and a file naming problem, so maybe one of those is what you are referring to. I'm assuming

iris %>%
  group_by(Species) %>%
  do({
      p <- ggplot(., aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
      ggsave(p, filename=paste0(unique(.$Species),".pdf"))
     })

would fix your problem.
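
Since do() is superseded in current dplyr, a sketch of the same task with group_walk(), which is intended for per-group side effects, could look like the following. Writing to tempdir() and the width and height values are assumptions made for this example.

```r
library(dplyr)
library(ggplot2)

out_dir <- tempdir()  # assumed output location for this sketch

iris %>%
  group_by(Species) %>%
  group_walk(function(rows, key) {
    # rows: the data for one species; key: one-row tibble holding its Species value
    p <- ggplot(rows, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
    ggsave(
      filename = file.path(out_dir, paste0(key$Species, ".pdf")),
      plot = p, width = 6, height = 4
    )
  })
```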

R, dplyr - combination of group_by() and arrange() does not produce expected result?

I think you want

ToothGrowth %>%
  arrange(supp,len)

The chaining system just replaces nested calls, so you are first grouping and then ordering that grouped result, which breaks the original ordering.
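
As a hedged aside: in current dplyr, arrange() ignores grouping unless .by_group = TRUE is set. The sketch below shows that sorting within groups this way gives the same ordering of supp and len as the ungrouped call above.

```r
library(dplyr)

# Sort by the grouping variable first, then by len within each group
by_group <- ToothGrowth %>%
  group_by(supp) %>%
  arrange(len, .by_group = TRUE) %>%
  ungroup()

# The ungrouped equivalent suggested above
plain <- ToothGrowth %>%
  arrange(supp, len)
```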

How to apply a function per group in dplyr without having to define a function?

You can define the correct order, use match to find the position at which each of its values first occurs in v2, and use diff to check that those positions are increasing within each v1. res is TRUE if the order matches.

library(dplyr)
correct_order = c('X', 'Y')

d %>%
 group_by(v1) %>%
 summarise(res = all(diff(match(correct_order, v2)) > 0))

#  v1    res  
#  <chr> <lgl>
#1 a     TRUE 
#2 b     FALSE
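
To make the match and diff step concrete, here is a small self-contained sketch. The data frame d is hypothetical, since the original d is not shown; it is built so that group a has X before Y and group b has them reversed.

```r
library(dplyr)

correct_order <- c("X", "Y")

# Hypothetical data: "a" lists X before Y, "b" lists Y before X
d <- tibble(
  v1 = c("a", "a", "b", "b"),
  v2 = c("X", "Y", "Y", "X")
)

# For group "b": match() gives positions c(2, 1), so diff() is -1 and res is FALSE
res <- d %>%
  group_by(v1) %>%
  summarise(res = all(diff(match(correct_order, v2)) > 0))

res
```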

dplyr summarise : Group by multiple variables in a loop and add results in the same dataframe

library(questionr)
library(tidyverse)
data(hdv2003)

list("trav.satisf", "cuisine", "sexe") %>%
  map(~ {
    hdv2003 %>%
      group_by_at(.x) %>%
      summarise(
        n = n(),
        percent = round((n() / nrow(hdv2003)) * 100, digits = 1),
        femmes = round((sum(sexe == "Femme", na.rm = TRUE) / sum(!is.na(sexe))) * 100, digits = 1),
        age = round(mean(age, na.rm = TRUE), digits = 1)
      ) %>%
      rename_at(1, ~"group") %>%
      mutate(grouping = .x)
  }) %>%
  bind_rows() %>%
  select(grouping, group, everything())
#> # A tibble: 8 x 6
#>   grouping    group              n percent femmes   age
#>   <chr>       <fct>          <int>   <dbl>  <dbl> <dbl>
#> 1 trav.satisf Satisfaction     480    24     51.5  41.4
#> 2 trav.satisf Insatisfaction   117     5.9   47.9  40.3
#> 3 trav.satisf Equilibre        451    22.6   49.9  40.9
#> 4 trav.satisf <NA>             952    47.6   60.2  56  
#> 5 cuisine     Non             1119    56     43.8  50.1
#> 6 cuisine     Oui              881    44     69.4  45.6
#> 7 sexe        Homme            899    45      0    48.2
#> 8 sexe        Femme           1101    55    100    48.2

Created on 2021-11-12 by the reprex package (v2.0.1)
