Using Variable Column Names in Dplyr Summarise

How to use summarise from dplyr with dynamic column names?

1) Use !!sym(...) like this:

colOfInterest <- "Sepal.Length"
iris %>% 
  group_by(Species) %>%
  summarize(avg = mean(!!sym(colOfInterest))) %>%
  ungroup()

giving:

# A tibble: 3 x 2
  Species      avg
  <fct>      <dbl>
1 setosa      5.01
2 versicolor  5.94
3 virginica   6.59

2) A second approach uses the .data pronoun:

colOfInterest <- "Sepal.Length"
iris %>% 
  group_by(Species) %>%
  summarize(avg = mean(.data[[colOfInterest]])) %>%
  ungroup()

Of course, this is straightforward in base R:

aggregate(list(avg = iris[[colOfInterest]]), iris["Species"], mean)
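
To reuse either approach, one option is to wrap it in a function that takes the column name as a string. The sketch below is only an illustration: the helper name mean_by_species and its arguments are hypothetical, and sym() is called through rlang explicitly. It also checks that the .data and !!sym() versions agree:

```r
library(dplyr)

# Hypothetical helper: `col` is a column name supplied as a string.
mean_by_species <- function(data, col) {
  data %>%
    group_by(Species) %>%
    summarise(avg = mean(.data[[col]]), .groups = "drop")
}

res_pronoun <- mean_by_species(iris, "Sepal.Length")

# The same summary via !!sym(), for comparison.
col <- "Sepal.Length"
res_sym <- iris %>%
  group_by(Species) %>%
  summarise(avg = mean(!!rlang::sym(col)), .groups = "drop")

all.equal(res_pronoun, res_sym)
```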

using variable column names in dplyr summarise

You can use base::get:

df %>% summarise(mean(get(x[1])) - mean(get(x[2])))

# # A tibble: 2 x 2
#        c `mean(a) - mean(b)`
#    <dbl>               <dbl>
# 1     1                  -1
# 2     2                  -1

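By default, get() looks the name up starting from the environment it is called from; inside summarise() that is the data mask, so the column is found.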

As the error message says, mean() expects a numeric or logical object, but as.name() returns a name (a symbol):

class(as.name("a")) # [1] "name"

You could also evaluate the name, which works as well:

df %>% summarise(mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2]))))
# # A tibble: 2 x 2
#       c `mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2])))`
#   <dbl>                                                   <dbl>
# 1     1                                                      -1
# 2     2                                                      -1
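
Neither df nor x is defined in the answer above. A minimal sketch, assuming a small made-up data frame grouped by c and a character vector x of column names, compares get() with the .data pronoun. Unlike get(), .data only looks at columns, so a misspelled name raises an error instead of silently picking up an object of the same name from the calling environment:

```r
library(dplyr)

# Hypothetical data, chosen only for illustration.
df <- tibble(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(1, 1, 2, 2)) %>%
  group_by(c)
x <- c("a", "b")

# Look the columns up by name with get() ...
via_get <- df %>%
  summarise(diff = mean(get(x[1])) - mean(get(x[2])))

# ... or with the .data pronoun, which is restricted to columns.
via_pronoun <- df %>%
  summarise(diff = mean(.data[[x[1]]]) - mean(.data[[x[2]]]))

all.equal(via_get, via_pronoun)
```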

Dplyr Summarise Groups as Column Names

There are many ways to go about this, but I would simplify it by first pivoting to a longer data frame and then grouping by var and Group. After that, you can pivot wider to get the final result. Note that I used summarize(across()), which replaces the superseded summarize_all(); with a single value column I could just as well have specified Mean = ... and Sum = ... manually.

set.seed(123)

test_df %>%
  pivot_longer(
    var1:var2,
    names_to = "var"
  ) %>%
  group_by(Group, var) %>%
  summarize(
    across(
      everything(),
      list(Mean = mean, Sum = sum),
      .names = "{.fn}"
    ),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = "Group",
    values_from = c(Mean, Sum),
    names_glue = "{Group}_{.value}"
  )
#> # A tibble: 2 × 7
#>   var   A_Mean B_Mean C_Mean A_Sum B_Sum C_Sum
#>   <chr>  <dbl>  <dbl>  <dbl> <int> <int> <int>
#> 1 var1       1    2.5    3.2     1    10    16
#> 2 var2       5    4.5    4.4     5    18    22
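
The manual Mean = ... and Sum = ... variant mentioned above could look roughly like this. The answer does not show test_df, so the data frame here is a made-up stand-in with the same column names, and the numbers are not the ones in the output above:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for test_df.
test_df <- tibble(
  Group = c("A", "A", "B", "B"),
  var1  = c(1L, 3L, 2L, 6L),
  var2  = c(4L, 6L, 5L, 7L)
)

result <- test_df %>%
  pivot_longer(var1:var2, names_to = "var") %>%   # values land in `value`
  group_by(Group, var) %>%
  summarise(Mean = mean(value), Sum = sum(value), .groups = "drop") %>%
  pivot_wider(
    names_from = Group,
    values_from = c(Mean, Sum),
    names_glue = "{Group}_{.value}"
  )

result
```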

Variables to summarise data in dplyr and R statistics: Refer to column names stored as strings with the `.data` pronoun:

Using the outline in Luis's comment, I translated it into my function:

summariseData <- function(df, column_to_summerise, target, kpi_target)
{
  column_to_summerise <- enquo(column_to_summerise)

  calc_df <- df %>% 
    group_by(Date_Received) %>% 
    dplyr::summarise(med = median(!!column_to_summerise, na.rm = TRUE),
                     per95 = quantile(!!column_to_summerise, probs = kpi_target, na.rm = TRUE),
                     In_Target = sum(!!column_to_summerise <= target, na.rm = TRUE),
                     Out_Target = sum(!!column_to_summerise > target, na.rm = TRUE),
                     Total_Data = n())
  return(calc_df)
}

The key point is to call enquo() on the argument before unquoting it with !!.

From https://dplyr.tidyverse.org/articles/programming.html:

By analogy to strings, we don’t want "", instead we want some function
that turns an argument into a string. That’s the job of enquo().
enquo() uses some dark magic to look at the argument, see what the
user typed, and return that value as a quosure.

In dplyr (and in tidyeval in general) you use !! to say that you want
to unquote an input so that it’s evaluated.
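
Since rlang 0.4.0, the embrace operator {{ }} combines the enquo() and !! steps into one. A minimal sketch, assuming a hypothetical helper median_by (not part of the function above) that receives its columns unquoted:

```r
library(dplyr)

# Hypothetical helper: `group` and `col` are passed as bare column names.
median_by <- function(df, group, col) {
  df %>%
    group_by({{ group }}) %>%
    summarise(med = median({{ col }}, na.rm = TRUE), .groups = "drop")
}

out <- median_by(iris, Species, Sepal.Length)
out
```

Writing {{ col }} here behaves like calling enquo(col) first and then !!col inside summarise().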

R: How to summarize and group by variables as column names

With dplyr 1.0.0 or later, we can use across():

library(dplyr)
df %>%
    group_by(across(all_of(colums_to_group))) %>% 
    summarise(across(all_of(columns_to_sum), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')
# A tibble: 2 x 3
#  A         B     C
#  <chr> <int> <int>
#1 X         6    21
#2 Y         9    19

In earlier versions, we could use group_by_at() along with summarise_at() (both now superseded by across()):

df %>%
    group_by_at(colums_to_group) %>%
    summarise_at(vars(columns_to_sum), sum, na.rm = TRUE)
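
The answer does not show df or the two character vectors. A self-contained sketch of the across() version follows. The data are made up, with an NA row added to show the effect of na.rm. The variable names are kept from the answer, and the lambda form is used because passing extra arguments through across() is deprecated as of dplyr 1.1.0:

```r
library(dplyr)

# Hypothetical data; the variable names follow the answer above.
df <- tibble(
  A = c("X", "X", "X", "Y", "Y"),
  B = c(1L, 5L, NA, 4L, 5L),
  C = c(10L, 11L, NA, 9L, 10L)
)
colums_to_group <- "A"
columns_to_sum  <- c("B", "C")

out <- df %>%
  group_by(across(all_of(colums_to_group))) %>%
  summarise(across(all_of(columns_to_sum), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

out
```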

R dplyr how to select variables by column number rather than column name with summarise

Making use of the .data pronoun from rlang, you could write a custom function that takes a data frame, the names of two variables, and some additional grouping variables, and computes your desired summary table like so:

library(dplyr)
library(Hmisc)

summary_table <- function(.data, x, y, ...) {
  .data %>%
    group_by(...) %>%                                                    # group by the supplied variables
    summarise(n = n(),                                                       # number of records                  
              WtMn = wtd.mean(.data[[x]], .data[[y]]),                    # weighted mean
              WtSd = sqrt(wtd.var(.data[[x]], .data[[y]])),               # weighted SD
              WtCV = WtMn/WtSd,                                              # weighted CV
              Minm = min(.data[[x]]),                                      # minimum
              Wp05 = wtd.quantile(.data[[x]], .data[[y]] , 0.05),         # p05
              Wp50 = wtd.quantile(.data[[x]], .data[[y]] , 0.50),         # p50
              Wp95 = wtd.quantile(.data[[x]], .data[[y]] , 0.95),         # p95 
              Wp975 = wtd.quantile(.data[[x]], .data[[y]] , 0.975),       # p975
              Wp99 = wtd.quantile(.data[[x]], .data[[y]] , 0.99),         # p99
              Maxm = max(.data[[x]])                                       # maximum
    )  
}

summary_table(iris, "Sepal.Length", "Petal.Width", Species)
#> # A tibble: 3 x 12
#>   Species        n  WtMn  WtSd  WtCV  Minm  Wp05  Wp50  Wp95 Wp975  Wp99  Maxm
#>   <fct>      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa        50  5.05 0.356  14.2   4.3  4.61  5.06  5.62  5.70  5.72   5.8
#> 2 versicolor    50  5.98 0.508  11.8   4.9  5.13  6     6.80  6.97  7      7  
#> 3 virginica     50  6.61 0.626  10.6   4.9  5.8   6.5   7.7   7.7   7.9    7.9

summary_table(iris, "Sepal.Width", "Petal.Width", Species)
#> # A tibble: 3 x 12
#>   Species        n  WtMn  WtSd  WtCV  Minm  Wp05  Wp50  Wp95 Wp975  Wp99  Maxm
#>   <fct>      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa        50  3.47 0.399  8.69   2.3  3.06  3.46  4.27  4.4    4.4   4.4
#> 2 versicolor    50  2.80 0.310  9.04   2    2.3   2.86  3.20  3.37   3.4   3.4
#> 3 virginica     50  3.00 0.320  9.38   2.2  2.5   3     3.6   3.8    3.8   3.8
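
The question asks about column positions rather than names. One possible bridge, which is an assumption and not part of the answer above, is to translate the position into a name with names() and then use .data as before. A minimal sketch without the Hmisc dependency:

```r
library(dplyr)

# Column position to summarise (1 is Sepal.Length in iris).
col_idx  <- 1
col_name <- names(iris)[col_idx]

by_pos <- iris %>%
  group_by(Species) %>%
  summarise(avg = mean(.data[[col_name]]), .groups = "drop")

by_pos
```

The same translation could be applied when calling summary_table(), for example summary_table(iris, names(iris)[1], names(iris)[4], Species).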

Dynamic variables names in dplyr function across multiple columns

We could use .names in across() to rename the output columns:

mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{group_cols}})) %>%
     summarise(across({{ summary_cols }},
         ~ mean(., na.rm = TRUE), .names = "mean_{.col}"), .groups = "drop")
}

Testing:

mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
# A tibble: 8 × 4
    cyl  gear mean_mpg mean_wt
  <dbl> <dbl>    <dbl>   <dbl>
1     4     3     21.5    2.46
2     4     4     26.9    2.38
3     4     5     28.2    1.83
4     6     3     19.8    3.34
5     6     4     19.8    3.09
6     6     5     19.7    2.77
7     8     3     15.0    4.10
8     8     5     15.4    3.37

NOTE: In the tidyverse, := is mainly used when assigning a single column.
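
For comparison, a single-column version with := might look like the sketch below. The helper name mean_fun_single is hypothetical, and the glue-style name on the left-hand side assumes a reasonably recent rlang and dplyr (1.0 or later):

```r
library(dplyr)

# Hypothetical single-column counterpart of mean_fun_multicols().
mean_fun_single <- function(data, group_col, summary_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(
      "mean_{{ summary_col }}" := mean({{ summary_col }}, na.rm = TRUE),
      .groups = "drop"
    )
}

out <- mean_fun_single(mtcars, cyl, mpg)
names(out)
```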

If we use the OP's function, we are assigning multiple columns to a single name, so the result is a tibble column (a packed data frame) rather than a regular column. We may need to unpack it:

library(tidyr)
> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% str
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
grouped_df [8 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ cyl            : num [1:8] 4 4 4 6 6 6 8 8
 $ gear           : num [1:8] 3 4 5 3 4 5 3 5
 $ mean_c(mpg, wt): tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ mpg: num [1:8] 21.5 26.9 28.2 19.8 19.8 ...
  ..$ wt : num [1:8] 2.46 2.38 1.83 3.34 3.09 ...
 - attr(*, "groups")= tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ cyl  : num [1:3] 4 6 8
  ..$ .rows: list<int> [1:3] 
  .. ..$ : int [1:3] 1 2 3
  .. ..$ : int [1:3] 4 5 6
  .. ..$ : int [1:2] 7 8
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE

> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% 
        unpack(where(is_tibble))
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 8 × 4
# Groups:   cyl [3]
    cyl  gear   mpg    wt
  <dbl> <dbl> <dbl> <dbl>
1     4     3  21.5  2.46
2     4     4  26.9  2.38
3     4     5  28.2  1.83
4     6     3  19.8  3.34
5     6     4  19.8  3.09
6     6     5  19.7  2.77
7     8     3  15.0  4.10
8     8     5  15.4  3.37

standard evaluation in dplyr: summarise a variable given as a character string

dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:

https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.

library(dplyr)

key <- "v3"
val <- "v2"
drp <- "v1"

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

df %>% 
    select(-matches(drp)) %>% 
    group_by(.data[[key]]) %>% 
    summarise(total = sum(.data[[val]], na.rm = TRUE))

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#>   v3    total
#>   <chr> <int>
#> 1 A        21
#> 2 B        19

If your code is in a package function, you can add @importFrom rlang .data to its roxygen comments to avoid R CMD check notes about undefined global variables.
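
As a rough sketch of how that could look, the function below wraps the pipeline above with both column names passed as strings. The name sum_by_key is hypothetical, and in a real package the roxygen tag would sit in the package source rather than in a script:

```r
library(dplyr)

#' Sum one column by another, both given as strings (hypothetical helper).
#'
#' @importFrom rlang .data
sum_by_key <- function(df, key, val) {
  df %>%
    group_by(across(all_of(key))) %>%
    summarise(total = sum(.data[[val]], na.rm = TRUE), .groups = "drop")
}

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
out <- sum_by_key(df, key = "v3", val = "v2")
out
```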