dplyr group by colnames described as vector of strings
You can use group_by_at, where you can pass a character vector of column names as the grouping variables:
mtcars %>%
filter(disp < 160) %>%
group_by_at(cols) %>%
summarise(n = n())
# A tibble: 12 x 8
# Groups: mpg, cyl, disp, drat, qsec, gear [?]
# mpg cyl disp drat qsec gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 19.7 6 145.0 3.62 15.50 5 6 1
# 2 21.4 4 121.0 4.11 18.60 4 2 1
# 3 21.5 4 120.1 3.70 20.01 3 1 1
# 4 22.8 4 108.0 3.85 18.61 4 1 1
# ...
Or you can move the column selection inside group_by_at, using vars and the column selection helper functions:
mtcars %>%
filter(disp < 160) %>%
group_by_at(vars(matches('[a-z]{3,}$'))) %>%
summarise(n = n())
# A tibble: 12 x 8
# Groups: mpg, cyl, disp, drat, qsec, gear [?]
# mpg cyl disp drat qsec gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 19.7 6 145.0 3.62 15.50 5 6 1
# 2 21.4 4 121.0 4.11 18.60 4 2 1
# 3 21.5 4 120.1 3.70 20.01 3 1 1
# 4 22.8 4 108.0 3.85 18.61 4 1 1
# ...
dplyr group_by vector of column names?
We can use group_by with across from dplyr version >= 1.0.0:
library(dplyr)
mtcars %>%
group_by(across(all_of(c('mpg', 'cyl')))) %>%
tally() %>%
head(2)
# A tibble: 2 x 3
# Groups: mpg [2]
# mpg cyl n
# <dbl> <dbl> <int>
#1 10.4 8 2
#2 13.3 8 1
With older versions, use group_by_at:
mtcars %>%
group_by_at(c('mpg', 'cyl')) %>%
tally() %>%
head(2)
# A tibble: 2 x 3
# Groups: mpg [2]
# mpg cyl n
# <dbl> <dbl> <int>
#1 10.4 8 2
#2 13.3 8 1
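One practical difference worth knowing when the column names come from a character vector: all_of is strict and errors on names that are not in the data, while any_of silently skips them. A minimal sketch, assuming dplyr >= 1.0.0; the name "not_a_column" is deliberately made up for illustration:

```r
library(dplyr)

# "not_a_column" is a hypothetical name that does not exist in mtcars
cols <- c("mpg", "cyl", "not_a_column")

# any_of() keeps only the names that are actually present
grouped <- mtcars %>%
  group_by(across(any_of(cols)))
group_vars(grouped)

# all_of() refuses to continue when a name is missing
strict_failed <- tryCatch(
  {
    mtcars %>% group_by(across(all_of(cols)))
    FALSE
  },
  error = function(e) TRUE
)
strict_failed
```

Which one to use depends on whether a missing column should be treated as a bug (all_of) or as something to tolerate (any_of).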
Group by multiple columns in dplyr, using string vector input
Since this question was posted, dplyr added scoped versions of group_by (see the dplyr documentation). This lets you use the same selection helpers you would use with select, like so:
data = data.frame(
asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# get the columns we want to average within
columns = names(data)[-3]
library(dplyr)
df1 <- data %>%
group_by_at(vars(one_of(columns))) %>%
summarize(Value = mean(value))
#compare plyr for reference
df2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(df1 == df2, useNA = 'ifany')
## TRUE
## 27
The output from your example question is as expected (see comparison to plyr above and output below):
# A tibble: 9 x 3
# Groups: asihckhdoydkhxiydfgfTgdsx [?]
asihckhdoydkhxiydfgfTgdsx a30mvxigxkghc5cdsvxvyv0ja Value
<fctr> <fctr> <dbl>
1 A A 0.04095002
2 A B 0.24943935
3 A C -0.25783892
4 B A 0.15161805
5 B B 0.27189974
6 B C 0.20858897
7 C A 0.19502221
8 C B 0.56837548
9 C C -0.22682998
Note that since dplyr::summarize only strips off one layer of grouping at a time, you've still got some grouping going on in the resultant tibble (which can sometimes catch people by surprise later down the line). If you want to be absolutely safe from unexpected grouping behavior, you can always add %>% ungroup() to your pipeline after you summarize.
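To see the leftover grouping for yourself, here is a small sketch on mtcars, assuming dplyr >= 1.0.0 (needed for across and the .groups argument). The grouping columns are picked only for illustration:

```r
library(dplyr)

# Hypothetical grouping columns, chosen only for illustration
group_cols <- c("cyl", "gear")

# summarise() drops only the last grouping variable, so "cyl" remains
partly_grouped <- mtcars %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(mean_mpg = mean(mpg))
group_vars(partly_grouped)

# Asking summarise() to drop every group has the same effect as ungroup()
fully_ungrouped <- mtcars %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")
group_vars(fully_ungrouped)
```

The first call also prints a message about the remaining grouping, which is dplyr's own hint that the result is still grouped.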
Pass column names as strings to group_by and summarize
For this you can now use the _at versions of the verbs:
df %>%
group_by_at(cols2group) %>%
summarize_at(.vars = col2summarize, .funs = min)
Edit (2021-06-09):
Please see Ronak Shah's answer, which uses
mutate(across(all_of(cols2summarize), min))
This is now the preferred option.
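For completeness, a sketch of how the across style might look for the group-then-summarise case, assuming dplyr >= 1.0.0. It uses summarise rather than mutate because the goal here is one row per group, and the column choices on mtcars are only placeholders:

```r
library(dplyr)

cols2group     <- c("cyl", "gear")  # hypothetical grouping columns
cols2summarize <- c("mpg", "hp")    # hypothetical columns to summarise

result <- mtcars %>%
  group_by(across(all_of(cols2group))) %>%
  # apply min() to every column named in cols2summarize
  summarise(across(all_of(cols2summarize), min), .groups = "drop")

result
```

Swapping summarise for mutate keeps all original rows and overwrites the selected columns with the per-group minimum instead.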
How to use vector of column names as input into dplyr::group_by()?
You need to use the unquote-splice operator !!!:
aggregate <- function(df, by) {
df %>% group_by(!!!syms(by)) %>% summarize(a = mean(a))
}
group_key <- c("g1", "g2")
aggregate(df, by = group_key)
## A tibble: 4 x 3
## Groups: g1 [2]
# g1 g2 a
# <dbl> <dbl> <dbl>
#1 1 1 1
#2 1 2 4
#3 2 1 2.5
#4 2 2 5
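If you want to run this end to end, here is a self-contained sketch. The data frame is made up (the question's df is not shown here), the function is named mean_by to avoid masking stats::aggregate, and syms is called through rlang explicitly:

```r
library(dplyr)

# Hypothetical data, invented for this example
df <- data.frame(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a  = c(1, 4, 2, 5, 3)
)

mean_by <- function(df, by) {
  df %>%
    # syms() turns the strings into symbols; !!! splices them in as
    # separate arguments, as if you had typed group_by(g1, g2)
    group_by(!!!rlang::syms(by)) %>%
    summarise(a = mean(a), .groups = "drop")
}

res <- mean_by(df, by = c("g1", "g2"))

# The across()/all_of() form should give the same grouping
res_across <- df %>%
  group_by(across(all_of(c("g1", "g2")))) %>%
  summarise(a = mean(a), .groups = "drop")
```

Both variants accept a plain character vector, so which one you pick is mostly a matter of taste and dplyr version.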
Use vector of columns in custom dplyr function
You don't necessarily need the function, as you can just mutate across the columns and get sums for each category.
library(tidyverse)
dat %>%
group_by(category) %>%
mutate(across(ends_with("take"), .fns = list(count = ~sum(. == "yes"))))
Or if you have a long list, you can store the column names in a character vector (here called vars) and pass it to across through all_of:
vars <- c("intake", "outtake", "pretake")
dat %>%
group_by(category) %>%
mutate(across(all_of(vars), .fns = list(count = ~sum(. == "yes"))))
Output
category intake outtake pretake intake_count outtake_count pretake_count
<chr> <fct> <fct> <fct> <int> <int> <int>
1 a no yes no 0 2 0
2 b no yes yes 0 1 2
3 c no yes no 1 1 0
4 d no yes yes 1 1 2
5 e no yes no 1 1 0
6 f no yes yes 1 1 2
7 g no yes no 1 1 0
8 h no yes yes 1 1 2
9 i no yes no 1 1 0
10 j no yes yes 1 1 2
11 a no yes no 0 2 0
12 b no no yes 0 1 2
13 c yes no no 1 1 0
14 d yes no yes 1 1 2
15 e yes no no 1 1 0
16 f yes no yes 1 1 2
17 g yes no no 1 1 0
18 h yes no yes 1 1 2
19 i yes no no 1 1 0
20 j yes no yes 1 1 2
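If you do still want a reusable function, one possible shape is below. This is only a sketch: the function name count_yes, its arguments, and the tiny data frame are all invented for illustration, and it assumes dplyr >= 1.0.0.

```r
library(dplyr)

# Hypothetical data, much smaller than the question's
dat <- data.frame(
  category = c("a", "a", "b", "b"),
  intake   = c("yes", "no", "no", "no"),
  outtake  = c("yes", "yes", "no", "yes")
)

# by and cols are character vectors of column names
count_yes <- function(data, by, cols) {
  data %>%
    group_by(across(all_of(by))) %>%
    # adds <column>_count with the number of "yes" values per group
    mutate(across(all_of(cols), list(count = ~ sum(.x == "yes")))) %>%
    ungroup()
}

res <- count_yes(dat, by = "category", cols = c("intake", "outtake"))
res
```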
group_by by a vector of characters using tidy evaluation semantics
There is a group_by_at variant of group_by:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_at(group_by) %>% summarise(gear = mean(gear))
The above is a simplified version of the more general form:
mtcars %>% group_by_at(vars(one_of(group_by))) %>% summarise(gear = mean(gear))
Inside vars you can use any of dplyr's ways of selecting variables:
mtcars %>%
group_by_at(vars(
one_of(group_by) # columns from the predefined set
,starts_with("a") # add the ones starting with "a"
,-hp # but omit this one
,vs # always include this one
,contains("_gr_") # and the ones containing the string "_gr_"
)) %>%
summarise(gear = mean(gear))
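With dplyr >= 1.0.0 the same kind of mixed selection can probably be written with across instead of group_by_at and vars. A minimal sketch, where group_cols is a hypothetical name for the predefined set (renamed here so it does not shadow the group_by function):

```r
library(dplyr)

group_cols <- c("cyl", "vs")  # hypothetical predefined set

res <- mtcars %>%
  group_by(across(c(
    all_of(group_cols),  # columns from the predefined set
    starts_with("a")     # plus any column starting with "a" (here: am)
  ))) %>%
  summarise(gear = mean(gear))

# summarise() peels off the last grouping variable (am)
group_vars(res)
```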
How to group according to position in a vector using dplyr
Here is an idea,
library(dplyr)
mywords %>%
group_by(grp = rep(seq(n()/10), each = 10)) %>%
count(TheTerms)
which gives,
# A tibble: 4,500 x 3
# Groups: grp [1,000]
grp TheTerms n
<int> <fctr> <int>
1 1 DD 3
2 1 HG 4
3 1 POS 3
4 2 DD 1
5 2 HG 1
6 2 KKL 3
7 2 NNTD 4
8 2 POS 1
9 3 HG 1
10 3 KKL 3
# ... with 4,490 more rows
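The rep(seq(n()/10), each = 10) trick relies on the number of rows being an exact multiple of 10. If that is not guaranteed, integer division on the row index is one possible alternative. A sketch on made-up data (the terms and the 25-row length are invented for illustration):

```r
library(dplyr)

set.seed(1)
# Hypothetical stand-in for mywords, with a length that is NOT a multiple of 10
mywords <- data.frame(
  TheTerms = sample(c("DD", "HG", "POS"), 25, replace = TRUE)
)

counts <- mywords %>%
  # rows 1-10 get grp 1, rows 11-20 get grp 2, the remaining rows get grp 3
  group_by(grp = (seq_len(n()) - 1) %/% 10 + 1) %>%
  count(TheTerms)

counts
```

The last block simply ends up shorter than the others instead of causing a length mismatch.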
How to check if a vector contained in list column of a data frame with dplyr
Do you want to check for any value in wanted_status or all of them? The expected output suggests all.
library(dplyr)
wanted_status <- c("x+", "y-")
dat %>%
group_by(cell) %>%
summarise(contained = if(all(wanted_status %in% status)) 'in' else 'out')
# cell contained
# <chr> <chr>
#1 A in
#2 B out
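To make the any-versus-all distinction concrete, here is a small sketch with invented data. It assumes status is a plain character column with one row per status value, which may differ from the question's actual layout:

```r
library(dplyr)

# Hypothetical data: one row per (cell, status) pair
dat <- data.frame(
  cell   = c("A", "A", "A", "B", "B"),
  status = c("x+", "y-", "z+", "x+", "z-")
)
wanted_status <- c("x+", "y-")

res <- dat %>%
  group_by(cell) %>%
  summarise(
    all_present = all(wanted_status %in% status),  # every wanted value found
    any_present = any(wanted_status %in% status)   # at least one found
  )

res
```

Cell B contains "x+" but not "y-", so it passes the any check and fails the all check.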