dplyr: How to apply do() on result of group_by?
Let us define eaten like this:
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)
1) Then try this:
eaten %.% group_by(person) %.% do(function(x) combn(x$foods, m = 2))
giving:
[[1]]
[,1] [,2] [,3]
[1,] "apple" "apple" "banana"
[2,] "banana" "cucumber" "cucumber"
[[2]]
[,1] [,2] [,3]
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber" "banana" "banana"
2) To do something close to what @Hadley describes in the comments without waiting for a future version of dplyr, try this, where do2 is a helper function linked from the original answer:
library(gsubfn)
eaten %.% group_by(person) %.% fn$do2(~ combn(.$foods, m = 2))
giving:
$Grace
[,1] [,2] [,3]
[1,] "apple" "apple" "banana"
[2,] "banana" "cucumber" "cucumber"
$Rob
[,1] [,2] [,3]
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber" "banana" "banana"
Note: The last line of the question, which gives the code from the help file, also fails for me. This variation of it works for me: do(jan, lm, formula = ArrDelay ~ date).
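The %.% operator and the function-style do() call above come from early dplyr releases. As a rough sketch of the same per-person computation with tools that are available today, base R's split() and lapply() are enough. The contents of eaten below are an assumption reconstructed from the printed output, since the original definitions of person and foods are not shown.

```r
# Assumed data: reconstructed from the output above, not taken from the question
person <- c("Grace", "Grace", "Grace", "Rob", "Rob", "Rob")
foods  <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten  <- data.frame(person, foods, stringsAsFactors = FALSE)

# Split the foods by person, then build all 2-element combinations per person
pairs_by_person <- lapply(split(eaten$foods, eaten$person), combn, m = 2)
pairs_by_person
```

The result is a list named by person, with one 2-row matrix per group. With dplyr 0.8 or later, eaten %>% group_by(person) %>% group_map(~ combn(.x$foods, m = 2)) follows the same pattern but returns an unnamed list.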
R - use group_by() and mutate() in dplyr to apply function that returns a vector the length of groups
How about making use of nest instead:
foo %>%
    group_by(fac) %>%
    nest() %>%
    mutate(mahal = map(data, ~mahalanobis(
        .x,
        center = colMeans(.x, na.rm = TRUE),
        cov = cov(.x, use = "pairwise.complete.obs")))) %>%
    unnest()
## A tibble: 10 x 4
# fac mahal x y
# <fct> <dbl> <dbl> <dbl>
# 1 A 1.02 -6.26 15.1
# 2 A 0.120 1.84 3.90
# 3 A 2.81 -8.36 -6.21
# 4 A 2.84 16.0 -22.1
# 5 A 1.21 3.30 11.2
# 6 B 2.15 -8.20 -0.449
# 7 B 2.86 4.87 -0.162
# 8 B 1.23 7.38 9.44
# 9 B 0.675 5.76 8.21
#10 B 1.08 -3.05 5.94
Here you avoid an explicit "x", "y" filter of the form temp <- x[, c("x", "y")], as you nest the relevant columns after grouping by fac. Applying mahalanobis is then straightforward.
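If nesting feels heavy for a single pair of columns, a plain grouped mutate() can also return one distance per row, because inside mutate() the column names refer to the current group's values. The following is only a sketch under that assumption; mahal_xy is a hypothetical helper name, and the sample data mirrors the set.seed(1) data used in this answer.

```r
library(dplyr)

set.seed(1)
foo <- data.frame(
    x = rnorm(10, 0, 10),
    y = rnorm(10, 0, 10),
    fac = c(rep("A", 5), rep("B", 5)))

# Hypothetical helper: distance of each row from the centre of its own group
mahal_xy <- function(x, y) {
    m <- cbind(x, y)
    mahalanobis(m,
                center = colMeans(m, na.rm = TRUE),
                cov = cov(m, use = "pairwise.complete.obs"))
}

res <- foo %>%
    group_by(fac) %>%
    mutate(mahal = mahal_xy(x, y)) %>%   # returns a vector as long as the group
    ungroup()
res
```

The helper returns a vector with one element per row of the group, which is exactly what mutate() expects.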
Update
To respond to your comment, here is a purrr option. Since it's easy to lose track of what's going on, let's go step-by-step:
Generate sample data with one additional column.
set.seed(1)
foo <- data.frame(
    x = rnorm(10, 0, 10),
    y = rnorm(10, 0, 10),
    z = rnorm(10, 0, 10),
    fac = c(rep("A", 5), rep("B", 5)))
We now store the columns which define the subset of the data to be used for the calculation of the Mahalanobis distance in a list:
cols <- list(cols1 = c("x", "y"), cols2 = c("y", "z"))
So we will calculate the Mahalanobis distance (per fac) for the subset of data in columns x + y and then separately for y + z. The names of cols will be used as the column names of the two distance vectors.
Now for the actual purrr command:
imap_dfc(cols, ~nest(foo %>% group_by(fac), .x, .key = !!.y) %>% select(!!.y)) %>%
    mutate_all(function(lst) map(lst, ~mahalanobis(
        .x,
        center = colMeans(.x, na.rm = TRUE),
        cov = cov(.x, use = "pairwise.complete.obs")))) %>%
    unnest() %>%
    bind_cols(foo, .)
# x y z fac cols1 cols2
#1 -6.264538 15.1178117 9.1897737 A 1.0197542 1.3608052
#2 1.836433 3.8984324 7.8213630 A 0.1199607 1.1141352
#3 -8.356286 -6.2124058 0.7456498 A 2.8059562 1.5099574
#4 15.952808 -22.1469989 -19.8935170 A 2.8401953 3.0675228
#5 3.295078 11.2493092 6.1982575 A 1.2141337 0.9475794
#6 -8.204684 -0.4493361 -0.5612874 B 2.1517055 1.2284793
#7 4.874291 -0.1619026 -1.5579551 B 2.8626501 1.1724828
#8 7.383247 9.4383621 -14.7075238 B 1.2271316 2.5723023
#9 5.757814 8.2122120 -4.7815006 B 0.6746788 0.6939081
#10 -3.053884 5.9390132 4.1794156 B 1.0838341 2.3328276
In short, we
- loop over entries in cols, nest data in foo per fac based on columns defined in cols,
- apply mahalanobis on the nested and grouped data, generating as many distance columns with nested data as we have entries in cols (i.e. subsets), and
- finally unnest the distance data and column-bind it to the original foo data.
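The same three steps can be sketched without nesting, which may be easier to follow and does not depend on the nest() and unnest() signatures of a particular tidyr version. This is an illustrative base R variant, not the code from the answer; group_mahal is a hypothetical helper name.

```r
set.seed(1)
foo <- data.frame(
    x = rnorm(10, 0, 10),
    y = rnorm(10, 0, 10),
    z = rnorm(10, 0, 10),
    fac = c(rep("A", 5), rep("B", 5)))
cols <- list(cols1 = c("x", "y"), cols2 = c("y", "z"))

# Hypothetical helper: Mahalanobis distance per group for one set of columns
group_mahal <- function(data, col_names, group) {
    per_group <- lapply(split(data[col_names], group), function(d) {
        mahalanobis(d,
                    center = colMeans(d, na.rm = TRUE),
                    cov = cov(d, use = "pairwise.complete.obs"))
    })
    # Put the per-group vectors back into the original row order
    unsplit(per_group, group)
}

# One distance column per entry in cols, bound to the original data
dist_cols <- lapply(cols, function(cn) group_mahal(foo, cn, foo$fac))
result <- cbind(foo, as.data.frame(dist_cols))
result
```

The names of cols become the names of the new columns, as in the purrr version.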
using dplyr::group_by in a function within apply
You should apply using the colnames(dat) to get the correct groupings:
library(dplyr)
dat <- mtcars[c(2:4,11)]
# Count rows per value of column x and report each count's share of the total
grp <- function(x) {
    group_by(dat, !!as.name(x)) %>%
        summarise(n = n()) %>%
        mutate(pc = scales::percent(n / sum(n))) %>%
        arrange(desc(n)) %>% head()
}
lapply(colnames(dat), grp)
How to use group_by with mean and sum in dplyr?
If I understood correctly, this might help you
#Libraries
library(tidyverse)
library(lubridate)
#Data
df <-
    tibble::tribble(
        ~Year, ~School.Name, ~Student.Score1, ~Student.Score2,
        2019L, "ISD 1", 1L, NA,
        2020L, "ISD 4", 4L, 2L,
        2020L, "ISD 3", NA, 3L,
        2018L, "ISD 1", 4L, NA,
        2019L, "ISD 4", 2L, 5L,
        2020L, "ISD 4", 3L, 2L,
        2019L, "ISD 3", NA, 1L,
        2018L, "ISD 1", 2L, 4L
    )
#How to
df %>%
    group_by(Year, School.Name) %>%
    summarise(
        n = n(),
        across(.cols = contains(".Score"), .fns = function(x) mean(x, na.rm = TRUE))
    )
# A tibble: 6 x 5
# Groups: Year [3]
Year School.Name n Student.Score1 Student.Score2
<int> <chr> <int> <dbl> <dbl>
1 2018 ISD 1 2 3 4
2 2019 ISD 1 1 1 NaN
3 2019 ISD 3 1 NaN 1
4 2019 ISD 4 1 2 5
5 2020 ISD 3 1 NaN 3
6 2020 ISD 4 2 3.5 2
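The code above only returns means. If sums are wanted as well, one possible extension is to pass a named list of functions to across(), which produces one column per score and statistic. This is a sketch that assumes dplyr 1.0 or later; the suffixed column names (for example Student.Score1_sum) come from across()'s default naming.

```r
library(dplyr)

df <- tibble::tribble(
    ~Year, ~School.Name, ~Student.Score1, ~Student.Score2,
    2019L, "ISD 1", 1L, NA,
    2020L, "ISD 4", 4L, 2L,
    2020L, "ISD 3", NA, 3L,
    2018L, "ISD 1", 4L, NA,
    2019L, "ISD 4", 2L, 5L,
    2020L, "ISD 4", 3L, 2L,
    2019L, "ISD 3", NA, 1L,
    2018L, "ISD 1", 2L, 4L
)

res <- df %>%
    group_by(Year, School.Name) %>%
    summarise(
        n = n(),
        # One mean and one sum column for every column containing ".Score"
        across(contains(".Score"),
               list(mean = ~ mean(.x, na.rm = TRUE),
                    sum  = ~ sum(.x, na.rm = TRUE))),
        .groups = "drop")
res
```

Note that sum(..., na.rm = TRUE) returns 0 for a group whose scores are all missing, whereas mean() returns NaN there.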
How to use dplyr::group_by in a function
You can use group_by_at and column index such as:
countString <- function(things) {
    index <- which(colnames(theTibble) %in% things)
    theTibble %>%
        group_by_at(index) %>%
        count()
}
countString(c("animal", "sex"))
## A tibble: 4 x 3
## Groups: animal, sex [4]
# animal sex nn
# <chr> <chr> <int>
#1 cat f 2
#2 dog f 1
#3 dog m 2
#4 fish unknown 1
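In dplyr 1.0 and later, group_by_at() is superseded, and the usual way to group by a character vector of column names is across(all_of(...)). The sketch below also passes the data as an argument instead of relying on a global. The contents of theTibble are an assumption chosen to reproduce the counts printed above, and count_by is a hypothetical name.

```r
library(dplyr)

# Assumed data: chosen to match the printed counts, not taken from the question
theTibble <- tibble(
    animal = c("cat", "cat", "dog", "dog", "dog", "fish"),
    sex    = c("f", "f", "f", "m", "m", "unknown"))

# Hypothetical variant of countString(): group by the named columns and count rows
count_by <- function(data, things) {
    data %>%
        group_by(across(all_of(things))) %>%
        summarise(n = n(), .groups = "drop")
}

counts <- count_by(theTibble, c("animal", "sex"))
counts
```

all_of() raises an error if a requested column does not exist, which tends to surface typos early.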
applying a function to the output of dplyr's group_by
Well, you have a parenthesis problem and a file-naming problem, so maybe it's one of those that you are referring to. I'm assuming
iris %>%
    group_by(Species) %>%
    do({
        p <- ggplot(., aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
        ggsave(p, filename=paste0(unique(.$Species),".pdf"))
    })
would fix your problem.
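Since do() is superseded in dplyr 1.0 and later, a side-effect-only task like saving one plot per group can also be written with group_walk(). This is a sketch rather than the answer's method; the output directory, plot size, and file naming are assumptions made for the example.

```r
library(dplyr)
library(ggplot2)

out_dir <- tempdir()  # assumption: write the files to a temporary directory

iris %>%
    group_by(Species) %>%
    group_walk(~ {
        # .x holds the rows of the current group, .y is a one-row tibble with its key
        p <- ggplot(.x, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
        ggsave(file.path(out_dir, paste0(.y$Species, ".pdf")),
               plot = p, width = 5, height = 4)
    })
```

group_walk() calls the function once per group for its side effects and returns the grouped data invisibly, so the block does not need to return a data frame.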
R, dplyr - combination of group_by() and arrange() does not produce expected result?
I think you want
ToothGrowth %>%
    arrange(supp, len)
The chaining system just replaces nested commands, so first you are grouping, then ordering that grouped result, which breaks the original ordering.
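In current dplyr, arrange() ignores grouping unless .by_group = TRUE is set, so there are two ways to get rows ordered by len within each supp. The comparison below is a small sketch using the built-in ToothGrowth data; the variable names are only illustrative.

```r
library(dplyr)

# Option 1: no grouping, sort by both columns
sorted_plain <- ToothGrowth %>%
    arrange(supp, len)

# Option 2: keep the grouping and ask arrange() to sort by group first
sorted_grouped <- ToothGrowth %>%
    group_by(supp) %>%
    arrange(len, .by_group = TRUE) %>%
    ungroup()

# A grouped arrange() without .by_group sorts by len across the whole table
sorted_ignoring_groups <- ToothGrowth %>%
    group_by(supp) %>%
    arrange(len) %>%
    ungroup()
```

The first two results have the same row order, while the third interleaves the two supp levels.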
How to apply a function per group in dplyr without having to define a function?
You can define the correct order, use match to get the position of each value in v2, and diff to calculate the difference between their positions within each v1. Set res to TRUE if the order matches.
library(dplyr)
correct_order = c('X', 'Y')
d %>%
    group_by(v1) %>%
    summarise(res = all(diff(match(correct_order, v2)) > 0))
# v1 res
# <chr> <lgl>
#1 a TRUE
#2 b FALSE
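To see why this works, it helps to trace match() and diff() on a small input. The data frame d below is hypothetical, made up so that group a lists X before Y and group b lists them the other way round; it is not the data from the question.

```r
library(dplyr)

# Hypothetical data: X precedes Y in group a, Y precedes X in group b
d <- data.frame(
    v1 = c("a", "a", "b", "b"),
    v2 = c("X", "Y", "Y", "X"),
    stringsAsFactors = FALSE)

correct_order <- c("X", "Y")

# For group b: match() gives positions 2 and 1, so diff() is negative
pos_b <- match(correct_order, d$v2[d$v1 == "b"])
diff(pos_b)

res <- d %>%
    group_by(v1) %>%
    summarise(res = all(diff(match(correct_order, v2)) > 0))
res
```

match() returns the first position of each expected value, so a positive difference means the values appear in the expected order within the group.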
dplyr summarise : Group by multiple variables in a loop and add results in the same dataframe
library(questionr)
library(tidyverse)
data(hdv2003)
list("trav.satisf", "cuisine", "sexe") %>%
    map(~ {
        hdv2003 %>%
            group_by_at(.x) %>%
            summarise(
                n = n(),
                percent = round((n() / nrow(hdv2003)) * 100, digits = 1),
                femmes = round((sum(sexe == "Femme", na.rm = TRUE) / sum(!is.na(sexe))) * 100, digits = 1),
                age = round(mean(age, na.rm = TRUE), digits = 1)
            ) %>%
            rename_at(1, ~"group") %>%
            mutate(grouping = .x)
    }) %>%
    bind_rows() %>%
    select(grouping, group, everything())
#> # A tibble: 8 x 6
#> grouping group n percent femmes age
#> <chr> <fct> <int> <dbl> <dbl> <dbl>
#> 1 trav.satisf Satisfaction 480 24 51.5 41.4
#> 2 trav.satisf Insatisfaction 117 5.9 47.9 40.3
#> 3 trav.satisf Equilibre 451 22.6 49.9 40.9
#> 4 trav.satisf <NA> 952 47.6 60.2 56
#> 5 cuisine Non 1119 56 43.8 50.1
#> 6 cuisine Oui 881 44 69.4 45.6
#> 7 sexe Homme 899 45 0 48.2
#> 8 sexe Femme 1101 55 100 48.2
Created on 2021-11-12 by the reprex package (v2.0.1)