Using dplyr to group_by and conditionally mutate only with if (without else) statement
I think this is what you are after, using Johnny's method. You shouldn't hit an error if case_when returns the original value whenever the group sum is not greater than the cutoff:
library(dplyr)

iris_mutated <- iris %>%
  group_by(Species) %>%
  # Replace a column with its group sum only when that sum exceeds the cutoff;
  # otherwise keep the original values.
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                  TRUE ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                 TRUE ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                  TRUE ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                 TRUE ~ Petal.Width))
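To see the fallback branch in action, here is a minimal sketch on a made-up data frame (the names toy, g and x and the cutoff 5 are assumptions for illustration, not from the question). Group "a" stays as it was, while group "b" is replaced by its sum.

```r
library(dplyr)

# Hypothetical toy data: group "a" sums to 3, group "b" sums to 30
toy <- tibble(g = c("a", "a", "b", "b"),
              x = c(1, 2, 10, 20))

toy_mutated <- toy %>%
  group_by(g) %>%
  # Only groups whose sum exceeds the (assumed) cutoff of 5 are overwritten
  mutate(x = case_when(sum(x) > 5 ~ sum(x),
                       TRUE ~ x)) %>%
  ungroup()

toy_mutated$x
```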
R dplyr conditional mutate with group_by
Instead of ifelse, use if/else: all() returns a single value, and ifelse returns output of the same length as its test, so it would return only the first element and recycle it throughout the group.
library(dplyr)

example %>%
  group_by(code) %>%
  mutate(B = if (all(is.na(B))) A else B)
# code A B
# <chr> <dbl> <dbl>
#1 1 0.5 0.7
#2 1 0.5 0.3
#3 1 0.5 0.25
#4 2 0.2 0.2
#5 2 0.8 0.8
#6 2 0.5 0.5
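The difference between the two forms can be checked side by side. This is a small sketch with invented values (the data below is an assumption, not the asker's data); it only reuses the column names code, A and B from the answer.

```r
library(dplyr)

# Hypothetical data: B is entirely NA in group 2 only
example <- tibble(code = c(1, 1, 2, 2),
                  A    = c(0.5, 0.6, 0.2, 0.8),
                  B    = c(0.7, 0.3, NA, NA))

# ifelse(): the test has length 1, so only the first element of A or B
# comes back and mutate() recycles it across the group
with_ifelse <- example %>%
  group_by(code) %>%
  mutate(B = ifelse(all(is.na(B)), A, B)) %>%
  ungroup()

# if/else: the whole vector A or B is returned for the group
with_if <- example %>%
  group_by(code) %>%
  mutate(B = if (all(is.na(B))) A else B) %>%
  ungroup()

with_ifelse$B  # first element repeated within each group
with_if$B      # group 1 keeps B, group 2 takes A
```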
group_by and conditionally mutate by group
The following creates the dummy as the question defines it.
- The comparison dummy$ciiu_comparado == 1 returns FALSE/TRUE, internally coded as 0/1; sum(<logical>) gets the total of 1's;
- and n() is the group's number of rows.
- Then, check if the result is greater than the threshold value 0.5.
Output omitted.
library(dplyr)

db %>%
  group_by(year, id) %>%
  mutate(goal = sum(dummy$ciiu_comparado == 1)/n(),
         goal = as.integer(goal > 0.5))
The goal can be computed in one instruction.
db %>%
  group_by(year, id) %>%
  mutate(goal = +(sum(dummy$ciiu_comparado)/n() > 0.5))
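As a rough check of the share-above-threshold logic, here is a sketch on invented data. It assumes the 0/1 flag is a column of the grouped data frame itself (referenced here as a bare column name), which may not match how the asker's objects are laid out.

```r
library(dplyr)

# Hypothetical data: the flag ciiu_comparado lives in db itself (assumption)
db <- tibble(year = c(2020, 2020, 2020, 2021, 2021),
             id   = c(1, 1, 1, 1, 1),
             ciiu_comparado = c(1, 1, 0, 0, 1))

db_goal <- db %>%
  group_by(year, id) %>%
  # Share of 1's in the group; unary + turns the logical into 0/1
  mutate(goal = +(sum(ciiu_comparado == 1) / n() > 0.5)) %>%
  ungroup()

db_goal$goal
```

In 2020 the share is 2/3, so goal is 1; in 2021 it is exactly 0.5, which is not strictly greater than the threshold, so goal is 0.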
Mutate a grouped value (like a conditional mean)
Use group_by before mutate to create the mean column by group, instead of creating a summarised dataset and then joining it to the original data.
library(dplyr)

mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup() %>%
  head()
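To illustrate that the grouped mutate gives the same column as the summarise-then-join route, the two can be compared directly. This is only a sketch; the object names via_mutate, group_means and via_join are made up here.

```r
library(dplyr)

# Route 1: grouped mutate keeps every row and adds the group mean
via_mutate <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Route 2: summarise to one row per group, then join back
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

via_join <- mtcars %>%
  left_join(group_means, by = c("cyl", "carb"))

# Both routes should agree row by row
all.equal(via_mutate$var1, via_join$var1)
```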
Mutate by group based on a conditional
If we use x$ after the group_by, it returns the entire column instead of only the values in that particular group. Second, final is already a logical (TRUE/FALSE) vector, so we don't need == to subset with it.
library(dplyr)

x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(time[final]))
The one option where we can use $ is with the .data pronoun:
x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(.data$time[.data$final]))
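The effect of x$ inside a grouped mutate is easy to see on a small example. The values below are invented for illustration; only the column names usernum, time and final come from the answer.

```r
library(dplyr)

# Hypothetical data with two users
x <- tibble(usernum = c(1, 1, 2, 2),
            time    = c(10, 20, 30, 50),
            final   = c(TRUE, FALSE, TRUE, TRUE))

# x$ bypasses the grouping: the mean is taken over the whole column
ungrouped_by_accident <- x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(x$time[x$final])) %>%
  ungroup()

# Bare column names are evaluated within each group
per_group <- x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(time[final])) %>%
  ungroup()

ungrouped_by_accident$user.mean  # same value on every row
per_group$user.mean              # one value per user
```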
In R, how can I group by one column and conditionally sum another?
Attempt 1 was almost there. It's important that the number of rows is maintained. Replace cost[which(Product == "X")] with cost*(Product=="X") (a dirty trick), which zeroes out the non-X rows instead of dropping them. Btw, the which is unnecessary.
The snippet would be:
df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(
    nSubsqX = sum(Product=="X") - cumsum(Product=="X"),
    nCostSubsqX = sum(cost[Product == "X"]) - cumsum(cost*(Product == "X")))
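A small worked case may make the arithmetic easier to follow. The purchases below are invented (an assumption, not the asker's data); the column names match the snippet above.

```r
library(dplyr)

# Hypothetical purchases for two customers
df <- tibble(
  Customer = c("A", "A", "A", "B", "B"),
  Date     = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                       "2020-01-01", "2020-01-02")),
  Product  = c("X", "Y", "X", "Y", "X"),
  cost     = c(5, 7, 3, 4, 6)
)

res <- df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(
    # X purchases still to come after the current row
    nSubsqX = sum(Product == "X") - cumsum(Product == "X"),
    # cost * (Product == "X") is 0 on non-X rows, so the length is preserved
    nCostSubsqX = sum(cost[Product == "X"]) - cumsum(cost * (Product == "X"))
  ) %>%
  ungroup()

res$nSubsqX
res$nCostSubsqX
```

For customer A the X costs are 5 and 3, so the total is 8 and the running sum is 5, 5, 8, leaving 3, 3, 0 still to come.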
How to mutate and map conditional on values of grouping variables?
You can use purrr::map_if() to accomplish this. It takes a predicate function and applies one function where the predicate is TRUE and another (.else) where it is FALSE, like this:
purrr::map_if(
  .x = data,
  .p = ~ group2 %in% c("a", "b", "c"),
  .f = ~lm(var1 ~ var2, .x),
  .else = ~lm(var1 ~ 1, .x)
)
Full reprex
Here is a reprex based on your data (I add a column to verify that the logic is correct):
library(dplyr, warn.conflicts = FALSE)

tibble(
  group1 = rep(letters[1:10],100),
  group2 = rep(letters[1:10],100),
  var1 = rnorm(1000),
  var2 = rnorm(1000)
) %>%
  group_by(group1, group2) %>%
  tidyr::nest() %>%
  mutate(
    model = purrr::map_if(
      .x = data,
      .p = ~ group2 %in% c("a", "b", "c"),
      .f = ~lm(var1 ~ var2, .x),
      .else = ~lm(var1 ~ 1, .x)
    )
  ) %>%
  # Note: I add this column to verify the logic
  mutate(
    formula = purrr::map_chr(.x = model, ~.x$call %>% rlang::as_label())
  )
#> # A tibble: 10 x 5
#> # Groups: group1, group2 [10]
#> group1 group2 data model formula
#> <chr> <chr> <list> <list> <chr>
#> 1 a a <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 2 b b <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 3 c c <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 4 d d <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 5 e e <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 6 f f <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 7 g g <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 8 h h <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 9 i i <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 10 j j <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
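Stripped of the modelling, the mechanics of .p, .f and .else can be seen on a plain list. This is only a sketch with made-up inputs (the list vals and the two toy functions are assumptions), not part of the reprex above.

```r
library(purrr)

# Hypothetical mixed list: numbers and a string
vals <- list(1, "a", 3)

out <- map_if(
  .x = vals,
  .p = is.numeric,          # predicate, evaluated per element
  .f = ~ .x * 10,           # applied where the predicate is TRUE
  .else = ~ toupper(.x)     # applied where the predicate is FALSE
)

str(out)
```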
dplyr conditional mean of subset of group
1) There is no filter method for numeric vectors. Subscript Value as shown instead:
library(dplyr)

df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(Value[Date1 >= Date2-14 & Date1 < Date2])) %>%
  ungroup()
giving:
# A tibble: 14 × 4
Date1 Date2 Value CondGrpMean
<date> <date> <int> <dbl>
1 2022-08-01 2022-08-15 1 1.5
2 2022-08-08 2022-08-15 2 1.5
3 2022-08-15 2022-08-15 3 1.5
4 2022-08-22 2022-08-15 4 1.5
5 2022-08-29 2022-08-15 5 1.5
6 2022-09-05 2022-08-15 6 1.5
7 2022-09-12 2022-08-15 7 1.5
8 2022-08-01 2022-08-29 8 10.5
9 2022-08-08 2022-08-29 9 10.5
10 2022-08-15 2022-08-29 10 10.5
11 2022-08-22 2022-08-29 11 10.5
12 2022-08-29 2022-08-29 12 10.5
13 2022-09-05 2022-08-29 13 10.5
14 2022-09-12 2022-08-29 14 10.5
1a) A variation of this is:
df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(Value[c(Date2 - Date1) %in% 1:14])) %>%
  ungroup()
2) With base R:
Mean <- function(ix) with(df[ix, ], mean(Value[Date1 >= Date2-14 & Date1 < Date2]))
transform(df, CondGrpMean = ave(1:nrow(df), Date2, FUN = Mean))
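For anyone who wants to rerun solution 1, the input can be rebuilt from the printed output above. The construction below is a reconstruction from that printout (weekly Date1 values repeated for two Date2 values, with Value running 1 to 14), not the asker's original code.

```r
library(dplyr)

# Input reconstructed from the printed result (assumption)
weeks <- seq(as.Date("2022-08-01"), by = "week", length.out = 7)
df <- tibble(
  Date1 = rep(weeks, times = 2),
  Date2 = rep(as.Date(c("2022-08-15", "2022-08-29")), each = 7),
  Value = 1:14
)

res <- df %>%
  group_by(Date2) %>%
  # Mean of Value over rows whose Date1 falls in the 14 days before Date2
  mutate(CondGrpMean = mean(Value[Date1 >= Date2 - 14 & Date1 < Date2])) %>%
  ungroup()

res$CondGrpMean
```

For Date2 of 2022-08-15 the qualifying rows are Values 1 and 2 (mean 1.5); for 2022-08-29 they are Values 10 and 11 (mean 10.5).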