Using Dplyr to Group_By and Conditionally Mutate a Dataframe by Group

Using dplyr to group_by and conditionally mutate only with if (without else) statement

I think this is what you are after, using Johnny's method. You shouldn't hit an error when the original column is the fallback in case_when: whenever the group sum is not greater than the cutoff, the original values are kept.

library(dplyr)

iris_mutated <- iris %>%
  group_by(Species) %>%
  # Replace a column by its group sum only when that sum exceeds the cutoff;
  # otherwise keep the original values (TRUE is safer than the alias T).
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                  TRUE ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                 TRUE ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                  TRUE ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                 TRUE ~ Petal.Width))
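
The four case_when() calls repeat one pattern, so as a sketch (not part of the original answer) the cutoffs could be kept in a named vector and applied with across(). The name cutoffs is hypothetical, and this assumes dplyr 1.0 or later for across() and cur_column(). It uses if/else because the condition is a single value per group.

```r
library(dplyr)

# Hypothetical helper: one cutoff per column, same values as in the answer
cutoffs <- c(Sepal.Length = 250, Sepal.Width = 170,
             Petal.Length = 70, Petal.Width = 15)

iris_across <- iris %>%
  group_by(Species) %>%
  mutate(across(
    all_of(names(cutoffs)),
    # .x is the column within the current group; a scalar result is recycled
    ~ if (sum(.x) > cutoffs[[cur_column()]]) sum(.x) else .x
  )) %>%
  ungroup()
```

Adding a column then only needs a new entry in cutoffs.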

R dplyr conditional mutate with group_by

Use if/else instead of ifelse. all() returns a single value, and ifelse() returns output of the same length as its test, so it would return only the first element of A or B, which mutate() then recycles throughout the group. if/else returns the whole vector.

library(dplyr)

example %>%
  group_by(code) %>%
  # Use A for groups where every B is NA; otherwise keep B
  mutate(B = if (all(is.na(B))) A else B)

# code A B
# <chr> <dbl> <dbl>
#1 1 0.5 0.7
#2 1 0.5 0.3
#3 1 0.5 0.25
#4 2 0.2 0.2
#5 2 0.8 0.8
#6 2 0.5 0.5
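
To make the difference visible, here is a small side-by-side sketch. The input is an assumption: it is reconstructed from the printed result, with B taken to be NA for every row of code 2, and the column names B_ifelse and B_if are made up for the comparison.

```r
library(dplyr)

# Hypothetical input, reconstructed from the printed result
example <- tibble(
  code = c("1", "1", "1", "2", "2", "2"),
  A    = c(0.5, 0.5, 0.5, 0.2, 0.8, 0.5),
  B    = c(0.7, 0.3, 0.25, NA, NA, NA)
)

compared <- example %>%
  group_by(code) %>%
  mutate(
    # ifelse(): length-1 test, so only A[1] or B[1] comes back and is recycled
    B_ifelse = ifelse(all(is.na(B)), A, B),
    # if/else: the whole vector A or B comes back
    B_if     = if (all(is.na(B))) A else B
  ) %>%
  ungroup()

compared
```

In this sketch B_ifelse repeats the first value of each group, while B_if matches the result shown above.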

group_by and conditionally mutate by group

The following creates the dummy as the question defines it.

  1. The comparison dummy$ciiu_comparado == 1 returns FALSE/TRUE, internally coded as 0/1;
  2. sum(<logical>) gets the total 1's;
  3. and n() is the group's number of rows.
  4. Then, check if the result is greater than the threshold value 0.5.

Output omitted.

library(dplyr)

db %>%
  group_by(year, id) %>%
  mutate(goal = sum(dummy$ciiu_comparado == 1)/n(),
         goal = as.integer(goal > 0.5))

The goal can be computed in one instruction.

db %>%
  group_by(year, id) %>%
  # The unary + converts the logical result to integer 0/1
  mutate(goal = +(sum(dummy$ciiu_comparado)/n() > 0.5))
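
Since the output is omitted, the four steps can be traced on a toy table. Everything here is hypothetical: the data frame toy, its flag column (standing in for the 0/1 variable), and the intermediate share column.

```r
library(dplyr)

# Hypothetical data: one 0/1 flag per row
toy <- tibble(
  year = c(2020, 2020, 2020, 2021, 2021),
  id   = c(1, 1, 1, 1, 1),
  flag = c(1, 1, 0, 0, 1)
)

toy_goal <- toy %>%
  group_by(year, id) %>%
  mutate(
    share = sum(flag == 1) / n(),     # steps 1-3; same as mean(flag == 1)
    goal  = as.integer(share > 0.5)   # step 4: strict comparison with 0.5
  ) %>%
  ungroup()

toy_goal
```

The 2020 group has a share of 2/3 and gets goal 1; the 2021 group sits exactly at 0.5 and gets goal 0, because the comparison is strict.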

Mutate a grouped value (like a conditional mean)

Call group_by() before mutate() to add the group mean as a column, instead of creating a summarised dataset and then joining it back to the original data.

library(dplyr)
mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup() %>%
  head()
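
As a quick check, the sketch below compares the grouped mutate() with the summarise-then-join route it replaces. The object names direct, group_means and joined are made up here, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Route 1: grouped mutate, one step
direct <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Route 2: summarise to one row per group, then join back
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

joined <- mtcars %>%
  left_join(group_means, by = c("cyl", "carb"))

# Both routes should give the same column
all.equal(direct$var1, joined$var1)
```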

Mutate by group based on a conditional

First, if we use x$ after the group_by, it returns the entire column instead of only the values in the current group. Second, final is already a logical vector, so it can be used as an index directly; there is no need to compare it with TRUE using ==.

library(dplyr)
x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(time[final]))

The one case where we can use $ is with the .data pronoun, which refers to the current group:

x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(.data$time[.data$final]))
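
The effect of x$ is easy to see on a small example. The data below are hypothetical (only the column names come from the answer), and whole.mean is a made-up column that deliberately uses x$ to show the problem.

```r
library(dplyr)

# Hypothetical data with the columns used in the answer
x <- tibble(
  usernum = c(1, 1, 1, 2, 2),
  time    = c(10, 20, 30, 100, 200),
  final   = c(TRUE, FALSE, TRUE, TRUE, TRUE)
)

x_means <- x %>%
  group_by(usernum) %>%
  mutate(
    user.mean  = mean(time[final]),      # per group: 20 and 150
    whole.mean = mean(x$time[x$final])   # ignores the grouping: 85 in every row
  ) %>%
  ungroup()

x_means
```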

In R, how can I group by one column and conditionally sum another?

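Attempt 1 was almost there. It's important that the number of rows is maintained, because cumsum() has to return one value per row of the group. Replace cost[which(Product == "X")] with cost*(Product=="X") (a dirty trick): multiplying by the logical vector zeroes the cost of every non-X row instead of dropping it.
By the way, the which is unnecessary.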

The snippet would be:

df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(
    nSubsqX = sum(Product == "X") - cumsum(Product == "X"),
    nCostSubsqX = sum(cost[Product == "X"]) - cumsum(cost * (Product == "X")))
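
A worked example may help here. The data frame below is hypothetical (the question's data are not shown); only the column names and the mutate() call are taken from the snippet.

```r
library(dplyr)

# Hypothetical purchases: customer A buys X, Y, X, X; customer B buys Y, X
df <- tibble(
  Customer = c("A", "A", "A", "A", "B", "B"),
  Date     = as.Date("2023-01-01") + c(0, 1, 2, 3, 0, 1),
  Product  = c("X", "Y", "X", "X", "Y", "X"),
  cost     = c(10, 5, 20, 30, 7, 40)
)

res <- df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(
    # Total X purchases in the group minus those made up to and including this row
    nSubsqX     = sum(Product == "X") - cumsum(Product == "X"),
    # Same idea for cost; the multiplication keeps one value per row
    nCostSubsqX = sum(cost[Product == "X"]) - cumsum(cost * (Product == "X"))
  ) %>%
  ungroup()

res
```

For customer A this gives nSubsqX of 2, 2, 1, 0 and nCostSubsqX of 50, 50, 30, 0; for customer B, 1, 0 and 40, 0.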

How to mutate and map conditional on values of grouping variables?

You can use the function purrr::map_if() to accomplish this. It takes a predicate and applies one function (.f) to the elements where the predicate is TRUE and another (.else) to those where it is FALSE, like this:

purrr::map_if(
  .x = data,
  .p = ~ group2 %in% c("a", "b", "c"),
  .f = ~lm(var1 ~ var2, .x),
  .else = ~lm(var1 ~ 1, .x)
)

Full reprex

Here is a reprex based on your data (I add a column to verify that the logic is correct):

library(dplyr, warn.conflicts = FALSE)

tibble(
  group1 = rep(letters[1:10],100),
  group2 = rep(letters[1:10],100),
  var1 = rnorm(1000),
  var2 = rnorm(1000)
) %>%
  group_by(group1, group2) %>%
  tidyr::nest() %>%
  mutate(
    model = purrr::map_if(
      .x = data,
      .p = ~ group2 %in% c("a", "b", "c"),
      .f = ~lm(var1 ~ var2, .x),
      .else = ~lm(var1 ~ 1, .x)
    )
  ) %>%
  # Note: I add this column to verify the logic
  mutate(
    formula = purrr::map_chr(.x = model, ~.x$call %>% rlang::as_label())
  )
#> # A tibble: 10 x 5
#> # Groups: group1, group2 [10]
#> group1 group2 data model formula
#> <chr> <chr> <list> <list> <chr>
#> 1 a a <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 2 b b <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 3 c c <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 4 d d <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 5 e e <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 6 f f <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 7 g g <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 8 h h <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 9 i i <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 10 j j <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)

dplyr conditional mean of subset of group

1) There is no filter method for numeric vectors. Subset Value with a logical index instead:

library(dplyr)

df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(Value[Date1 >= Date2-14 & Date1 < Date2])) %>%
  ungroup()

giving:

# A tibble: 14 × 4
Date1 Date2 Value CondGrpMean
<date> <date> <int> <dbl>
1 2022-08-01 2022-08-15 1 1.5
2 2022-08-08 2022-08-15 2 1.5
3 2022-08-15 2022-08-15 3 1.5
4 2022-08-22 2022-08-15 4 1.5
5 2022-08-29 2022-08-15 5 1.5
6 2022-09-05 2022-08-15 6 1.5
7 2022-09-12 2022-08-15 7 1.5
8 2022-08-01 2022-08-29 8 10.5
9 2022-08-08 2022-08-29 9 10.5
10 2022-08-15 2022-08-29 10 10.5
11 2022-08-22 2022-08-29 11 10.5
12 2022-08-29 2022-08-29 12 10.5
13 2022-09-05 2022-08-29 13 10.5
14 2022-09-12 2022-08-29 14 10.5

1a) A variation of this is:

df %>%
  group_by(Date2) %>%
  # Date2 - Date1 is a difftime in days; convert it explicitly before matching
  mutate(CondGrpMean = mean(Value[as.numeric(Date2 - Date1) %in% 1:14])) %>%
  ungroup()

2) With base R:

Mean <- function(ix) with(df[ix, ], mean(Value[Date1 >= Date2-14 & Date1 < Date2]))
transform(df, CondGrpMean = ave(1:nrow(df), Date2, FUN = Mean))
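
As a quick check that the dplyr and base R versions agree, here is a sketch on input reconstructed from the printed result in 1). The reconstruction is an assumption (the question's original data are not shown), and the names via_dplyr and via_base are made up.

```r
library(dplyr)

# Input reconstructed from the printed result: weekly Date1, two Date2 values
df <- data.frame(
  Date1 = rep(seq(as.Date("2022-08-01"), by = "week", length.out = 7), 2),
  Date2 = rep(as.Date(c("2022-08-15", "2022-08-29")), each = 7),
  Value = 1:14
)

# 1) dplyr: subset Value inside each Date2 group
via_dplyr <- df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(Value[Date1 >= Date2 - 14 & Date1 < Date2])) %>%
  ungroup()

# 2) base R: ave() passes the row indices of each group to Mean()
Mean <- function(ix) with(df[ix, ], mean(Value[Date1 >= Date2 - 14 & Date1 < Date2]))
via_base <- transform(df, CondGrpMean = ave(1:nrow(df), Date2, FUN = Mean))

all.equal(via_dplyr$CondGrpMean, via_base$CondGrpMean)
```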

