Using dplyr to group_by and Conditionally Mutate a Data Frame by Group

Using dplyr to group_by and conditionally mutate only with if (without else) statement

I think this is what you are after, using Johnny's method. You shouldn't hit an error if you pass the original column as the fallback value in case_when, which covers the groups whose sum is not greater than the cutoff:

library(dplyr)

# For each species, replace a column with its group sum when that sum
# exceeds the cutoff; otherwise keep the original values.
iris_mutated <- iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                  TRUE ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                 TRUE ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                  TRUE ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                 TRUE ~ Petal.Width))
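
To see both branches fire, here is a minimal sketch on made-up data. The toy tibble, its column names and the cutoff of 5 are my own assumptions, not part of the question. Group "a" stays below the cutoff and keeps its values, while group "b" exceeds it and gets the group sum:

```r
library(dplyr)

# Hypothetical toy data: group "a" sums to 3, group "b" sums to 30.
toy <- tibble(g = c("a", "a", "b", "b"),
              x = c(1, 2, 10, 20))

toy_mutated <- toy %>%
  group_by(g) %>%
  mutate(x = case_when(sum(x) > 5 ~ sum(x),  # group sum exceeds the cutoff
                       TRUE ~ x)) %>%        # otherwise keep the original values
  ungroup()

toy_mutated$x
# 1 2 30 30
```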

R dplyr conditional mutate with group_by

Use if/else instead of ifelse. all() returns a value of length 1, and ifelse returns output of the same length as its test, so it would return only the first element of A or B, which then gets recycled throughout the group. if/else returns the whole column for the group.

library(dplyr)

example %>% 
  group_by(code) %>%
  mutate(B = if(all(is.na(B))) A else B)

#   code    A     B
#  <chr> <dbl> <dbl>
#1 1       0.5  0.7 
#2 1       0.5  0.3 
#3 1       0.5  0.25
#4 2       0.2  0.2 
#5 2       0.8  0.8 
#6 2       0.5  0.5
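
To make the difference visible, here is a small sketch that runs both versions side by side. The input is my own reconstruction, chosen to be consistent with the output above (B is entirely NA for code "2"), and the column names B_ifelse and B_if are just for illustration:

```r
library(dplyr)

# Hypothetical input, reconstructed to be consistent with the output shown above.
example <- tibble(code = c("1", "1", "1", "2", "2", "2"),
                  A = c(0.5, 0.5, 0.5, 0.2, 0.8, 0.5),
                  B = c(0.7, 0.3, 0.25, NA, NA, NA))

res <- example %>%
  group_by(code) %>%
  mutate(B_ifelse = ifelse(all(is.na(B)), A, B),      # length-1 result, recycled
         B_if     = if (all(is.na(B))) A else B) %>%  # whole group vector
  ungroup()

res$B_ifelse
# 0.7 0.7 0.7 0.2 0.2 0.2
res$B_if
# 0.70 0.30 0.25 0.20 0.80 0.50
```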

group_by and conditionally mutate by group

The following creates the dummy as the question defines it.

The comparison ciiu_comparado == 1 returns FALSE/TRUE, internally coded as 0/1;
sum(<logical>) counts the TRUE values;
and n() is the group's number of rows.
Then, check if the resulting proportion is greater than the threshold value 0.5.
Inside a grouped mutate the column must be referred to by its bare name: a prefix such as dummy$ would return the entire column rather than the current group's values.

Output omitted.

library(dplyr)

db %>%
  group_by(year, id) %>%
  mutate(goal = sum(ciiu_comparado == 1)/n(),
         goal = as.integer(goal > 0.5))

The goal can be computed in one instruction (this relies on ciiu_comparado being coded 0/1, so that its sum counts the 1's).

db %>%
  group_by(year, id) %>%
  mutate(goal = +(sum(ciiu_comparado)/n() > 0.5))
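
As a quick check of the logic, here is a sketch on made-up data (the values below are my assumption, not the question's data). It uses mean() of the logical comparison, which is the same proportion as sum()/n():

```r
library(dplyr)

# Hypothetical data: two (year, id) groups with a 0/1 indicator column.
db <- tibble(year = rep(2020, 6),
             id = c(1, 1, 1, 2, 2, 2),
             ciiu_comparado = c(1, 1, 0, 1, 0, 0))

res <- db %>%
  group_by(year, id) %>%
  # proportion of 1's within the group, then compared with the threshold
  mutate(goal = as.integer(mean(ciiu_comparado == 1) > 0.5)) %>%
  ungroup()

res$goal
# 1 1 1 0 0 0
```

Group id 1 has two 1's out of three rows (above 0.5), group id 2 has one out of three (below 0.5).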

Mutate a grouped value (like a conditional mean)

Call group_by before mutate to create the mean column by group, instead of creating a summarised dataset and then joining it back to the original data.

library(dplyr)
mtcars %>% 
   group_by(cyl, carb) %>%
   mutate(var1 = mean(mpg)) %>%
   ungroup() %>%
   head()
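
If you want to convince yourself that the two routes agree, here is a sketch that builds the column both ways and compares them. The object names by_mutate and by_join are mine:

```r
library(dplyr)

# Route 1: grouped mutate, keeps every row of the original data.
by_mutate <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Route 2: summarise to one row per group, then join back.
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

by_join <- left_join(mtcars, group_means, by = c("cyl", "carb"))

all.equal(by_mutate$var1, by_join$var1)
```

Both keep the 32 rows of mtcars in their original order; the grouped mutate just gets there in one step.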

Mutate by group based on a conditional

First, if we use x$ after the group_by, it returns the entire column instead of only the values in that particular group. Second, TRUE/FALSE is already a logical vector, so we don't need a == comparison to subset with it.

library(dplyr)
x %>%
     group_by(usernum) %>% 
     mutate(user.mean = mean(time[final]))

The one option where we can use $ is with the .data pronoun:

x %>% 
    group_by(usernum) %>%
    mutate(user.mean = mean(.data$time[.data$final]))
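
Here is a small sketch of what goes wrong with x$. The data is made up, and the column names mean_bare and mean_dollar are only for illustration:

```r
library(dplyr)

# Hypothetical data: two users, with a logical flag marking final attempts.
x <- tibble(usernum = c(1, 1, 2, 2),
            time    = c(10, 20, 30, 50),
            final   = c(TRUE, FALSE, TRUE, TRUE))

res <- x %>%
  group_by(usernum) %>%
  mutate(mean_bare   = mean(time[final]),          # current group's rows only
         mean_dollar = mean(x$time[x$final])) %>%  # whole columns, same for every group
  ungroup()

res$mean_bare
# 10 10 40 40
res$mean_dollar
# 30 30 30 30
```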

In R, how can I group by one column and conditionally sum another?

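Attempt 1 was almost there. It's important that the number of rows is maintained: cost[which(Product == "X")] drops the non-X rows, so its cumulative sum no longer lines up with the rows of the group. Replace it with cost*(Product=="X") (a dirty trick), which keeps every row and zeroes out the cost of the other products.
Btw. the which is unnecessary, since a logical vector can be used as a subscript directly.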

The snippet would be:

df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(
    nSubsqX = sum(Product=="X") - cumsum(Product=="X"),
    nCostSubsqX = sum(cost[Product == "X"]) - cumsum(cost*(Product == "X")))
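
A minimal sketch on made-up purchases for a single customer (the data is my assumption) shows how the multiplication keeps the rows aligned:

```r
library(dplyr)

# Hypothetical purchases: X (cost 10), then Y (cost 5), then X (cost 20).
df <- tibble(Customer = "A",
             Date     = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")),
             Product  = c("X", "Y", "X"),
             cost     = c(10, 5, 20))

res <- df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(
    # number of X purchases still to come after this row
    nSubsqX     = sum(Product == "X") - cumsum(Product == "X"),
    # cost of X purchases still to come; non-X rows contribute 0 to the cumsum
    nCostSubsqX = sum(cost[Product == "X"]) - cumsum(cost * (Product == "X"))) %>%
  ungroup()

res$nSubsqX
# 1 1 0
res$nCostSubsqX
# 20 20 0
```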

How to mutate and map conditional on values of grouping variables?

You can use the function purrr::map_if() to accomplish this. It takes a predicate (.p) and applies .f to the elements where the predicate is TRUE and .else to the elements where it is FALSE, like this:

purrr::map_if(
      .x = data, 
      .p = ~ group2 %in% c("a", "b", "c"),
      .f = ~lm(var1 ~ var2, .x), 
      .else = ~lm(var1 ~ 1, .x)
    )
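
Stripped of the modelling, the mechanics of map_if() can be sketched on a plain list. The inputs and the two functions here are made up purely for illustration:

```r
library(purrr)

inputs <- list(1, "a", 3)

out <- map_if(
  .x = inputs,
  .p = is.numeric,        # predicate, evaluated on each element
  .f = ~ .x * 2,          # applied where the predicate is TRUE
  .else = ~ toupper(.x)   # applied where the predicate is FALSE
)

str(out)
```

The result is a list of the same length as the input, with the numbers doubled and the string upper-cased.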

Full reprex

Here is a reprex based on your data (I add a column to verify that the logic is correct):

library(dplyr, warn.conflicts = FALSE)

tibble(
  group1 = rep(letters[1:10],100),
  group2 = rep(letters[1:10],100),
  var1 = rnorm(1000),
  var2 = rnorm(1000)
) %>% 
  group_by(group1, group2) %>% 
  tidyr::nest() %>% 
  mutate(
    model = purrr::map_if(
      .x = data, 
      .p = ~ group2 %in% c("a", "b", "c"),
      .f = ~lm(var1 ~ var2, .x), 
      .else = ~lm(var1 ~ 1, .x)
    )
  ) %>%
  # Note: I add this column to verify the logic
  mutate(
    formula = purrr::map_chr(.x = model, ~.x$call %>% rlang::as_label())
  )
#> # A tibble: 10 x 5
#> # Groups:   group1, group2 [10]
#>    group1 group2 data               model  formula                             
#>    <chr>  <chr>  <list>             <list> <chr>                               
#>  1 a      a      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ var2, data = .x)
#>  2 b      b      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ var2, data = .x)
#>  3 c      c      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ var2, data = .x)
#>  4 d      d      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ 1, data = .x)   
#>  5 e      e      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ 1, data = .x)   
#>  6 f      f      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ 1, data = .x)   
#>  7 g      g      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ 1, data = .x)   
#>  8 h      h      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ 1, data = .x)   
#>  9 i      i      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ 1, data = .x)   
#> 10 j      j      <tibble [100 x 2]> <lm>   lm(formula = var1 ~ 1, data = .x)

dplyr conditional mean of subset of group

1) There is no filter method for numeric vectors. Subscript Value as shown instead:

library(dplyr)

df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(Value[Date1 >= Date2-14 & Date1 < Date2])) %>%
  ungroup

giving:

# A tibble: 14 × 4
   Date1      Date2      Value CondGrpMean
   <date>     <date>     <int>       <dbl>
 1 2022-08-01 2022-08-15     1         1.5
 2 2022-08-08 2022-08-15     2         1.5
 3 2022-08-15 2022-08-15     3         1.5
 4 2022-08-22 2022-08-15     4         1.5
 5 2022-08-29 2022-08-15     5         1.5
 6 2022-09-05 2022-08-15     6         1.5
 7 2022-09-12 2022-08-15     7         1.5
 8 2022-08-01 2022-08-29     8        10.5
 9 2022-08-08 2022-08-29     9        10.5
10 2022-08-15 2022-08-29    10        10.5
11 2022-08-22 2022-08-29    11        10.5
12 2022-08-29 2022-08-29    12        10.5
13 2022-09-05 2022-08-29    13        10.5
14 2022-09-12 2022-08-29    14        10.5

1a) A variation of this is:

df %>%
  group_by(Date2) %>%
  # as.numeric() turns the date difference into a plain number of days
  mutate(CondGrpMean = mean(Value[as.numeric(Date2 - Date1) %in% 1:14])) %>%
  ungroup()
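
For a runnable check, here is a sketch that rebuilds the input from the output printed in (1) and computes both variants side by side. The column names by_range and by_diff are mine, and I use as.numeric() to turn the date difference into a plain number of days:

```r
library(dplyr)

# Input reconstructed from the output printed above.
dates <- seq(as.Date("2022-08-01"), by = "week", length.out = 7)
df <- tibble(Date1 = rep(dates, 2),
             Date2 = rep(as.Date(c("2022-08-15", "2022-08-29")), each = 7),
             Value = 1:14)

res <- df %>%
  group_by(Date2) %>%
  mutate(by_range = mean(Value[Date1 >= Date2 - 14 & Date1 < Date2]),
         by_diff  = mean(Value[as.numeric(Date2 - Date1) %in% 1:14])) %>%
  ungroup()

all.equal(res$by_range, res$by_diff)
```

Both columns should reproduce the CondGrpMean values shown in (1): 1.5 for the first Date2 and 10.5 for the second.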

2) With base R:

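# Mean() receives the row indices of one Date2 group and returns that group's conditional mean
Mean <- function(ix) with(df[ix, ], mean(Value[Date1 >= Date2-14 & Date1 < Date2]))
transform(df, CondGrpMean = ave(seq_len(nrow(df)), Date2, FUN = Mean))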