Using The Result of Summarise (Dplyr) to Mutate The Original Dataframe

As @beetroot points out in the comments, you can accomplish this with a join:

limits <- span %>%
  group_by(YEAR) %>%
  summarise(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS])) %>%
  inner_join(span, by = 'YEAR')
# YEAR minDOY maxDOY date DOY DLS
# 1 2000 93 303 2000-01-01 00:00:00 1 FALSE
# 2 2000 93 303 2000-01-01 01:00:00 1 FALSE
# 3 2000 93 303 2000-01-01 02:00:00 1 FALSE
# 4 2000 93 303 2000-01-01 03:00:00 1 FALSE
# 5 2000 93 303 2000-01-01 04:00:00 1 FALSE
# 6 2000 93 303 2000-01-01 05:00:00 1 FALSE
# 7 2000 93 303 2000-01-01 06:00:00 1 FALSE
# 8 2000 93 303 2000-01-01 07:00:00 1 FALSE
# 9 2000 93 303 2000-01-01 08:00:00 1 FALSE
# 10 2000 93 303 2000-01-01 09:00:00 1 FALSE
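For anyone who wants to try the join without the original data, here is a minimal sketch. The small `span` data frame below is a hypothetical stand-in (the real one has hourly rows), and starting the join from `span` with `left_join` is my own variation so that the original columns come first.

```r
library(dplyr)

# Hypothetical stand-in for `span`: two years, three days each
span <- data.frame(
  YEAR = c(2000, 2000, 2000, 2001, 2001, 2001),
  DOY  = c(1, 93, 303, 1, 91, 301),
  DLS  = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# One row per year: first and last day of year among the rows where DLS is TRUE
limits <- span %>%
  group_by(YEAR) %>%
  summarise(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS]))

# Attach the per-year limits to every original row
result <- span %>%
  left_join(limits, by = "YEAR")

result
```

Every row of `span` is kept, and each one carries the `minDOY` and `maxDOY` of its year.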

Use of mutate in Summarise function using R

I find the use of mutate inside summarize very confusing, and don't really know what to expect of it (I'm honestly surprised it even works). If I understand correctly, what you want to do is best expressed as (Scenario - 3):

data %>%
  group_by(identifier) %>%
  summarize(shift_back_max = -min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE)) %>%
  ungroup() %>%
  # min() returns Inf for a group with no non-missing values; cap those at +/- 30
  mutate(across(starts_with("shift"),
                ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))

(meaning you first summarize by identifier, then you apply a treatment to the whole result)

You can compare the results of the different approaches with all.equal(). I'd expect all of them to give the same result, but the others are not as clear to the reader.
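A minimal sketch of such a comparison follows. The `data` frame and the `safe_min()` helper are hypothetical, made up for illustration; the second approach is just one possible alternative, not necessarily one of the scenarios from the question.

```r
library(dplyr)

# Hypothetical data: group "b" has no usable values at all
data <- data.frame(
  identifier        = c("a", "a", "b", "b"),
  shift_back_max    = c(3, 5, NA, NA),
  shift_forward_max = c(2, NA, NA, NA)
)

# Approach 1: summarize first, then cap infinite values at +/- 30.
# min() warns when a group has only NAs, hence suppressWarnings().
approach_1 <- suppressWarnings(
  data %>%
    group_by(identifier) %>%
    summarize(
      shift_back_max    = -min(shift_back_max, na.rm = TRUE),
      shift_forward_max = min(shift_forward_max, na.rm = TRUE)
    ) %>%
    mutate(across(starts_with("shift"),
                  ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))
)

# Approach 2 (hypothetical helper): fall back to 30 inside the summary itself
safe_min <- function(x, default = 30) {
  if (all(is.na(x))) default else min(x, na.rm = TRUE)
}

approach_2 <- data %>%
  group_by(identifier) %>%
  summarize(
    shift_back_max    = -safe_min(shift_back_max),
    shift_forward_max = safe_min(shift_forward_max)
  )

# Compare as plain data frames so only the values and column names matter
same <- isTRUE(all.equal(as.data.frame(approach_1), as.data.frame(approach_2)))
same
```

Wrapping all.equal() in isTRUE() is deliberate: when the inputs differ, all.equal() returns a description of the differences rather than FALSE.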

After summarize, reinsert calculated values into original dataframe dplyr

If you just want the mean of every group that has more than one row, you don't need to split the data first: taking the mean of a group with a single row leaves its value unchanged. Here, I also take the max of variable_2, so that it returns exactly one value per group and is retained in the output.

library(tidyverse)

df %>%
  group_by(id, variable_1, place) %>%
  dplyr::summarise(value = mean(value), variable_2 = max(variable_2))

Output

  id    variable_1 place     value variable_2
  <chr> <chr>      <chr>     <dbl> <chr>
1 01_01 a          Australia   0.6 cat
2 01_02 a          France      0.8 pig
3 01_03 a          Belguim     0.2 dog
4 01_04 a          Germany     1.7 chicken
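To check the claim that single-row groups pass through unchanged, here is a small sketch. The `df` below is hypothetical (the question's data is not reproduced here), and `.groups = "drop"` is my addition so that the result comes back ungrouped.

```r
library(dplyr)

# Hypothetical data: id "01_01" has two rows, id "01_02" has only one
df <- data.frame(
  id         = c("01_01", "01_01", "01_02"),
  variable_1 = "a",
  place      = c("Australia", "Australia", "France"),
  value      = c(0.5, 0.7, 0.8),
  variable_2 = c("cat", "cat", "pig")
)

out <- df %>%
  group_by(id, variable_1, place) %>%
  summarise(value = mean(value), variable_2 = max(variable_2), .groups = "drop")

out
```

The two "01_01" rows collapse into their mean, while the single "01_02" row keeps its original value.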

Or if you do want to have it broken up, then you can just add an additional summary for variable_2, so that it doesn't get removed.

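# Groups with two rows: collapse each one to its mean
df2 <- df %>%
  group_by(id, variable_1, place) %>%
  filter(n() == 2) %>%
  dplyr::summarise(value = mean(value), variable_2 = max(variable_2))

# Groups with a single row: keep as-is, then append the collapsed groups
df <- df %>%
  group_by(id, variable_1, place) %>%
  filter(n() == 1) %>%
  bind_rows(., df2)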

Mutate a grouped value (like a conditional mean)

Use group_by before mutate to create the mean column by group, instead of creating a summarised dataset and then joining it back to the original data:

library(dplyr)
mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup() %>%
  head()
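If you want to convince yourself that the grouped mutate and the summarise-then-join route agree, a quick sketch on the built-in mtcars data could look like this (the names `via_mutate` and `via_join` are just for illustration):

```r
library(dplyr)

# Route 1: grouped mutate, every row keeps its place
via_mutate <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Route 2: summarise to one row per group, then join back to the original rows
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

via_join <- mtcars %>%
  left_join(group_means, by = c("cyl", "carb"))

# Both routes should produce the same column, in the same row order
same_means <- isTRUE(all.equal(via_mutate$var1, via_join$var1))
same_means
```

The mutate version is shorter and avoids the intermediate table, which is why I would prefer it here.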

Create new column for mean by group in original dataframe in R

We can use mutate instead of summarise, which keeps every row of the original data:

library(dplyr)
df <- df %>%
group_by(unit_id) %>%
mutate(mean = mean(outcome))
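One thing to keep in mind is that the result of a grouped mutate is still grouped. A minimal sketch, assuming a made-up `df` with the same column names, that also drops the grouping afterwards:

```r
library(dplyr)

# Hypothetical data with the column names used above
df <- data.frame(
  unit_id = c(1, 1, 2, 2, 2),
  outcome = c(10, 20, 1, 2, 3)
)

with_mean <- df %>%
  group_by(unit_id) %>%
  mutate(mean = mean(outcome)) %>%
  ungroup()  # drop the grouping so later verbs act on the whole table

with_mean
```

Each row gets the mean of its own unit, and the number of rows does not change.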

Adding Summarized Fields to Data Frame R

You may combine the two summary outputs.

library(dplyr)

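bind_rows(
  # totals per Description
  df %>%
    group_by(Description) %>%
    summarise(Amt = sum(Amount)),
  # totals per Category, relabelled so both summaries share one column
  df %>%
    group_by(Category) %>%
    summarise(Amt = sum(Amount)) %>%
    rename(Description = Category)
) %>%
  arrange(Description)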

# Description Amt
# <chr> <dbl>
# 1 A 4700
# 2 A.a 900
# 3 A.b 1200
# 4 A.c 2600
# 5 B 7400
# 6 B.a 3500
# 7 B.b 3000
# 8 B.c 400
# 9 C 1220
#10 C.a 1580
#11 C.b 50
#12 C.c 90

How to change an element of the original dataframe using dplyr

Using the function case_when:

library(tidyverse)

df <- tibble(
  ticker = c("first", "second", "third"),
  status = c(TRUE, TRUE, TRUE)
)

df %>%
  mutate(status = case_when(
    ticker == "first" ~ FALSE,  # flip only the matching row
    TRUE ~ TRUE                 # every other row is set to TRUE
  ))

This is the output:

# A tibble: 3 x 2
  ticker status
  <chr>  <lgl>
1 first  FALSE
2 second TRUE
3 third  TRUE
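Note that mutate does not modify the data frame in place; the result has to be assigned to a name. A small sketch of that, where using `TRUE ~ status` to keep the existing values of the other rows is my own variation, and `updated` is a hypothetical name:

```r
library(dplyr)

df <- data.frame(
  ticker = c("first", "second", "third"),
  status = c(TRUE, TRUE, TRUE)
)

# Flip only the "first" row; all other rows keep their current status
updated <- df %>%
  mutate(status = case_when(
    ticker == "first" ~ FALSE,
    TRUE ~ status
  ))

df$status       # the original is unchanged
updated$status  # the copy carries the new value
```

To overwrite the original, assign the result back with `df <- df %>% mutate(...)`.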

