Dplyr Broadcasting Single Value Per Group in Mutate

Do this:

data %>%
  group_by(category) %>%
  mutate(value2 = value[year == 2000])

You could also do it this way:

data %>%
  group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = value[1])

or

data %>%
  group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = first(value))

or

data %>%
  group_by(category) %>%
  mutate(value2 = nth(value, n = 1, order_by = year))

or probably several other ways.

Your attempt with mutate(value = filter(data, year == 2002)) doesn't make sense for a couple of reasons.

  1. When you explicitly pass in data again, it's not part of the chain that got grouped earlier, so it doesn't know about the grouping.

  2. All dplyr verbs take a data frame as their first argument and return a data frame, including filter. When you write value = filter(...), you're trying to assign an entire data frame to the single column value (see the sketch below).
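
If you did want to route the computation through filter(), the workable pattern is to filter into a lookup table and join it back, rather than calling filter() inside mutate(). A sketch, assuming the same data / category / year / value columns as above:

data %>%
  left_join(
    data %>%
      filter(year == 2000) %>%
      select(category, value2 = value), # one value2 per category
    by = "category"
  )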

Mutate: repeat the value by group (dplyr)

Here's one way:

> data %>% group_by(year) %>%
+   mutate(value_tmp = if_else(category == "A", value, NA_real_),
+          value2 = mean(value_tmp, na.rm = TRUE))
# A tibble: 18 x 5
# Groups:   year [6]
   category year      value  value_tmp     value2
   <fct>   <int>      <dbl>      <dbl>      <dbl>
 1 A        2000 0.01818495 0.01818495 0.01818495
 2 B        2000 0.5649932  NA         0.01818495
 3 C        2000 0.5483291  NA         0.01818495
 4 A        2001 0.9175864  0.9175864  0.9175864
 5 B        2001 0.2415837  NA         0.9175864
 6 C        2001 0.2250608  NA         0.9175864
 7 A        2002 0.6037224  0.6037224  0.6037224
 8 B        2002 0.8712926  NA         0.6037224
 9 C        2002 0.6293625  NA         0.6037224
10 A        2003 0.8126948  0.8126948  0.8126948
11 B        2003 0.7540445  NA         0.8126948
12 C        2003 0.02220114 NA         0.8126948
13 A        2004 0.3961279  0.3961279  0.3961279
14 B        2004 0.3638186  NA         0.3961279
15 C        2004 0.8682010  NA         0.3961279
16 A        2005 0.04196315 0.04196315 0.04196315
17 B        2005 0.4879482  NA         0.04196315
18 C        2005 0.8605212  NA         0.04196315
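
The same broadcast works without the temporary column by indexing inside each group, a sketch assuming exactly one category == "A" row per year:

data %>%
  group_by(year) %>%
  mutate(value2 = value[category == "A"]) # length-1 result is recycled across the group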

R: How to replace all rows in a group with a value from first row in a df

There's a first() function in dplyr:

mtcars %>%
  group_by(carb) %>%
  mutate(carb_1 = first(qsec))

Though if you are ordering by qsec, you could also just use min().
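
For completeness, that min() variant would be:

mtcars %>%
  group_by(carb) %>%
  mutate(carb_1 = min(qsec)) # the group minimum; equals first(qsec) when rows are sorted ascending by qsec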

Assign the value of the first row of a group to the whole group


DT %>%
  group_by(id) %>%
  mutate(id.date = date[1])
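
Since the object is called DT, a native data.table version might be (an assumption that DT is, or is converted to, a data.table):

library(data.table)
setDT(DT)[, id.date := date[1], by = id]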

Scale relative to a value in each group (via dplyr)

This solution is very similar to @thelatemail's, but I think it's sufficiently different to merit its own answer, because it chooses the index based on a condition:

data %>%
  group_by(category) %>%
  mutate(value = value / value[year == baseYear])

# category year      value
# ...      ...         ...
#7  A      2002 1.00000000
#8  B      2002 1.00000000
#9  C      2002 1.00000000
#10 A      2003 0.86462789
#11 B      2003 1.07217943
#12 C      2003 0.82209897

(Data output has been truncated. To replicate these results, call set.seed(123) when creating data.)
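
For reference, one plausible way to create data with this shape (an assumption; the original setup isn't shown here, so the exact values may differ):

set.seed(123)
data <- data.frame(
  category = rep(c("A", "B", "C"), times = 6), # hypothetical layout: A/B/C within each year
  year     = rep(2000:2005, each = 3),
  value    = runif(18)
)
baseYear <- 2002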

R Studio - group by dataframe and get statistics using dplyr

You're almost there. It's just that your "last(category)" grouping is based only on ID rather than both ID and direction. If you change it to:

res <- df %>%
  group_by(ID) %>%
  mutate(category = last(category)) %>%
  ungroup() %>%
  group_by(ID, direction, category) %>%
  summarise(
    sum_value = sum(value),
    count_value = length(value)
  ) %>%
  ungroup()

It should do the trick.
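
As an aside, count_value = length(value) is more idiomatically written with dplyr's n() helper, i.e. the summarise() step becomes:

summarise(
  sum_value = sum(value),
  count_value = n() # n() counts the rows in the current group
)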

R: how to propagate a single value to an entire column?


df %>%
  group_by(group) %>%
  mutate(output = value[time == 0][1]) # [1] takes the first match within each group
   group  time value output
  <fctr> <dbl> <dbl>  <dbl>
1      A    -1    10     23
2      A     0    23     23
3      A     2     5     23
4      B     0    22     22
5      B     4    11     22

Edit:
If there is no time == 0 row in a group, as with group C here:

  group time value
1     A   -1    10
2     A    0    23
3     A    2     5
4     B    0    22
5     B    4    11
6     C    1    10
6 C 1 10

We use

df %>%
  group_by(group) %>%
  mutate(output = ifelse(length(value[time == 0]) > 0, value[time == 0][1], NA))
   group  time value output
  <fctr> <dbl> <dbl>  <dbl>
1      A    -1    10     23
2      A     0    23     23
3      A     2     5     23
4      B     0    22     22
5      B     4    11     22
6      C     1    10     NA

Edit 2: multiple time == 0 rows in the same group

   group  time value output
  <fctr> <dbl> <int>  <int>
1      A    -1     1      2
2      A     0     2      2
3      A     2     3      2
4      B     0     4      4
5      B     4     5      4
6      B     0     6      4
7      C     3     7     NA
8      C     5     8     NA
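
Note that value[time == 0][1] already returns NA when a group has no time == 0 row, since out-of-bounds subscripting yields NA, so the ifelse() wrapper can be dropped entirely:

df %>%
  group_by(group) %>%
  mutate(output = value[time == 0][1]) # NA where there is no match, first match otherwise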

dplyr creating new column based on some condition

We could use first() after grouping by 'cust_id'. The single value will be recycled across the entire group:

library(dplyr)
df <- df %>%
  group_by(cust_id) %>%
  mutate(geo_num_new = first(geo_num)) %>%
  ungroup()

-output

df
# A tibble: 8 x 4
  geo_num cust_id sales geo_num_new
    <dbl> <chr>   <dbl>       <dbl>
1      11 A           2          11
2      12 A           3          11
3      22 B           2          22
4      41 C           1          41
5      42 C           2          41
6      43 C           4          41
7      77 D           6          77
8      71 D           3          77

Or use data.table

library(data.table)
setDT(df)[, geo_num_new := first(geo_num), by = cust_id]

Or with base R

df$geo_num_new <- with(df, ave(geo_num, cust_id, FUN = function(x) x[1]))

Or an option with collapse

library(collapse)
tfm(df, geo_num_new = ffirst(geo_num, g = cust_id, TRA = "replace"))
  geo_num cust_id sales geo_num_new
1      11       A     2          11
2      12       A     3          11
3      22       B     2          22
4      41       C     1          41
5      42       C     2          41
6      43       C     4          41
7      77       D     6          77
8      71       D     3          77

Divide (and name) one group of columns by another group in dplyr

Don't store variables in column names. If you reshape your data to make it tidy, the calculation is really simple:

library(tidyverse)

df %>%
  gather(var, val, -year) %>%                          # reshape to long
  separate(var, c('var', 'letter'), fill = 'left') %>% # extract var from former col names
  mutate(var = coalesce(var, 'value')) %>%             # add name for unnamed var
  spread(var, val) %>%                                 # reshape back to wide
  mutate(weight = value / count)                       # now this is very simple

#>   year letter count value     weight
#> 1 1999      A     1    10 10.0000000
#> 2 1999      B     8     3  0.3750000
#> 3 1999      C     5     1  0.2000000
#> 4 2000      A     2    20 10.0000000
#> 5 2000      B     9     6  0.6666667
#> 6 2000      C     7     2  0.2857143
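
gather() and spread() are superseded in current tidyr; the same pipeline written with pivot_longer()/pivot_wider() would be (a sketch under the same column assumptions):

df %>%
  pivot_longer(-year, names_to = 'var', values_to = 'val') %>% # reshape to long
  separate(var, c('var', 'letter'), fill = 'left') %>%         # split former column names
  mutate(var = coalesce(var, 'value')) %>%                     # name the unnamed var
  pivot_wider(names_from = var, values_from = val) %>%         # reshape back to wide
  mutate(weight = value / count)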

