Dplyr Broadcasting Single Value Per Group in Mutate

Do this:

data %>%
  group_by(category) %>%
  mutate(value2 = value[year == 2000])

You could also do it this way:

data %>%
  group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = value[1])

or

data %>%
  group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = first(value))

or

data %>%
  group_by(category) %>%
  mutate(value2 = nth(value, n = 1, order_by = year))

or probably several other ways.

Your attempt with mutate(value = filter(data, year == 2002)) doesn't make sense for a couple of reasons.

  1. When you explicitly pass in data again, it's not part of the chain that got grouped earlier, so it doesn't know about the grouping.

  2. All dplyr verbs take a data frame as their first argument and return a data frame, including filter. When you write value = filter(...), you're trying to assign an entire data frame to the single column value (see the sketch below).
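
If you did want to route the computation through filter(), the workable pattern is to filter into a lookup table and join it back, rather than calling filter() inside mutate(). A sketch, assuming the same data / category / year / value columns as above:

data %>%
  left_join(
    data %>%
      filter(year == 2000) %>%
      select(category, value2 = value), # one value2 per category
    by = "category"
  )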

Mutate: repeat the value by group (dplyr)

Here's one way:

> data %>% group_by(year) %>%
+   mutate(value_tmp = if_else(category == "A", value, NA_real_),
+          value2 = mean(value_tmp, na.rm = TRUE))
# A tibble: 18 x 5
# Groups:   year [6]
   category year      value  value_tmp     value2
   <fct>   <int>      <dbl>      <dbl>      <dbl>
 1 A        2000 0.01818495 0.01818495 0.01818495
 2 B        2000 0.5649932  NA         0.01818495
 3 C        2000 0.5483291  NA         0.01818495
 4 A        2001 0.9175864  0.9175864  0.9175864
 5 B        2001 0.2415837  NA         0.9175864
 6 C        2001 0.2250608  NA         0.9175864
 7 A        2002 0.6037224  0.6037224  0.6037224
 8 B        2002 0.8712926  NA         0.6037224
 9 C        2002 0.6293625  NA         0.6037224
10 A        2003 0.8126948  0.8126948  0.8126948
11 B        2003 0.7540445  NA         0.8126948
12 C        2003 0.02220114 NA         0.8126948
13 A        2004 0.3961279  0.3961279  0.3961279
14 B        2004 0.3638186  NA         0.3961279
15 C        2004 0.8682010  NA         0.3961279
16 A        2005 0.04196315 0.04196315 0.04196315
17 B        2005 0.4879482  NA         0.04196315
18 C        2005 0.8605212  NA         0.04196315
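
The same broadcast works without the temporary column by indexing inside each group, a sketch assuming exactly one category == "A" row per year:

data %>%
  group_by(year) %>%
  mutate(value2 = value[category == "A"]) # length-1 result is recycled across the group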

R: How to replace all rows in a group with a value from first row in a df

There's a first() function in dplyr:

mtcars %>%
  group_by(carb) %>%
  mutate(carb_1 = first(qsec))

Though if you are ordering by qsec, you could also just use min().
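
For completeness, that min() variant would be:

mtcars %>%
  group_by(carb) %>%
  mutate(carb_1 = min(qsec)) # the group minimum; equals first(qsec) when rows are sorted ascending by qsec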

Assign the value of the first row of a group to the whole group


DT %>%
  group_by(id) %>%
  mutate(id.date = date[1])
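
Since the object is called DT, a native data.table version might be (an assumption that DT is, or is converted to, a data.table):

library(data.table)
setDT(DT)[, id.date := date[1], by = id]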

Scale relative to a value in each group (via dplyr)

This solution is very similar to @thelatemail's, but I think it's sufficiently different to merit its own answer, because it chooses the index based on a condition:

data %>%
  group_by(category) %>%
  mutate(value = value / value[year == baseYear])

# category year      value
# ...      ...         ...
#7  A      2002 1.00000000
#8  B      2002 1.00000000
#9  C      2002 1.00000000
#10 A      2003 0.86462789
#11 B      2003 1.07217943
#12 C      2003 0.82209897

(Data output has been truncated. To replicate these results, call set.seed(123) when creating data.)
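
For reference, one plausible way to create data with this shape (an assumption; the original setup isn't shown here, so the exact values may differ):

set.seed(123)
data <- data.frame(
  category = rep(c("A", "B", "C"), times = 6), # hypothetical layout: A/B/C within each year
  year     = rep(2000:2005, each = 3),
  value    = runif(18)
)
baseYear <- 2002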

R Studio - group by dataframe and get statistics using dplyr

You're almost there. It's just that your "last(category)" grouping is based only on ID rather than both ID and direction. If you change it to:

res <- df %>%
  group_by(ID) %>%
  mutate(category = last(category)) %>%
  ungroup() %>%
  group_by(ID, direction, category) %>%
  summarise(
    sum_value = sum(value),
    count_value = length(value)
  ) %>%
  ungroup()

It should do the trick.
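
As an aside, count_value = length(value) is more idiomatically written with dplyr's n() helper, i.e. the summarise() step becomes:

summarise(
  sum_value = sum(value),
  count_value = n() # n() counts the rows in the current group
)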

R: how to propagate a single value to an entire column?


df %>%
  group_by(group) %>%
  mutate(output = value[time == 0][1]) # [1] takes the first match within each group
   group  time value output
  <fctr> <dbl> <dbl>  <dbl>
1      A    -1    10     23
2      A     0    23     23
3      A     2     5     23
4      B     0    22     22
5      B     4    11     22

Edit:
If there is no time == 0 row in a group, as with group C here:

  group time value
1     A   -1    10
2     A    0    23
3     A    2     5
4     B    0    22
5     B    4    11
6     C    1    10
6 C 1 10

We use

df %>%
  group_by(group) %>%
  mutate(output = ifelse(length(value[time == 0]) > 0, value[time == 0][1], NA))
   group  time value output
  <fctr> <dbl> <dbl>  <dbl>
1      A    -1    10     23
2      A     0    23     23
3      A     2     5     23
4      B     0    22     22
5      B     4    11     22
6      C     1    10     NA

Edit 2: multiple time == 0 rows in the same group

   group  time value output
  <fctr> <dbl> <int>  <int>
1      A    -1     1      2
2      A     0     2      2
3      A     2     3      2
4      B     0     4      4
5      B     4     5      4
6      B     0     6      4
7      C     3     7     NA
8      C     5     8     NA
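
Note that value[time == 0][1] already returns NA when a group has no time == 0 row, since out-of-bounds subscripting yields NA, so the ifelse() wrapper can be dropped entirely:

df %>%
  group_by(group) %>%
  mutate(output = value[time == 0][1]) # NA where there is no match, first match otherwise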

dplyr creating new column based on some condition

We could use first() after grouping by 'cust_id'. The single value will be recycled across the entire group:

library(dplyr)
df <- df %>%
  group_by(cust_id) %>%
  mutate(geo_num_new = first(geo_num)) %>%
  ungroup()

-output

df
# A tibble: 8 x 4
  geo_num cust_id sales geo_num_new
    <dbl> <chr>   <dbl>       <dbl>
1      11 A           2          11
2      12 A           3          11
3      22 B           2          22
4      41 C           1          41
5      42 C           2          41
6      43 C           4          41
7      77 D           6          77
8      71 D           3          77

Or use data.table

library(data.table)
setDT(df)[, geo_num_new := first(geo_num), by = cust_id]

Or with base R

df$geo_num_new <- with(df, ave(geo_num, cust_id, FUN = function(x) x[1]))

Or an option with collapse

library(collapse)
tfm(df, geo_num_new = ffirst(geo_num, g = cust_id, TRA = "replace"))
  geo_num cust_id sales geo_num_new
1      11       A     2          11
2      12       A     3          11
3      22       B     2          22
4      41       C     1          41
5      42       C     2          41
6      43       C     4          41
7      77       D     6          77
8      71       D     3          77

Divide (and name) one group of columns by another group in dplyr

Don't store variables in column names. If you reshape your data to make it tidy, the calculation is really simple:

library(tidyverse)

df %>%
  gather(var, val, -year) %>%                          # reshape to long
  separate(var, c('var', 'letter'), fill = 'left') %>% # extract var from former col names
  mutate(var = coalesce(var, 'value')) %>%             # add name for unnamed var
  spread(var, val) %>%                                 # reshape back to wide
  mutate(weight = value / count)                       # now this is very simple

#>   year letter count value     weight
#> 1 1999      A     1    10 10.0000000
#> 2 1999      B     8     3  0.3750000
#> 3 1999      C     5     1  0.2000000
#> 4 2000      A     2    20 10.0000000
#> 5 2000      B     9     6  0.6666667
#> 6 2000      C     7     2  0.2857143
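
gather() and spread() are superseded in current tidyr; the same pipeline written with pivot_longer()/pivot_wider() would be (a sketch under the same column assumptions):

df %>%
  pivot_longer(-year, names_to = 'var', values_to = 'val') %>% # reshape to long
  separate(var, c('var', 'letter'), fill = 'left') %>%         # split former column names
  mutate(var = coalesce(var, 'value')) %>%                     # name the unnamed var
  pivot_wider(names_from = var, values_from = val) %>%         # reshape back to wide
  mutate(weight = value / count)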

