dplyr broadcasting single value per group in mutate
Do this:
data %>% group_by(category) %>%
mutate(value2 = value[year == 2000])
You could also do it this way:
data %>% group_by(category) %>%
arrange(year) %>%
mutate(value2 = value[1])
or
data %>% group_by(category) %>%
arrange(year) %>%
mutate(value2 = first(value))
or
data %>% group_by(category) %>%
mutate(value2 = nth(value, n = 1, order_by = "year"))
or probably several other ways.
Your attempt with mutate(value = filter(data, year==2002))
doesn't make sense for a few reasons.
When you explicitly pass in
data
again, it's not part of the chain that got grouped earlier, so it doesn't know about the grouping.All
dplyr
verbs take a data frame as first argument and return a data frame, includingfilter
. When you dovalue = filter(...)
you're trying to assign a full data frame to the single columnvalue
.
Mutate repeat the value by group dplyr
here it is little R freak
> data %>% group_by(year) %>%
+ mutate(value_tmp = if_else(category == "A", value, NA_real_),
+ value2 = mean(value_tmp, na.rm = TRUE))
# A tibble: 18 x 5
# Groups: year [6]
category year value value_tmp value2
<fct> <int> <dbl> <dbl> <dbl>
1 A 2000 0.01818495 0.01818495 0.01818495
2 B 2000 0.5649932 NA 0.01818495
3 C 2000 0.5483291 NA 0.01818495
4 A 2001 0.9175864 0.9175864 0.9175864
5 B 2001 0.2415837 NA 0.9175864
6 C 2001 0.2250608 NA 0.9175864
7 A 2002 0.6037224 0.6037224 0.6037224
8 B 2002 0.8712926 NA 0.6037224
9 C 2002 0.6293625 NA 0.6037224
10 A 2003 0.8126948 0.8126948 0.8126948
11 B 2003 0.7540445 NA 0.8126948
12 C 2003 0.02220114 NA 0.8126948
13 A 2004 0.3961279 0.3961279 0.3961279
14 B 2004 0.3638186 NA 0.3961279
15 C 2004 0.8682010 NA 0.3961279
16 A 2005 0.04196315 0.04196315 0.04196315
17 B 2005 0.4879482 NA 0.04196315
18 C 2005 0.8605212 NA 0.04196315
R: How to replace all rows in a group with a value from first row in a df
There's a first
function in dplyr
mtcars %>%
group_by(carb)%>%
mutate(carb_1 = first(qsec))
Though if you are ordering by qsec
you could also just use min
Assign the value of the first row of a group to the whole group
DT %>%
group_by(id) %>%
mutate(id.date = date[1])
Scale relative to a value in each group (via dplyr)
This solution is very similar to @thelatemail, but I think it's sufficiently different enough to merit its own answer because it chooses the index based on a condition:
data %>%
group_by(category) %>%
mutate(value = value/value[year == baseYear])
# category year value
#... ... ... ...
#7 A 2002 1.00000000
#8 B 2002 1.00000000
#9 C 2002 1.00000000
#10 A 2003 0.86462789
#11 B 2003 1.07217943
#12 C 2003 0.82209897
(Data output has been truncated. To replicate these results, set.seed(123)
when creating data
.)
R Studio - group by dataframe and get statistics using dplyr
You're almost there. It's just that your "last(category)" grouping is based only on ID rather than both ID and direction. If you change it to:
res <- df %>%
group_by(ID) %>%
mutate(category = last(category)) %>%
ungroup %>%
group_by(ID, direction, category) %>%
summarise(
sum_value = sum(value),
count_value = length(value)
) %>%
ungroup
It should do the trick.
R: how to propagate a single value to an entire column?
df%>%
group_by(group)%>%
mutate(output=value[time==0][1])
#we take the first non missing match
group time value output
<fctr> <dbl> <dbl> <dbl>
1 A -1 10 23
2 A 0 23 23
3 A 2 5 23
4 B 0 22 22
5 B 4 11 22
Edit:
If there is no time==0
group time value
1 A -1 10
2 A 0 23
3 A 2 5
4 B 0 22
5 B 4 11
6 C 1 10
We use
df%>%group_by(group)%>%
mutate(output=ifelse(length(value[time==0][1])>0,value[time==0][1],NA))
group time value output
<fctr> <dbl> <dbl> <dbl>
1 A -1 10 23
2 A 0 23 23
3 A 2 5 23
4 B 0 22 22
5 B 4 11 22
6 C 1 10 NA
Edit 2: multiple 0's in value for the same group
group time value output
<fctr> <dbl> <int> <int>
1 A -1 1 2
2 A 0 2 2
3 A 2 3 2
4 B 0 4 4
5 B 4 5 4
6 B 0 6 4
7 C 3 7 NA
8 C 5 8 NA
dplyr creating new column based on some condition
We could use first
after grouping by 'cust_id'. The single value will be recycled for the entire grouping
library(dplyr)
df <- df %>%
group_by(cust_id) %>%
mutate(geo_num_new = first(geo_num)) %>%
ungroup
-ouptut
df
# A tibble: 8 x 4
geo_num cust_id sales geo_num_new
<dbl> <chr> <dbl> <dbl>
1 11 A 2 11
2 12 A 3 11
3 22 B 2 22
4 41 C 1 41
5 42 C 2 41
6 43 C 4 41
7 77 D 6 77
8 71 D 3 77
Or use data.table
library(data.table)
setDT(df)[, geo_num_new := first(geo_num), by = cust_id]
or with base R
df$geo_num_new <- with(df, ave(geo_num, cust_id, FUN = function(x) x[1]))
Or an option with collapse
library(collapse)
tfm(df, geo_num_new = ffirst(geo_num, g = cust_id, TRA = "replace"))
geo_num cust_id sales geo_num_new
1 11 A 2 11
2 12 A 3 11
3 22 B 2 22
4 41 C 1 41
5 42 C 2 41
6 43 C 4 41
7 77 D 6 77
8 71 D 3 77
Divide (and name) one group of columns by another group in dplyr
Don't store variables in column names. If you reshape your data to make it tidy, the calculation is really simple:
library(tidyverse)
df %>% gather(var, val, -year) %>% # reshape to long
separate(var, c('var', 'letter'), fill = 'left') %>% # extract var from former col names
mutate(var = coalesce(var, 'value')) %>% # add name for unnamed var
spread(var, val) %>% # reshape back to wide
mutate(weight = value / count) # now this is very simple
#> year letter count value weight
#> 1 1999 A 1 10 10.0000000
#> 2 1999 B 8 3 0.3750000
#> 3 1999 C 5 1 0.2000000
#> 4 2000 A 2 20 10.0000000
#> 5 2000 B 9 6 0.6666667
#> 6 2000 C 7 2 0.2857143
Related Topics
Aggregate and Reshape from Long to Wide
Advantages of Reactive VS. Observe VS. Observeevent
Plotting Pca Biplot with Ggplot2
How to Use a List as a Hash in R? If So, Why Is It So Slow
Cut() Error - 'Breaks' Are Not Unique
Converting a List of Data Frames into Individual Data Frames in R
Standard Deviation in R Seems to Be Returning the Wrong Answer - am I Doing Something Wrong
Perform Multiple Paired T-Tests Based on Groups/Categories
How to Create a Grouped Boxplot in R
How to Complete Missing Factor Levels in Data Frame
List for Multiple Plots from Loop (Ggplot2) - List Elements Being Overwritten
Adding New Column with Diff() Function When There Is One Less Row in R
R: Ggplot2 Barplot and Error Bar
How to Delete a Column by Name in Data.Table
Possible to Show Console Messages (Written with 'Message') in a Shiny Ui
Test If an Argument of a Function Is Set or Not in R