Calculate group mean while excluding current observation using dplyr
There is no need to define a custom function. Instead, sum all elements of the group, subtract the current value, and divide by the number of elements in the group minus 1.
library(dplyr)
df %>% group_by(grouping) %>%
  mutate(special_mean = (sum(value) - value) / (n() - 1))
# grouping value special_mean
# (chr) (int) (dbl)
#1 A 1 8.5
#2 A 6 6.0
#3 A 11 3.5
#4 B 2 9.5
#5 B 7 7.0
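As a quick sanity check of that identity, here is a minimal base R sketch that compares the sum-based shortcut with a brute-force leave-one-out mean. The `toy` data frame and its values are made up for illustration (they are not the asker's data), and `ave()` stands in for `group_by()` plus `mutate()` so the snippet runs without any packages.

```r
# Hypothetical toy data, not taken from the original question
toy <- data.frame(grouping = c("x", "x", "x", "y", "y"),
                  value    = c(2, 4, 9, 10, 20))

# Shortcut: (group sum - own value) / (group size - 1)
shortcut <- with(toy, (ave(value, grouping, FUN = sum) - value) /
                      (ave(value, grouping, FUN = length) - 1))

# Brute force: drop element i, then average what is left
brute <- with(toy, ave(value, grouping, FUN = function(x)
  sapply(seq_along(x), function(i) mean(x[-i]))))

all.equal(shortcut, brute)
```

Both vectors should agree, which is why the shortcut can replace an explicit loop over rows for means. Note that a group with a single row gives 0/0 with the shortcut.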
Calculate group variance while excluding current observation
Try the following:
library(dplyr)
DF %>%
  group_by(School) %>%
  mutate(Var_grade = purrr::map_dbl(row_number(), ~var(grade[-.x])))
# School grade Var_grade
# <int> <dbl> <dbl>
#1 1 90 112.
#2 1 80 12.5
#3 1 95 50
#4 2 100 108.
#5 2 65 225
#6 2 70 308.
#7 2 85 358.
In base R you can use ave with sapply:
DF$Var_grade <- with(DF, ave(grade, School, FUN = function(x)
  sapply(seq_along(x), function(i) var(x[-i]))))
data
DF <- data.frame(School = rep(1:2, c(3, 4)),
grade = c(90, 80, 95, 100, 65, 70, 85))
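If the groups are large, recomputing `var()` once per row can get slow. A possible alternative, sketched here under the assumption that every group has at least three values, derives the leave-one-out variance from the group's sum and sum of squares. The helper name `loo_var` and the column `Var_fast` are hypothetical, and this textbook formula can lose precision when values are large relative to their spread.

```r
# Leave-one-out sample variance from group totals (hypothetical helper).
# For the n - 1 remaining values: var = (sum of squares - sum^2 / (n - 1)) / (n - 2)
loo_var <- function(x) {
  n <- length(x)
  s <- sum(x) - x        # sum of the other n - 1 values
  q <- sum(x^2) - x^2    # sum of squares of the other n - 1 values
  (q - s^2 / (n - 1)) / (n - 2)
}

DF <- data.frame(School = rep(1:2, c(3, 4)),
                 grade = c(90, 80, 95, 100, 65, 70, 85))

DF$Var_fast <- with(DF, ave(grade, School, FUN = loo_var))

# Brute-force reference, same as the ave/sapply approach
DF$Var_ref <- with(DF, ave(grade, School, FUN = function(x)
  sapply(seq_along(x), function(i) var(x[-i]))))

all.equal(DF$Var_fast, DF$Var_ref)
```

For small groups like these the sapply version is perfectly adequate; the closed form only pays off when the per-row recomputation becomes a bottleneck.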
Exclude current observation from computation in dplyr pipe
As a general way to drop the current observation before applying a calculation, you could use map_dbl from purrr:
library(dplyr)
library(purrr)
da %>%
  group_by(ice_id) %>%
  mutate(mean_price = mean(price),
         mean_price_without = map_dbl(day, ~mean(price[-.x])))
# Or replace the last line with one of:
#        mean_price_without = map_dbl(day, ~mean(price[day != .x])))
#        mean_price_without = map_dbl(row_number(), ~mean(price[-.x])))
# ice_id day price mean_price mean_price_without
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1.6 1.77 1.85
#2 1 2 1.9 1.77 1.7
#3 1 3 1.8 1.77 1.75
#4 2 1 2.1 2.15 2.17
#5 2 2 2.05 2.15 2.2
#6 2 3 2.3 2.15 2.08
#7 3 1 0.5 0.417 0.375
#8 3 2 0.4 0.417 0.425
#9 3 3 0.35 0.417 0.45
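One caveat about indexing with `day`: `price[-.x]` drops an element by position, so it only removes the current row when `day` happens to run 1, 2, ..., n within each group, as it does in the output above. The base R sketch below uses made-up values to show how the result drifts when that assumption fails; `seq_along()` plays the role of `row_number()` here.

```r
price <- c(1.6, 1.9, 1.8)
day   <- c(2, 3, 4)   # hypothetical: days do not start at 1

# Indexing by day removes the wrong element (or nothing, for day 4)
by_day <- sapply(day, function(d) mean(price[-d]))

# Indexing by position always removes the current element
by_pos <- sapply(seq_along(price), function(i) mean(price[-i]))

by_day
by_pos
```

The `row_number()` variant, or the `day != .x` comparison when days are unique within a group, avoids relying on how `day` is numbered.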
Get group mean with multiple grouping variables and excluding own group value
library(dplyr)
df %>%
  group_by(state, year) %>%
  mutate(q = (sum(value) - value) / (n() - 1))
#> # A tibble: 12 x 5
#> # Groups: state, year [4]
#> state county year value q
#> <chr> <chr> <int> <int> <dbl>
#> 1 AL a 2011 68 30.5
#> 2 AL a 2012 63 42
#> 3 AL b 2011 53 38
#> 4 AL b 2012 56 45.5
#> 5 AL c 2011 8 60.5
#> 6 AL c 2012 28 59.5
#> 7 CA d 2011 7 40
#> 8 CA d 2012 69 41
#> 9 CA e 2011 39 24
#> 10 CA e 2012 79 36
#> 11 CA f 2011 41 23
#> 12 CA f 2012 3 74
Data:
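# data_frame() is deprecated; use tibble() instead.
# value is sampled without a seed, so your numbers will differ from the output above.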
df <- tibble(
state = rep(c("AL", "CA"), each = 6),
county = rep(letters[1:6], each = 2),
year = rep(c(2011:2012), 6),
value = sample.int(100, 12)
)
Taking group means, excluding the observation itself (and dealing with NA's)
IIUC this should do what you're looking for:
library(data.table)
DF[, mean_value := (sum(value, na.rm = TRUE) - value) / (sum(!is.na(value)) - !is.na(value)),
   by = c("iso", "year")]
A B D value iso year mean_value
1: 0 1 1 NA ECU 2009 NA
2: 1 0 2 1 ECU 2009 2.0
3: 1 0 1 2 ECU 2009 1.0
4: 0 0 3 1 BRA 2011 0.5
5: 1 0 4 0 BRA 2011 1.0
6: 0 0 3 1 BRA 2011 0.5
7: 0 1 7 NA ECU 2008 NA
8: 1 0 1 1 ECU 2008 1.0
9: 1 0 1 1 ECU 2008 1.0
10: 0 0 3 2 BRA 2012 2.0
11: 0 0 3 2 BRA 2012 2.0
12: 1 0 4 NA BRA 2012 NA
Note: you may also want to handle edge cases such as a group with at most one non-NA value, where the denominator becomes zero (a single non-NA value gives 0/0, i.e. NaN).
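One way to deal with that edge case is to wrap the formula in a small helper. The sketch below is only an illustration: the name `loo_mean` is hypothetical, and returning NA when a value has no non-NA peers is an assumed design choice rather than something the question specifies.

```r
# Leave-one-out mean that ignores NAs among the other values (hypothetical helper)
loo_mean <- function(x) {
  ok <- !is.na(x)
  total <- sum(x[ok])
  n_other <- sum(ok) - ok              # non-NA values other than the current one
  out <- (total - x) / n_other         # rows where x is NA stay NA
  out[ok & n_other == 0] <- NA_real_   # lone non-NA value: no peers, so NA instead of NaN
  out
}

loo_mean(c(NA, 1, 2))
loo_mean(c(5, NA))
```

With data.table it could then be applied per group as `DF[, mean_value := loo_mean(value), by = c("iso", "year")]`.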
Compute mean excluding current value
I am not sure your calculation is correct for group 1, but you can do:
library(data.table)
setDT(df)[, avg2 := (sum(b) - b)/(.N -1), a]
df
# a b avg avg2
#1: 1 7 3 1.0
#2: 1 0 3 4.5
#3: 1 2 3 3.5
#4: 2 1 2 3.0
#5: 2 3 2 1.0
Calculate standard deviation by group excluding current observation in R
An option is to use dplyr and mapply. mapply runs once for every row of the group, and each sd calculation excludes the current row.
library(dplyr)
df %>% group_by(country) %>%
  mutate(Sp_SD = mapply(function(x) sd(weight[-x]), 1:n()))
# # A tibble: 6 x 3
# # Groups: country [2]
# country weight Sp_SD
# <fctr> <dbl> <dbl>
# 1 A 10.0 0.707
# 2 A 11.0 1.41
# 3 A 12.0 0.707
# 4 B 20.0 3.54
# 5 B 25.0 7.07
# 6 B 30.0 3.54
dplyr mutate: Excluding observations similar to the current one
You can likely do this more succinctly, but this will get you the result.
First create columns holding the sum of Y and the number of observations for the whole data.frame. Then group by the X column and compute the same two quantities per group. Subtracting the group values from the totals gives the sum and count of all other groups, and their ratio is the mean you want.
data
df <- data.frame(X = c("A", "A", "B", "B", "C", "C"),
Y = c(1:6))
solution
library(tidyverse)
df %>%
  mutate(total_sum = sum(Y),
         total_obs = n()) %>%
  group_by(X) %>%
  mutate(group_sum = sum(Y),
         group_obs = n()) %>%
  ungroup() %>%
  mutate(other_group_sum = total_sum - group_sum,
         other_group_obs = total_obs - group_obs,
         other_mean = other_group_sum / other_group_obs) %>%
  select(X, Y, other_mean)
result
# A tibble: 6 x 3
X Y other_mean
<fct> <int> <dbl>
1 A 1 4.50
2 A 2 4.50
3 B 3 3.50
4 B 4 3.50
5 C 5 2.50
6 C 6 2.50
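For comparison, the same totals-minus-group idea can be written more compactly. This is a minimal base R sketch using `ave()` instead of the dplyr pipeline; the intermediate names `group_sum` and `group_obs` simply mirror the columns used above.

```r
df <- data.frame(X = c("A", "A", "B", "B", "C", "C"),
                 Y = 1:6)

# Per-row group totals, aligned with the rows of df
group_sum <- ave(df$Y, df$X, FUN = sum)
group_obs <- ave(df$Y, df$X, FUN = length)

# Mean of every observation outside the row's own group
df$other_mean <- (sum(df$Y) - group_sum) / (nrow(df) - group_obs)
df
```

It trades the named intermediate columns for two helper vectors, so which version reads better is largely a matter of taste.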