Subtracting Values Group-Wise by the Average of Each Group in R

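Using the dplyr package, you can do:

library(dplyr)
x %>%
  group_by(gene) %>%
  # funs() and mutate_all() are deprecated/superseded in current dplyr; across() replaces them
  mutate(across(everything(), ~ .x - mean(.x)))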

# A tibble: 8 x 2
# Groups:   gene [3]
  gene   value
  <fct>  <dbl>
1 A      1.03
2 A     -0.267
3 A     -0.767
4 B      1.45
5 B     -1.45
6 C      0
7 C      0.700
8 C     -0.700
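
If you would rather avoid a package dependency, the same centering can be sketched in base R with ave(). The data frame below is hypothetical (the original x is not shown here), and the column name centered is made up for illustration.

```r
# Hypothetical sample data; the original `x` is not shown in the source.
x <- data.frame(
  gene  = c("A", "A", "A", "B", "B"),
  value = c(3, 1, 2, 10, 20)
)

# ave() returns the group mean repeated for every row of that group,
# so subtracting it centers each value on its own group's average.
x$centered <- x$value - ave(x$value, x$gene, FUN = mean)

print(x)
```

After centering, the values within each gene sum to zero, which is a quick sanity check on any of the approaches above.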

Subtract value from previous row by group

With dplyr:

library(dplyr)

data %>%
  group_by(id) %>%
  arrange(date) %>%
  mutate(diff = value - lag(value, default = first(value)))

For clarity, you can arrange by both the grouping column and the date (as suggested in a comment by lawyer):

data %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(diff = value - lag(value, default = first(value)))

Or use lag() with order_by:

data %>%
  group_by(id) %>%
  mutate(diff = value - lag(value, default = first(value), order_by = date))

With data.table:

library(data.table)

dt <- as.data.table(data)
setkey(dt, id, date)
dt[, diff := value - shift(value, fill = first(value)), by = id]
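
For comparison, here is a minimal base R sketch of the same idea, assuming a data frame with the columns id, date and value used above. The sample values are invented for illustration. It sorts by id and date first, then takes differences within each id, using 0 for the first row of each group (which matches default = first(value)).

```r
# Hypothetical data with the column names used in the answers above.
data <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  date  = as.Date(c("2020-01-03", "2020-01-01", "2020-01-02",
                    "2020-01-01", "2020-01-02")),
  value = c(30, 10, 15, 5, 9)
)

# Sort so that "previous row" means "previous date within the same id".
data <- data[order(data$id, data$date), ]

# diff() gives consecutive differences; prepend 0 for each group's first row.
data$diff <- ave(data$value, data$id, FUN = function(v) c(0, diff(v)))

print(data)
```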

Subtraction within Groups using R

You can also do that with dplyr:

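library(dplyr)

# %.% was the chaining operator in very early dplyr releases; use %>% instead
df %>%
  group_by(Sample) %>%
  # subtract each value from its group's condition "A" value
  mutate(dValue = value[condition == "A"] - value)

#   Sample condition value dValue
# 1   var1         A    12      0
# 2   var1         B    14     -2
# 3   var1         C    15     -3
# 4   var2         A    20      0
# 5   var2         B    19      1
# 6   var2         C    19      1
# 7   var3         A    50      0
# 8   var3         B    51     -1
# 9   var3         C    48      2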
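
A base R sketch of the same reference-row subtraction is shown below. The data frame is rebuilt from the output printed above. The helper names is_ref and ref are made up for this example, and the approach assumes exactly one condition "A" row per Sample.

```r
# Data reconstructed from the printed output above.
df <- data.frame(
  Sample    = rep(c("var1", "var2", "var3"), each = 3),
  condition = rep(c("A", "B", "C"), times = 3),
  value     = c(12, 14, 15, 20, 19, 19, 50, 51, 48),
  stringsAsFactors = FALSE
)

# Build a lookup vector: Sample name -> value of its condition "A" row.
is_ref <- df$condition == "A"
ref <- setNames(df$value[is_ref], df$Sample[is_ref])

# Look up each row's reference value by Sample name and subtract.
df$dValue <- unname(ref[df$Sample]) - df$value

print(df)
```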

Want to get the dataframe of values that are deviations from the mean based on a factor column

A dplyr solution:

library(dplyr)
x %>%
  group_by(factor) %>%
  mutate(across(c(value1, value2), ~ . - mean(.)))

Output:

# A tibble: 6 x 3
# Groups:   factor [3]
  factor value1 value2
  <fct>   <dbl>  <dbl>
1 a          -1   -1
2 a           1    1
3 b          -1   -0.5
4 b           1    0.5
5 c           1    3
6 c          -1   -3
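
The same multi-column centering can be sketched in base R by looping over the columns with lapply(). The input values below are hypothetical (the original x is not shown), chosen so that groups a and b give the same deviations as in the output above.

```r
# Hypothetical input; only the column names come from the answer above.
x <- data.frame(
  factor = c("a", "a", "b", "b"),
  value1 = c(1, 3, 10, 12),
  value2 = c(2, 4, 5, 6)
)

cols <- c("value1", "value2")

# For each selected column, subtract the per-group mean computed by ave().
x[cols] <- lapply(x[cols], function(col) col - ave(col, x$factor))

print(x)
```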

Calculate group mean while excluding current observation using dplyr

There is no need to define a custom function. Instead, we can sum all elements of the group, subtract the current value, and divide by the number of elements in the group minus 1:

df %>%
  group_by(grouping) %>%
  mutate(special_mean = (sum(value) - value) / (n() - 1))
# grouping value special_mean
# (chr) (int) (dbl)
#1 A 1 8.5
#2 A 6 6.0
#3 A 11 3.5
#4 B 2 9.5
#5 B 7 7.0
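
To convince yourself that the shortcut is right, the sketch below compares it against a brute-force leave-one-out mean in base R. The data are chosen to be consistent with the output above; the third B value (12) is an assumption, since that row is cut off. The names loo and brute are made up for this example.

```r
# Data consistent with the printed output; the last B value is assumed.
df <- data.frame(
  grouping = c("A", "A", "A", "B", "B", "B"),
  value    = c(1, 6, 11, 2, 7, 12)
)

# Shortcut: (group sum - current value) / (group size - 1).
loo <- function(v) (sum(v) - v) / (length(v) - 1)
df$special_mean <- ave(df$value, df$grouping, FUN = loo)

# Brute force: for each row, average the other rows of the same group.
brute <- sapply(seq_len(nrow(df)), function(i) {
  others <- df$grouping == df$grouping[i]
  others[i] <- FALSE
  mean(df$value[others])
})

print(cbind(df, brute))
```

Note that a group with a single row has no other observations, so the shortcut divides 0 by 0 and returns NaN for that row.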

DataFrame subtract group-wise means

If you use the transform method, e.g.,

means = df.groupby(group, axis=1).transform('mean')

then transform will return a DataFrame of the same shape as df. This makes it easier to subtract means from df.

You can also pass a sequence, such as group=[1,1,1,2,2,3,3] to df.groupby instead of passing a column name. df.groupby(group, axis=1) will group the columns based on the sequence values. So, for example, to group according to the non-numeric part of each column name, you could use:

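import numpy as np
import pandas as pd
import datetime as DT

np.random.seed(2016)
base = DT.date.today()
date_list = [base - DT.timedelta(days=x) for x in range(0, 10)]
df = pd.DataFrame(data=np.random.randint(1, 100, (10, 8)),
                  index=date_list,
                  columns=['a1', 'a2', 'b1', 'a3', 'b2', 'c1', 'c2', 'b3'])

# Group columns by the non-digit part of their names: a, a, b, a, b, c, c, b
group = df.columns.str.extract(r'(\D+)', expand=False)
# Note: groupby(..., axis=1) is deprecated since pandas 2.1; on newer versions
# use df.T.groupby(group).transform('mean').T instead.
means = df.groupby(group, axis=1).transform('mean')
result = df - means
print(result)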

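which yields output like the following. The dates depend on the day the code is run, and the values shown are identical within each letter group, so the table appears to display the group means (means) rather than the differences (result):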

            a1  a2  b1  a3  b2  c1  c2  b3
2016-05-18  29  29  53  29  53  23  23  53
2016-05-17  55  55  32  55  32  92  92  32
2016-05-16  59  59  53  59  53  50  50  53
2016-05-15  46  46  30  46  30  55  55  30
2016-05-14  56  56  28  56  28  28  28  28
2016-05-13  34  34  36  34  36  70  70  36
2016-05-12  39  39  64  39  64  48  48  64
2016-05-11  45  45  59  45  59  57  57  59
2016-05-10  55  55  30  55  30  37  37  30
2016-05-09  61  61  59  61  59  59  59  59
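
Since most of this page is about R, here is a rough R analogue of the column-grouping idea, written as a sketch under assumptions rather than a translation of the pandas code. It uses a small hypothetical matrix, groups the columns by the letter part of their names, and subtracts each group's row-wise mean. The names m, group and group_means are made up for illustration.

```r
# Hypothetical 2 x 4 matrix with columns named like in the pandas example.
m <- matrix(c(1, 3, 10, 20,
              2, 6, 30, 50),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("a1", "a2", "b1", "b2")))

# Strip trailing digits to get the group label of each column.
group <- sub("[0-9]+$", "", colnames(m))

# One column of row means per group; columns are named after the groups.
group_means <- sapply(unique(group), function(g) {
  rowMeans(m[, group == g, drop = FALSE])
})

# Expand the means back to the original column layout and subtract.
result <- m - group_means[, group]

print(result)
```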

Calculate difference between values in consecutive rows by group

The package data.table can do this fairly quickly, using the shift function.

library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10, 20, 25, 5, 10, 15))
# setDT(df)  # use this instead if df is already a data.frame

df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax

Or, using the lag() function in dplyr:

df %>%
  group_by(group) %>%
  mutate(Diff = value - lag(value))
# group value Diff
# <int> <int> <int>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5

Calculate average of different value in R table

You can use the formula interface of aggregate():

d <- read.table(text="Dimension,Config,Result
3,1,6.43547800901942e-12
3,1,3.10671396584125e-15
3,1,5.86997050075184e-07
3,2,1.57865350726808
3,2,0.125293574811717
3,2,0.096173751923243
4,1,3.33845065295529e-08
4,1,4.57511389653726e-07
4,1,2.58918409465438e-07
4,2,3.23375251723051
4,2,2.13142950121767
4,2,0.510008166587752", header = TRUE, sep = ",")

aggregate(Result ~ Dimension + Config, data = d, FUN = mean)
#   Dimension Config       Result
# 1         3      1 1.956678e-07
# 2         4      1 2.499381e-07
# 3         3      2 6.000403e-01
# 4         4      2 1.958397e+00
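
If a Dimension-by-Config grid is easier to read than the long format, tapply() is one alternative. The sketch below uses small hypothetical numbers (not the data above) and checks that both approaches agree.

```r
# Hypothetical data with the same column names as above.
d <- data.frame(
  Dimension = c(3, 3, 3, 3, 4, 4, 4, 4),
  Config    = c(1, 1, 2, 2, 1, 1, 2, 2),
  Result    = c(1, 3, 10, 20, 5, 7, 100, 300)
)

# Wide layout: one row per Dimension, one column per Config.
means <- tapply(d$Result,
                list(Dimension = d$Dimension, Config = d$Config),
                mean)
print(means)

# Long layout from the formula interface, for comparison.
agg <- aggregate(Result ~ Dimension + Config, data = d, FUN = mean)
print(agg)
```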

Using apply function to average dataframe groups

I think the fastest replacement for aggregate() would be data.table:

library(data.table)
( dt <- setDT(df)[, lapply(.SD, mean), by = ID] )
# ID a b c d e
# 1: no 25.000000 26.00000 24.66667 39.00000 39.66667
# 2: bo 40.666667 25.33333 31.33333 37.00000 19.33333
# 3: fo 5.333333 28.00000 53.33333 11.66667 29.33333
# 4: to 30.666667 47.33333 27.00000 41.33333 28.00000

For the row subtraction, we could write a function and use it with Map().

f <- function(x, y) {
  # difference between the mean rows of groups x and y (ID column dropped)
  dt[ID == x, -1, with = FALSE] - dt[ID == y, -1, with = FALSE]
}
rbindlist(Map(f, c("bo", "fo", "to", "to"), c("no", "no", "bo", "fo")))
# a b c d e
# 1: 15.66667 -0.6666667 6.666667 -2.000000 -20.333333
# 2: -19.66667 2.0000000 28.666667 -27.333333 -10.333333
# 3: -10.00000 22.0000000 -4.333333 4.333333 8.666667
# 4: 25.33333 19.3333333 -26.333333 29.666667 -1.333333

There is probably a better way to write f() and that last call in data.table. Note that this output will not match yours because you used sample() without setting a seed.

Another possibility would be to do the following. This will give you the row names you want.

A <- c("bo", "fo", "to", "to")
B <- c("no", "no", "bo", "fo")
df2 <- as.data.frame(rbindlist(Map(f, A, B)))
rownames(df2) <- paste(A, B, sep = "-")
df2
# a b c d e
# bo-no 15.66667 -0.6666667 6.666667 -2.000000 -20.333333
# fo-no -19.66667 2.0000000 28.666667 -27.333333 -10.333333
# to-bo -10.00000 22.0000000 -4.333333 4.333333 8.666667
# to-fo 25.33333 19.3333333 -26.333333 29.666667 -1.333333
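
For readers without data.table, the same two steps (group means, then differences between chosen pairs of groups) can be sketched in base R. The input below is hypothetical, since the original df came from sample() without a seed, and the names means, m and diffs are made up for this example.

```r
# Hypothetical input; the original df was generated with sample().
df <- data.frame(
  ID = c("no", "no", "bo", "bo", "fo", "fo"),
  a  = c(1, 3, 10, 20, 5, 7),
  b  = c(2, 4, 6, 8, 1, 1)
)

# Step 1: mean of every column per ID, as a matrix with IDs as row names.
means <- aggregate(. ~ ID, data = df, FUN = mean)
m <- as.matrix(means[, -1])
rownames(m) <- means$ID

# Step 2: subtract the rows for B from the rows for A, pair by pair.
A <- c("bo", "fo")
B <- c("no", "no")
diffs <- m[A, , drop = FALSE] - m[B, , drop = FALSE]
rownames(diffs) <- paste(A, B, sep = "-")

print(diffs)
```

Indexing the matrix by row name avoids the Map() and rbindlist() round trip, at the cost of converting to a matrix first.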

