Subtracting Values Group-Wise by the Average of Each Group in R

Subtracting values group-wise by the average of each group in R

Using the dplyr package, you can do:

library(dplyr)
x %>%
  group_by(gene) %>%
  # funs() has been deprecated since dplyr 0.8.0; a formula lambda does the same job
  mutate_all(~ . - mean(.))

# A tibble: 8 x 2
# Groups:   gene [3]
  gene   value
  <fct>  <dbl>
1 A      1.03 
2 A     -0.267
3 A     -0.767
4 B      1.45 
5 B     -1.45 
6 C      0    
7 C      0.700
8 C     -0.700
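
For comparison, the same group-wise centering can be sketched in base R with ave(). This is only an illustration: the data frame below is made up and is not the data from the question, and the column name centered is an assumption.

```r
# Hypothetical example data (not the data from the question)
x <- data.frame(
  gene  = factor(c("A", "A", "A", "B", "B")),
  value = c(1, 2, 6, 10, 20)
)

# ave() returns the group mean aligned to every row (FUN defaults to mean),
# so subtracting it centers each value on its own group's average
x$centered <- x$value - ave(x$value, x$gene)
print(x)
```

Because ave() keeps the original row order, the result can be assigned straight back to the data frame without a join.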

Subtract value from previous row by group

With dplyr:

library(dplyr)

data %>%
    group_by(id) %>%
    arrange(date) %>%
    mutate(diff = value - lag(value, default = first(value)))

For clarity, you can arrange by both the grouping column and date (as per the comment by lawyer):

data %>%
    group_by(id) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(diff = value - lag(value, default = first(value)))

Or use lag() with order_by:

data %>%
    group_by(id) %>%
    mutate(diff = value - lag(value, default = first(value), order_by = date))

With data.table:

library(data.table)

dt <- as.data.table(data)
setkey(dt, id, date)
dt[, diff := value - shift(value, fill = first(value)), by = id]

Subtraction within Groups using R

You can also do that with dplyr:

library(dplyr)

# %.% was removed from dplyr; use the %>% pipe instead
df %>%
   group_by(Sample) %>%
   mutate(dValue = value[condition == "A"] - value)

#  Sample condition value dValue
#1   var1         A    12      0
#2   var1         B    14     -2
#3   var1         C    15     -3
#4   var2         A    20      0
#5   var2         B    19      1
#6   var2         C    19      1
#7   var3         A    50      0
#8   var3         B    51     -1
#9   var3         C    48      2
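
The dplyr expression above relies on each Sample having exactly one row with condition "A". A base R sketch that makes the lookup explicit is shown below; the data frame is rebuilt from the output printed above, and the helper name ref is an assumption.

```r
# Data reconstructed from the printed output above
df <- data.frame(
  Sample    = rep(c("var1", "var2", "var3"), each = 3),
  condition = rep(c("A", "B", "C"), times = 3),
  value     = c(12, 14, 15, 20, 19, 19, 50, 51, 48)
)

# One reference row per Sample: the value under condition "A"
ref <- df[df$condition == "A", c("Sample", "value")]

# match() finds each row's reference value by Sample, then subtract
df$dValue <- ref$value[match(df$Sample, ref$Sample)] - df$value
print(df)
```

If a Sample has no "A" row, match() returns NA and dValue becomes NA for that group, which is easier to spot than a silent recycling error.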

Want to get the dataframe of values that are deviations from the mean based on a factor column

A dplyr solution

library(dplyr)
x %>% group_by(factor) %>% mutate(across(c(value1, value2), ~. - mean(.)))

Output

# A tibble: 6 x 3
# Groups:   factor [3]
  factor value1 value2
  <fct>   <dbl>  <dbl>
1 a          -1   -1  
2 a           1    1  
3 b          -1   -0.5
4 b           1    0.5
5 c           1    3  
6 c          -1   -3

Calculate group mean while excluding current observation using dplyr

There is no need to define a custom function. Instead, sum all elements of the group, subtract the current value, and divide by the number of elements in the group minus 1.

df %>% group_by(grouping) %>%
        mutate(special_mean = (sum(value) - value)/(n()-1))
#   grouping value special_mean
#      (chr) (int)        (dbl)
#1         A     1          8.5
#2         A     6          6.0
#3         A    11          3.5
#4         B     2          9.5
#5         B     7          7.0
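
As a sanity check, the formula can be compared against a direct "drop one row, take the mean" computation. The sketch below uses base R and invented data; it also shows that a group with a single member yields NaN, because the formula becomes 0/0.

```r
# Hypothetical example data; group "C" has only one member
df <- data.frame(
  grouping = c("A", "A", "A", "B", "B", "C"),
  value    = c(1, 6, 11, 4, 10, 3)
)

# Group sum and group size, aligned to every row
grp_sum <- ave(df$value, df$grouping, FUN = sum)
grp_n   <- ave(df$value, df$grouping, FUN = length)

# Leave-one-out mean: (group sum - own value) / (group size - 1)
df$special_mean <- (grp_sum - df$value) / (grp_n - 1)

# Direct computation for comparison: mean of the other rows in the group
idx <- seq_len(nrow(df))
direct <- sapply(idx, function(i) {
  mean(df$value[df$grouping == df$grouping[i] & idx != i])
})

print(cbind(df, direct))
```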

DataFrame subtract group-wise means

If you use the transform method, e.g.,

means = df.groupby(group, axis=1).transform('mean')

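then transform will return a DataFrame of the same shape as df. This makes it easier to subtract means from df. Note that the axis argument of groupby is deprecated as of pandas 2.1, so recent versions emit a warning for axis=1.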

You can also pass a sequence, such as group=[1,1,1,2,2,3,3] to df.groupby instead of passing a column name. df.groupby(group, axis=1) will group the columns based on the sequence values. So, for example, to group according to the non-numeric part of each column name, you could use:

import numpy as np
import pandas as pd
import datetime as DT
np.random.seed(2016)
base = DT.date.today()
date_list = [base - DT.timedelta(days=x) for x in range(0, 10)]
df = pd.DataFrame(data=np.random.randint(1, 100, (10, 8)), 
                  index=date_list, 
                  columns=['a1', 'a2', 'b1', 'a3', 'b2', 'c1' , 'c2', 'b3'])

group = df.columns.str.extract(r'(\D+)', expand=False)
means = df.groupby(group, axis=1).transform('mean')
result = df - means
print(result)

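which yields (note that each value below is repeated across every column of its group, which is the shape of a per-group transform such as means rather than of the difference df - means)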

            a1  a2  b1  a3  b2  c1  c2  b3
2016-05-18  29  29  53  29  53  23  23  53
2016-05-17  55  55  32  55  32  92  92  32
2016-05-16  59  59  53  59  53  50  50  53
2016-05-15  46  46  30  46  30  55  55  30
2016-05-14  56  56  28  56  28  28  28  28
2016-05-13  34  34  36  34  36  70  70  36
2016-05-12  39  39  64  39  64  48  48  64
2016-05-11  45  45  59  45  59  57  57  59
2016-05-10  55  55  30  55  30  37  37  30
2016-05-09  61  61  59  61  59  59  59  59

Calculate difference between values in consecutive rows by group

The package data.table can do this fairly quickly, using the shift function.

require(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame

df[ , diff := value - shift(value), by = group]    
#   group value diff
#1:     1    10   NA
#2:     1    20   10
#3:     1    25    5
#4:     2     5   NA
#5:     2    10    5
#6:     2    15    5
setDF(df) #if you want to convert back to old data.frame syntax

Or use the lag function in dplyr:

df %>%
    group_by(group) %>%
    mutate(Diff = value - lag(value))
#   group value  Diff
#   <int> <int> <int>
# 1     1    10    NA
# 2     1    20    10
# 3     1    25     5
# 4     2     5    NA
# 5     2    10     5
# 6     2    15     5


Calculate average of different value in R table

You can use the formula interface.

d <- read.table(text="Dimension,Config,Result
                3,1,6.43547800901942e-12
                3,1,3.10671396584125e-15
                3,1,5.86997050075184e-07
                3,2,1.57865350726808
                3,2,0.125293574811717
                3,2,0.096173751923243
                4,1,3.33845065295529e-08
                4,1,4.57511389653726e-07
                4,1,2.58918409465438e-07
                4,2,3.23375251723051
                4,2,2.13142950121767
                4,2,0.510008166587752", header=TRUE, sep=',')

aggregate(Result ~ Dimension+Config, data=d, mean) 
  Dimension Config       Result
1         3      1 1.956678e-07
2         4      1 2.499381e-07
3         3      2 6.000403e-01
4         4      2 1.958397e+00

Using apply function to average dataframe groups

I think the fastest replacement for aggregate() would be data.table:

library(data.table)
( dt <- setDT(df)[, lapply(.SD, mean), by = ID] )
#    ID         a        b        c        d        e
# 1: no 25.000000 26.00000 24.66667 39.00000 39.66667
# 2: bo 40.666667 25.33333 31.33333 37.00000 19.33333
# 3: fo  5.333333 28.00000 53.33333 11.66667 29.33333
# 4: to 30.666667 47.33333 27.00000 41.33333 28.00000

For the row subtraction, we could write a function and use it with Map().

f <- function(x, y) {
    dt[ID == x, -1, with = FALSE] - dt[ID == y, -1, with = FALSE]
}
rbindlist(Map(f, c("bo", "fo", "to", "to"), c("no", "no", "bo", "fo")))
#            a          b          c          d          e
# 1:  15.66667 -0.6666667   6.666667  -2.000000 -20.333333
# 2: -19.66667  2.0000000  28.666667 -27.333333 -10.333333
# 3: -10.00000 22.0000000  -4.333333   4.333333   8.666667
# 4:  25.33333 19.3333333 -26.333333  29.666667  -1.333333

There is probably a better way to write the function f() and that last call in data.table and I will try to improve it if possible. Note that this output will not match yours due to your use of sample() without setting a seed.

Another possibility would be to do the following. This will give you the row names you want.

A <- c("bo", "fo", "to", "to")
B <- c("no", "no", "bo", "fo")
df2 <- as.data.frame(rbindlist(Map(f, A, B)))
rownames(df2) <- paste(A, B, sep = "-")
df2
#               a          b          c          d          e
# bo-no  15.66667 -0.6666667   6.666667  -2.000000 -20.333333
# fo-no -19.66667  2.0000000  28.666667 -27.333333 -10.333333
# to-bo -10.00000 22.0000000  -4.333333   4.333333   8.666667
# to-fo  25.33333 19.3333333 -26.333333  29.666667  -1.333333
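
One possible simplification, sketched in base R: store the group means in a matrix with the IDs as row names, then subtract whole blocks of rows by name. The data below is invented (only two value columns, chosen so the means are easy to verify by hand), and the names m and res are assumptions.

```r
# Hypothetical example data with easy-to-check group means
df <- data.frame(
  ID = rep(c("no", "bo", "fo", "to"), each = 2),
  a  = c(1, 3, 10, 20, 5, 7, 0, 4),
  b  = c(2, 2, 4, 8, 1, 3, 9, 11)
)

# Group means, moved into a numeric matrix indexed by ID
means <- aggregate(. ~ ID, data = df, FUN = mean)
m <- as.matrix(means[, -1])
rownames(m) <- as.character(means$ID)

# Pairs to compare: row A[i] minus row B[i]
A <- c("bo", "fo", "to", "to")
B <- c("no", "no", "bo", "fo")

# Indexing by name selects the rows in the requested order,
# so the subtraction is vectorised over all pairs at once
res <- m[A, , drop = FALSE] - m[B, , drop = FALSE]
rownames(res) <- paste(A, B, sep = "-")
print(res)
```

This avoids the per-pair function and Map() call, at the cost of converting the means to a matrix, which only works when all value columns are numeric.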
