Subtracting values group-wise by the average of each group in R
Using the dplyr library, you can do:
library(dplyr)
x %>%
  group_by(gene) %>%
  # across() replaces the deprecated mutate_all(funs(...)) idiom
  mutate(across(everything(), ~ .x - mean(.x)))
# A tibble: 8 x 2
# Groups: gene [3]
gene value
<fct> <dbl>
1 A 1.03
2 A -0.267
3 A -0.767
4 B 1.45
5 B -1.45
6 C 0
7 C 0.700
8 C -0.700
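As a rough, self-contained check of the group-wise centering above, the sketch below uses a small made-up data frame (the original `x` is not shown, so the values here are hypothetical) and compares the dplyr result with base R's `ave()`, which returns the group mean aligned to each row.

```r
library(dplyr)

# Hypothetical data; the original `x` is not shown in the answer.
x <- data.frame(
  gene  = factor(c("A", "A", "B", "B", "B")),
  value = c(1, 3, 2, 4, 9)
)

# Subtract each group's mean from its own rows.
centered <- x %>%
  group_by(gene) %>%
  mutate(centered = value - mean(value)) %>%
  ungroup()

# Base R cross-check: ave() defaults to the mean of each group.
base_centered <- x$value - ave(x$value, x$gene)

# After centering, the values in each group should sum to (numerically) zero.
group_sums <- tapply(centered$centered, centered$gene, sum)
```

If the two approaches agree and every entry of `group_sums` is effectively zero, the centering behaved as intended.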
Subtract value from previous row by group
With dplyr:
library(dplyr)
data %>%
  group_by(id) %>%
  arrange(date) %>%
  mutate(diff = value - lag(value, default = first(value)))
For clarity, you can arrange by date and grouping column (as per comment by lawyer):
data %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(diff = value - lag(value, default = first(value)))
Or use lag with order_by:
data %>%
  group_by(id) %>%
  mutate(diff = value - lag(value, default = first(value), order_by = date))
With data.table:
library(data.table)
dt <- as.data.table(data)
setkey(dt, id, date)
dt[, diff := value - shift(value, fill = first(value)), by = id]
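To make the effect of `default = first(value)` concrete, here is a minimal sketch on made-up data (the `id`, `date`, and `value` contents are hypothetical and deliberately out of date order). The first row of each group is compared with itself, so its difference comes out as 0 rather than NA.

```r
library(dplyr)

# Hypothetical data, deliberately out of date order within id 1.
data <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  date  = as.Date(c("2020-01-03", "2020-01-01", "2020-01-02",
                    "2020-01-01", "2020-01-02")),
  value = c(30, 10, 15, 7, 4)
)

res <- data %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%   # sort by date within each id
  mutate(diff = value - lag(value, default = first(value))) %>%
  ungroup()

# Within id 1 the sorted values are 10, 15, 30; within id 2 they are 7, 4.
res$diff
```

Leaving out `default` would give NA for the first row of each group instead, which may be preferable when a zero could be mistaken for "no change".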
Subtraction within Groups using R
You can also do that with dplyr:
library(dplyr)
df %>%   # %>% replaces the old %.% operator, which was removed from dplyr
  group_by(Sample) %>%
  mutate(dValue = value[condition == "A"] - value)
# Sample condition value dValue
#1 var1 A 12 0
#2 var1 B 14 -2
#3 var1 C 15 -3
#4 var2 A 20 0
#5 var2 B 19 1
#6 var2 C 19 1
#7 var3 A 50 0
#8 var3 B 51 -1
#9 var3 C 48 2
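The input data frame is not shown, but it can be rebuilt from the Sample, condition, and value columns of the printed output. The sketch below does that and reruns the calculation. It assumes exactly one "A" row per Sample: with none, or more than one, the subsetting `value[condition == "A"]` would no longer return a single reference value.

```r
library(dplyr)

# Input rebuilt from the Sample/condition/value columns of the output above.
df <- data.frame(
  Sample    = rep(c("var1", "var2", "var3"), each = 3),
  condition = rep(c("A", "B", "C"), times = 3),
  value     = c(12, 14, 15, 20, 19, 19, 50, 51, 48)
)

res <- df %>%
  group_by(Sample) %>%
  # The single "A" value of each Sample is recycled across the group's rows.
  mutate(dValue = value[condition == "A"] - value) %>%
  ungroup()

res$dValue
```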
Want to get the dataframe of values that are deviations from the mean based on a factor column
A dplyr solution:
library(dplyr)
x %>%
  group_by(factor) %>%
  mutate(across(c(value1, value2), ~ . - mean(.)))
Output
# A tibble: 6 x 3
# Groups: factor [3]
factor value1 value2
<fct> <dbl> <dbl>
1 a -1 -1
2 a 1 1
3 b -1 -0.5
4 b 1 0.5
5 c 1 3
6 c -1 -3
Calculate group mean while excluding current observation using dplyr
There is no need to define a custom function. Instead, we can sum all elements of the group, subtract the current value, and divide by the number of elements in the group minus 1.
df %>%
  group_by(grouping) %>%
  mutate(special_mean = (sum(value) - value) / (n() - 1))
# grouping value special_mean
# (chr) (int) (dbl)
#1 A 1 8.5
#2 A 6 6.0
#3 A 11 3.5
#4 B 2 9.5
#5 B 7 7.0
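The sketch below checks the shortcut against a deliberately naive loop that drops each row in turn and averages the rest of its group. The data are an assumption: the first five rows match the output above, and the sixth value (12) is inferred from the printed means rather than taken from the source.

```r
library(dplyr)

# Assumed input: rows 1-5 match the printed output; the last value is inferred.
df <- data.frame(
  grouping = c("A", "A", "A", "B", "B", "B"),
  value    = c(1L, 6L, 11L, 2L, 7L, 12L)
)

res <- df %>%
  group_by(grouping) %>%
  mutate(special_mean = (sum(value) - value) / (n() - 1)) %>%
  ungroup()

# Naive reference: for each row, average the other rows of the same group.
idx <- seq_len(nrow(df))
naive <- sapply(idx, function(i) {
  mean(df$value[df$grouping == df$grouping[i] & idx != i])
})
```

Note that a group with a single row divides 0 by 0 and yields NaN, so such groups may need separate handling.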
DataFrame subtract group-wise means
If you use the transform method, e.g.,
means = df.groupby(group, axis=1).transform('mean')
then transform will return a DataFrame of the same shape as df. This makes it easier to subtract means from df.
You can also pass a sequence, such as group=[1,1,1,2,2,3,3], to df.groupby instead of passing a column name. df.groupby(group, axis=1) will group the columns based on the sequence values. So, for example, to group according to the non-numeric part of each column name, you could use:
import numpy as np
import pandas as pd
import datetime as DT
np.random.seed(2016)
base = DT.date.today()
date_list = [base - DT.timedelta(days=x) for x in range(0, 10)]
df = pd.DataFrame(data=np.random.randint(1, 100, (10, 8)),
                  index=date_list,
                  columns=['a1', 'a2', 'b1', 'a3', 'b2', 'c1', 'c2', 'b3'])
# Group key per column: the non-numeric prefix ('a', 'b', or 'c')
group = df.columns.str.extract(r'(\D+)', expand=False)
# Note: groupby(..., axis=1) is deprecated in pandas 2.1 and later
means = df.groupby(group, axis=1).transform('mean')
result = df - means
print(result)
which yields
a1 a2 b1 a3 b2 c1 c2 b3
2016-05-18 29 29 53 29 53 23 23 53
2016-05-17 55 55 32 55 32 92 92 32
2016-05-16 59 59 53 59 53 50 50 53
2016-05-15 46 46 30 46 30 55 55 30
2016-05-14 56 56 28 56 28 28 28 28
2016-05-13 34 34 36 34 36 70 70 36
2016-05-12 39 39 64 39 64 48 48 64
2016-05-11 45 45 59 45 59 57 57 59
2016-05-10 55 55 30 55 30 37 37 30
2016-05-09 61 61 59 61 59 59 59 59
Calculate difference between values in consecutive rows by group
The package data.table can do this fairly quickly, using the shift function.
require(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame
df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax
Or using the lag function in dplyr:
df %>%
  group_by(group) %>%
  mutate(Diff = value - lag(value))
# group value Diff
# <int> <int> <int>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5
For alternatives that predate data.table::shift and dplyr::lag, see edits.
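For reference, one package-free way to get the same consecutive differences is base R's `ave()` combined with `diff()`. This is only a sketch of one possible approach and is not necessarily what the earlier revisions of the answer contained.

```r
# Same example data as above, as a plain data.frame.
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# Within each group, prepend NA and take successive differences,
# so the first row of every group has no predecessor.
df$diff <- ave(df$value, df$group, FUN = function(v) c(NA, diff(v)))

df
```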
Calculate average of different value in R table
You can use the formula interface.
d <- read.table(text="Dimension,Config,Result
3,1,6.43547800901942e-12
3,1,3.10671396584125e-15
3,1,5.86997050075184e-07
3,2,1.57865350726808
3,2,0.125293574811717
3,2,0.096173751923243
4,1,3.33845065295529e-08
4,1,4.57511389653726e-07
4,1,2.58918409465438e-07
4,2,3.23375251723051
4,2,2.13142950121767
4,2,0.510008166587752", header=T, sep=',')
aggregate(Result ~ Dimension+Config, data=d, mean)
Dimension Config Result
1 3 1 1.956678e-07
2 4 1 2.499381e-07
3 3 2 6.000403e-01
4 4 2 1.958397e+00
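To see how the formula `Result ~ Dimension + Config` is read (average Result within each observed combination of Dimension and Config), here is a smaller sketch with made-up numbers that are easy to verify by hand.

```r
# Hypothetical data with three observed (Dimension, Config) combinations.
d <- data.frame(
  Dimension = c(3, 3, 3, 3, 4, 4),
  Config    = c(1, 1, 2, 2, 1, 1),
  Result    = c(2, 4, 10, 20, 1, 3)
)

# Left of ~ is the value to summarise; right of ~ are the grouping columns.
agg <- aggregate(Result ~ Dimension + Config, data = d, FUN = mean)

agg
```

Only combinations that occur in the data appear in the result, and the first grouping column varies fastest in the row order, as in the output above.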
Using apply function to average dataframe groups
I think the fastest replacement for aggregate() would be to use data.table:
library(data.table)
( dt <- setDT(df)[, lapply(.SD, mean), by = ID] )
# ID a b c d e
# 1: no 25.000000 26.00000 24.66667 39.00000 39.66667
# 2: bo 40.666667 25.33333 31.33333 37.00000 19.33333
# 3: fo 5.333333 28.00000 53.33333 11.66667 29.33333
# 4: to 30.666667 47.33333 27.00000 41.33333 28.00000
For the row subtraction, we could write a function and use it with Map().
f <- function(x, y) {
  dt[ID == x, -1, with = FALSE] - dt[ID == y, -1, with = FALSE]
}
rbindlist(Map(f, c("bo", "fo", "to", "to"), c("no", "no", "bo", "fo")))
# a b c d e
# 1: 15.66667 -0.6666667 6.666667 -2.000000 -20.333333
# 2: -19.66667 2.0000000 28.666667 -27.333333 -10.333333
# 3: -10.00000 22.0000000 -4.333333 4.333333 8.666667
# 4: 25.33333 19.3333333 -26.333333 29.666667 -1.333333
There is probably a better way to write the function f() and that last call in data.table, and I will try to improve it if possible. Note that this output will not match yours due to your use of sample() without setting a seed.
Another possibility would be to do the following. This will give you the row names you want.
A <- c("bo", "fo", "to", "to")
B <- c("no", "no", "bo", "fo")
df2 <- as.data.frame(rbindlist(Map(f, A, B)))
rownames(df2) <- paste(A, B, sep = "-")
df2
# a b c d e
# bo-no 15.66667 -0.6666667 6.666667 -2.000000 -20.333333
# fo-no -19.66667 2.0000000 28.666667 -27.333333 -10.333333
# to-bo -10.00000 22.0000000 -4.333333 4.333333 8.666667
# to-fo 25.33333 19.3333333 -26.333333 29.666667 -1.333333
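One possible alternative to f() and Map(), sketched here in base R and not taken from the answer, is to store the group means in a matrix with the IDs as row names and subtract whole blocks of rows by name. The data below are made up, and the names `m`, `A`, `B`, and `diffs` are only illustrative.

```r
# Hypothetical data: three IDs with two rows each.
df <- data.frame(
  ID = rep(c("no", "bo", "fo"), each = 2),
  a  = c(1, 3, 10, 20, 5, 7),
  b  = c(2, 4, 6, 8, 0, 2)
)

# Group means, one row per ID.
means <- aggregate(. ~ ID, data = df, FUN = mean)

# Numeric matrix of means, indexed by ID through its row names.
m <- as.matrix(means[, -1])
rownames(m) <- as.character(means$ID)

# Pairs to compare: A[i] minus B[i].
A <- c("bo", "fo")
B <- c("no", "no")
diffs <- m[A, , drop = FALSE] - m[B, , drop = FALSE]
rownames(diffs) <- paste(A, B, sep = "-")

diffs
```

Indexing by row name handles all pairs in one vectorised subtraction, which avoids calling a helper once per pair.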