scale by group in data.table
The scale
function output is a matrix
, so convert it to a vector
dt[, c("score1", "score2") := lapply(.SD, function(x) as.vector(scale(x))), by = session]
dt
# session score1 score2
# 1: 1 -0.7433155 -0.6859943
# 2: 1 -1.0530303 -1.0289917
# 3: 1 -0.2787433 -0.3429970
# 4: 1 0.8052585 0.6859944
# 5: 1 1.2698307 1.3719886
# 6: 2 -0.7847341 -0.6824535
# 7: 2 -0.2942753 -0.3650335
# 8: 2 -0.9949307 -0.9205191
# 9: 2 0.7567078 0.4285175
#10: 2 1.3172322 1.5394886
To understand it better, try it on a simple vector
scale(1:10)
# [,1]
# [1,] -1.4863011
# [2,] -1.1560120
# [3,] -0.8257228
# [4,] -0.4954337
# [5,] -0.1651446
# [6,] 0.1651446
# [7,] 0.4954337
# [8,] 0.8257228
# [9,] 1.1560120
#[10,] 1.4863011
scale values within group in R
You could apply scale
function by group :
This can be done in base R:
df$y2 <- with(df, ave(y, x, FUN = scale))
df
# x y y2
#1 1 1 -0.707107
#2 1 3 0.707107
#3 2 4 0.707107
#4 2 3 -0.707107
#5 3 5 1.091089
#6 3 2 -0.872872
#7 3 3 -0.218218
dplyr
library(dplyr)
df %>% group_by(x) %>% mutate(y2 = scale(y))
and in data.table
:
library(data.table)
setDT(df)[, y2 := scale(y), x]
data
df <- data.frame(x=c(1,1,2,2,3,3,3),y=c(1,3,4,3,5,2,3))
Standardize by group using data.table
After grouping by 'gr' and 'grr', loop over the Subset of Data.table (.SD
), scale
it (the output of scale
is a matrix
, so we convert it to vector
with as.vector
) and assign (:=
) the output to the new columns.
DT[, paste0(names(DT)[1:2], ".z") := lapply(.SD,
function(x) as.vector(scale(x))), .(gr, grr)]
How to scale segments of a column in an R data frame?
Apply the same function (scale
) by group.
In base R
df$z <- with(df, ave(x, y, FUN = scale))
df
# x y z
#1 1 A -1.26491
#2 2 A -0.63246
#3 3 A 0.00000
#4 4 A 0.63246
#5 5 A 1.26491
#6 20 B -1.33242
#7 22 B -0.59219
#8 24 B 0.14805
#9 25 B 0.51816
#10 27 B 1.25840
#11 12 C -0.83028
#12 13 C -0.36901
#13 12 C -0.83028
#14 15 C 0.55352
#15 17 C 1.47605
Using dplyr
library(dplyr)
df %>% group_by(y) %>% mutate(z = scale(x))
Or data.table
library(data.table)
setDT(df)[, z:= scale(x), y]
how to scale a matrix by group?
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(my.df)
, grouped by 'sex', selecting the columns of interest in .SDcols
, we loop through the columns (lapply(.SD, ...
) , do the scale
and convert to vector
. (The scale
function output a matrix with some attributes, which will create some problems if we don't convert to vector
.)
library(data.table)
setDT(my.df)[, c('x', 'y', 'z') := lapply(.SD, function(x)
as.vector(scale(x))) , by = sex, .SDcols= x:z]
Data.table: Apply function over groups with reference to set value in each group. Pass resulting columns into a function
An option is to subset the 'statistic' by creating a logical condition based on 'variable' with 'base_variable' element grouped by 'geography'
results[, .(variable, diff = statistic - statistic[variable == base_variable]),
by = geography][variable != base_variable]
# geography variable diff
# 1: 1 bravo 0.8100971
# 2: 1 charlie -0.2091748
# 3: 1 delta 2.2217346
# 4: 2 bravo -1.1499762
# 5: 2 charlie 0.1579213
# 6: 2 delta 0.4088169
# 7: 3 bravo -0.8811697
# 8: 3 charlie 0.9359998
# 9: 3 delta -0.1859381
#10: 4 bravo -1.5934593
#11: 4 charlie 1.7461715
#12: 4 delta 0.5763070
Trying to use dplyr to group_by and apply scale()
The problem seems to be in the base scale()
function, which expects a matrix. Try writing your own.
scale_this <- function(x){
(x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE)
}
Then this works:
library("dplyr")
# reproducible sample data
set.seed(123)
n = 1000
df <- data.frame(stud_ID = sample(LETTERS, size=n, replace=TRUE),
behavioral_scale = runif(n, 0, 10),
cognitive_scale = runif(n, 1, 20),
affective_scale = runif(n, 0, 1) )
scaled_data <-
df %>%
group_by(stud_ID) %>%
mutate(behavioral_scale_ind = scale_this(behavioral_scale),
cognitive_scale_ind = scale_this(cognitive_scale),
affective_scale_ind = scale_this(affective_scale))
Or, if you're open to a data.table
solution:
library("data.table")
setDT(df)
cols_to_scale <- c("behavioral_scale","cognitive_scale","affective_scale")
df[, lapply(.SD, scale_this), .SDcols = cols_to_scale, keyby = factor(stud_ID)]
Related Topics
How to Add Rows with 0 Counts to Summarised Output
Ggplot2: Adding Lines in a Loop and Retaining Colour Mappings
How to Pass Multiple Group_By Arguments and a Dynamic Variable Argument to a Dplyr Function
R Shiny - Checkboxes and Action Button Combination Issue
How to Create a Dropdown List in a Shiny Table Using Datatable When Editing the Table
Non-Equi-Joins in R with Data.Table - Backticked Column Name Trouble
Reshape Data for Values in One Column
Rvest Not Recognizing CSS Selector
Http Error 400 on Google_Elevation() Call
Drop Columns That Take Less Than N Values
R: Ggplot2 Setting the Last Plot in the Midle with Facet_Wrap
Axis Does Not Plot with Date Labels
Efficient Way to Fill Time-Series Per Group
R Packages Fail to Compile with Gcc
R Function That Uses Its Output as Its Own Input Repeatedly
How to Manage Parallel Processing with Animated Ggplot2-Plot