Scale in Data.Table in R

scale by group in data.table

scale() returns a one-column matrix, so wrap it in as.vector() before assigning the result back to a column:

dt[, c("score1", "score2") := lapply(.SD, function(x) as.vector(scale(x))), by = session]
dt
#    session     score1     score2
# 1:       1 -0.7433155 -0.6859943
# 2:       1 -1.0530303 -1.0289917
# 3:       1 -0.2787433 -0.3429970
# 4:       1  0.8052585  0.6859944
# 5:       1  1.2698307  1.3719886
# 6:       2 -0.7847341 -0.6824535
# 7:       2 -0.2942753 -0.3650335
# 8:       2 -0.9949307 -0.9205191
# 9:       2  0.7567078  0.4285175
#10:       2  1.3172322  1.5394886

To see why, try scale() on a simple vector; the result is a one-column matrix:

scale(1:10)
#        [,1]
# [1,] -1.4863011
# [2,] -1.1560120
# [3,] -0.8257228
# [4,] -0.4954337
# [5,] -0.1651446
# [6,]  0.1651446
# [7,]  0.4954337
# [8,]  0.8257228
# [9,]  1.1560120
#[10,]  1.4863011
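As a minimal sketch of the point above (variable names here are illustrative only), the following shows that the result of scale() is a matrix carrying centering attributes, and that as.vector() reduces it to a plain numeric vector equal to the hand-computed z-scores:

```r
x <- 1:10
z <- scale(x)

is.matrix(z)              # TRUE: one column, ten rows
attr(z, "scaled:center")  # 5.5, the mean that was subtracted

# as.vector() drops the dim and the "scaled:*" attributes
v <- as.vector(z)
is.null(dim(v))           # TRUE

# Same values as standardizing by hand
manual <- (x - mean(x)) / sd(x)
all.equal(v, manual)      # TRUE
```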

Standardize data columns in R

I assume you mean that you want each column to have a mean of 0 and a standard deviation of 1. If your data is in a data frame and all the columns are numeric, you can simply call the scale function on the data.

dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)

# check that we get mean of 0 and sd of 1
colMeans(scaled.dat)  # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
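If some columns are not numeric, calling scale on the whole data frame fails. One possible workaround, sketched here with a made-up data frame called mixed, is to standardize only the numeric columns and leave the rest untouched:

```r
# Hypothetical example data: one character column, two numeric columns
mixed <- data.frame(
  id = letters[1:5],
  x  = c(1, 2, 3, 4, 5),
  y  = c(10, 20, 15, 30, 25)
)

# Logical mask of the numeric columns
num_cols <- vapply(mixed, is.numeric, logical(1))

# Standardize each numeric column; as.vector() keeps them plain vectors
mixed[num_cols] <- lapply(mixed[num_cols], function(col) as.vector(scale(col)))

# Check: means are (numerically) 0 and standard deviations are 1
colMeans(mixed[num_cols])
vapply(mixed[num_cols], sd, numeric(1))
```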

Using built-in functions is classy.

selective scaling function in R using a different data frame to scale

One way with base R. Comments are in the code. Thanks, Nelson, for the data.

df <- read.table(text="color weight height length estimate
    1    red     10     66     40        5
    2    red     12     60     41        7
    3 yellow     12     67     48        9
    4   blue     15     55     36       10
    5 yellow     21     54     48        7
    6    red     12     54     43        5
    7    red     11     38     36        6", header = TRUE)

scale_df <- read.table(text=" color weight height length estimate
    1    red     11     55     41        7
    2    red     13     67     39        9
    3 yellow     12     67     46       11
    4   blue     16      8     37        5
    5 yellow     23     10     47        9
    6    red     17     11     41       10
    7    red     16     13     37       13", header = TRUE)

## add reference and scaling df as arguments
scale2sd <- function(ref, scale_by, variable) {
  ((ref[[variable]]) - mean(scale_by[[variable]], na.rm = TRUE)) / (2 * sd(scale_by[[variable]], na.rm = TRUE))
}
predictors <- c("color", "weight", "height", "length")
## this is to get all numeric columns that are part of your predictor variables
df_to_scale <- Filter(is.numeric, df[predictors])
## create a named vector. This is a bit awkward but it makes it easier to select
## the corresponding items in the two data frames, 
## and then replace the original columns 
num_vars <- setNames(names(df_to_scale), names(df_to_scale))                      

## this is the actual scaling job - 
## use the named vector for looping over the selected columns 
## then assign it back to the selected columns
df[num_vars] <- lapply(num_vars, function(x) scale2sd(df, scale_df, x))

df
#>    color      weight     height      length estimate
#> 1    red -0.67259271 0.58130793 -0.14222363        5
#> 2    red -0.42479540 0.47561558 -0.01777795        7
#> 3 yellow -0.42479540 0.59892332  0.85334176        9
#> 4   blue -0.05309942 0.38753862 -0.64000632       10
#> 5 yellow  0.69029252 0.36992323  0.85334176        7
#> 6    red -0.42479540 0.36992323  0.23111339        5
#> 7    red -0.54869405 0.08807696 -0.64000632        6
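As a cross-check, the same arithmetic can probably be expressed through scale() itself by passing the reference means as center and twice the reference standard deviations as scale. The sketch below uses only the weight column from the two tables above; the names new_data, ref_data, centers and spreads are illustrative, not part of the original answer:

```r
# weight column of the data to be scaled and of the reference data
new_data <- data.frame(weight = c(10, 12, 12, 15, 21, 12, 11))
ref_data <- data.frame(weight = c(11, 13, 12, 16, 23, 17, 16))

# Centering and scaling values come from the reference data only
centers <- colMeans(ref_data)
spreads <- 2 * vapply(ref_data, sd, numeric(1))

# scale() accepts numeric vectors (one value per column) for both arguments
scaled <- scale(new_data, center = centers, scale = spreads)

# Same formula written out by hand
manual <- (new_data$weight - mean(ref_data$weight)) / (2 * sd(ref_data$weight))

all.equal(as.vector(scaled), manual)  # TRUE
round(manual[1], 4)                   # -0.6726, matching the first weight above
```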

scaling a subset of columns of data.table in R

sx = cbind(x[,-(2:4)],data.table(scale(x[,2:4])))

I suspect it would be better for your workflow to melt your data.table to long format.
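An alternative that avoids rebuilding the table with cbind is to update the chosen columns by reference with := and .SDcols. This is only a sketch: the table x and the column names a, b and c below are made up for illustration.

```r
library(data.table)

# Hypothetical table: columns a, b, c are to be scaled; id and note are not
x <- data.table(
  id   = 1:5,
  a    = c(1, 2, 3, 4, 5),
  b    = c(2, 4, 6, 8, 10),
  c    = c(5, 3, 8, 1, 9),
  note = letters[1:5]
)

cols <- c("a", "b", "c")

# Overwrite the selected columns in place; as.vector() drops the matrix shape
x[, (cols) := lapply(.SD, function(col) as.vector(scale(col))), .SDcols = cols]

# Check: each scaled column has mean ~0 and sd 1
sapply(cols, function(n) mean(x[[n]]))
sapply(cols, function(n) sd(x[[n]]))
```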

How to scale segments of a column in an R data frame?

Apply the same function (scale) by group.

In base R

df$z <- with(df, ave(x, y, FUN = scale))
df

#    x y        z
#1   1 A -1.26491
#2   2 A -0.63246
#3   3 A  0.00000
#4   4 A  0.63246
#5   5 A  1.26491
#6  20 B -1.33242
#7  22 B -0.59219
#8  24 B  0.14805
#9  25 B  0.51816
#10 27 B  1.25840
#11 12 C -0.83028
#12 13 C -0.36901
#13 12 C -0.83028
#14 15 C  0.55352
#15 17 C  1.47605

Using dplyr

library(dplyr)
# as.vector() keeps z a plain numeric column rather than a one-column matrix
df %>% group_by(y) %>% mutate(z = as.vector(scale(x)))

Or data.table

library(data.table)
setDT(df)[, z := as.vector(scale(x)), by = y]
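Whichever approach is used, a quick way to confirm the grouping worked is to check that every group ends up with mean 0 and standard deviation 1. A small base R sketch, using the first two groups from the output above (the name check_df is illustrative):

```r
check_df <- data.frame(
  x = c(1, 2, 3, 4, 5, 20, 22, 24, 25, 27),
  y = rep(c("A", "B"), each = 5)
)

# Scale x within each level of y
check_df$z <- ave(check_df$x, check_df$y, FUN = function(v) as.vector(scale(v)))

# Per-group mean and standard deviation of the scaled values
group_means <- tapply(check_df$z, check_df$y, mean)
group_sds   <- tapply(check_df$z, check_df$y, sd)

group_means  # both numerically 0
group_sds    # both 1
```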

R Normalize Many Columns

Try scale as follows:

cbind(data, `colnames<-`(scale(data[normalize_these]), paste0(normalize_these, "NEW")))
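The one-liner above packs three steps together. Spelled out on a small made-up data frame (data and normalize_these here are stand-ins for the asker's objects), it might look like this:

```r
# Hypothetical stand-ins for the asker's objects
data <- data.frame(id = 1:4, a = c(1, 2, 3, 4), b = c(10, 30, 20, 40))
normalize_these <- c("a", "b")

# 1. Scale only the chosen columns (result is a matrix)
scaled <- scale(data[normalize_these])

# 2. Rename the scaled columns with a "NEW" suffix
colnames(scaled) <- paste0(normalize_these, "NEW")

# 3. Bind them next to the original columns
result <- cbind(data, scaled)

names(result)  # "id" "a" "b" "aNEW" "bNEW"
```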

If you would like to use data.table, below might be an option

setDT(data)
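# as.vector() stores each new column as a plain vector rather than a matrix
data[, paste0(normalize_these, "NEW") := lapply(.SD, function(x) as.vector(scale(x))), .SDcols = normalize_these]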