Scale in data.table in R

Scale by group in data.table

The output of scale() is a matrix, so convert it to a vector with as.vector() before assigning it back to the columns:

library(data.table)
dt[, c("score1", "score2") := lapply(.SD, function(x) as.vector(scale(x))),
   by = session, .SDcols = c("score1", "score2")]
dt
# session score1 score2
# 1: 1 -0.7433155 -0.6859943
# 2: 1 -1.0530303 -1.0289917
# 3: 1 -0.2787433 -0.3429970
# 4: 1 0.8052585 0.6859944
# 5: 1 1.2698307 1.3719886
# 6: 2 -0.7847341 -0.6824535
# 7: 2 -0.2942753 -0.3650335
# 8: 2 -0.9949307 -0.9205191
# 9: 2 0.7567078 0.4285175
#10: 2 1.3172322 1.5394886
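As a sanity check, here is a self-contained sketch with a small hypothetical dt (the values are made up, not the asker's data). After scaling, each score column should have a mean of 0 and a standard deviation of 1 within each session. The scores are stored as doubles here; because a grouped := writes into the existing column, integer columns may need converting first.

```r
library(data.table)

# Hypothetical data (made-up values): two sessions, two score columns as doubles
dt <- data.table(
  session = rep(1:2, each = 5),
  score1  = c(3, 1, 6, 13, 16, 2, 7, 0, 18, 24),
  score2  = c(10, 4, 16, 32, 44, 8, 14, 4, 26, 50)
)
cols <- c("score1", "score2")

# Standardise each score column within each session
dt[, (cols) := lapply(.SD, function(x) as.vector(scale(x))),
   by = session, .SDcols = cols]

# Within each session the means should be ~0 and the sds 1
dt[, .(mean1 = mean(score1), sd1 = sd(score1),
       mean2 = mean(score2), sd2 = sd(score2)), by = session]
```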

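To see why as.vector() is needed, try scale() on a simple vector. It returns a one-column matrix (the "scaled:center" and "scaled:scale" attributes that R also prints are omitted here):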

scale(1:10)
# [,1]
# [1,] -1.4863011
# [2,] -1.1560120
# [3,] -0.8257228
# [4,] -0.4954337
# [5,] -0.1651446
# [6,] 0.1651446
# [7,] 0.4954337
# [8,] 0.8257228
# [9,] 1.1560120
#[10,] 1.4863011
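The sketch below, using the same 1:10 vector, checks that scale() computes (x - mean(x)) / sd(x) and returns it as a one-column matrix, with the centring and scaling values attached as attributes.

```r
x <- 1:10
s <- scale(x)

dim(s)                    # 10 1: a one-column matrix, not a vector
attr(s, "scaled:center")  # 5.5, the mean of x
attr(s, "scaled:scale")   # sd(x), about 3.02765

# The same numbers computed by hand; as.vector() drops dim and attributes
manual <- (x - mean(x)) / sd(x)
all.equal(as.vector(s), manual)  # TRUE
```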

Standardize data columns in R

I assume you want each column to have a mean of 0 and a standard deviation of 1. If your data is in a data frame and all the columns are numeric, you can simply call scale() on it. Note that the result is a matrix, not a data frame.

dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)

# check that we get mean of 0 and sd of 1
colMeans(scaled.dat) # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
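If not every column is numeric, scale() on the whole data frame errors, because the data is first converted to a character matrix. A minimal sketch, assuming a hypothetical dat2 with one text column, scales only the numeric columns and keeps the result a data frame:

```r
set.seed(1)

# Hypothetical data frame with one non-numeric column
dat2 <- data.frame(
  id = letters[1:10],
  x  = rnorm(10, 30, 0.2),
  y  = runif(10, 3, 5)
)

# Find the numeric columns and scale only those
num_cols <- vapply(dat2, is.numeric, logical(1))
dat2[num_cols] <- lapply(dat2[num_cols], function(v) as.vector(scale(v)))

colMeans(dat2[num_cols])                 # approximately 0
vapply(dat2[num_cols], sd, numeric(1))   # 1 for each column
```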


Selective scaling function in R using a different data frame to scale

One way with base R; comments are in the code. Thanks to Nelson for the data (+1).

df <- read.table(text="color weight height length estimate
1 red 10 66 40 5
2 red 12 60 41 7
3 yellow 12 67 48 9
4 blue 15 55 36 10
5 yellow 21 54 48 7
6 red 12 54 43 5
7 red 11 38 36 6", header = TRUE)

scale_df <- read.table(text=" color weight height length estimate
1 red 11 55 41 7
2 red 13 67 39 9
3 yellow 12 67 46 11
4 blue 16 8 37 5
5 yellow 23 10 47 9
6 red 17 11 41 10
7 red 16 13 37 13", header = TRUE)

## add reference and scaling df as arguments
scale2sd <- function(ref, scale_by, variable) {
  (ref[[variable]] - mean(scale_by[[variable]], na.rm = TRUE)) /
    (2 * sd(scale_by[[variable]], na.rm = TRUE))
}
predictors <- c("color", "weight", "height", "length")
## this is to get all numeric columns that are part of your predictor variables
df_to_scale <- Filter(is.numeric, df[predictors])
## create a named vector. This is a bit awkward but it makes it easier to select
## the corresponding items in the two data frames,
## and then replace the original columns
num_vars <- setNames(names(df_to_scale), names(df_to_scale))

## this is the actual scaling job -
## use the named vector for looping over the selected columns
## then assign it back to the selected columns
df[num_vars] <- lapply(num_vars, function(x) scale2sd(df, scale_df, x))

df
#> color weight height length estimate
#> 1 red -0.67259271 0.58130793 -0.14222363 5
#> 2 red -0.42479540 0.47561558 -0.01777795 7
#> 3 yellow -0.42479540 0.59892332 0.85334176 9
#> 4 blue -0.05309942 0.38753862 -0.64000632 10
#> 5 yellow 0.69029252 0.36992323 0.85334176 7
#> 6 red -0.42479540 0.36992323 0.23111339 5
#> 7 red -0.54869405 0.08807696 -0.64000632 6
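An alternative sketch uses scale() itself: its center and scale arguments accept numeric vectors with one value per column, so the reference statistics can come from scale_df while the values come from df. The block re-creates both data frames so it runs on its own; the output should match the scale2sd() result.

```r
# Same two data frames as above, re-created so this block runs on its own
df <- read.table(text = "color weight height length estimate
1 red 10 66 40 5
2 red 12 60 41 7
3 yellow 12 67 48 9
4 blue 15 55 36 10
5 yellow 21 54 48 7
6 red 12 54 43 5
7 red 11 38 36 6", header = TRUE)

scale_df <- read.table(text = "color weight height length estimate
1 red 11 55 41 7
2 red 13 67 39 9
3 yellow 12 67 46 11
4 blue 16 8 37 5
5 yellow 23 10 47 9
6 red 17 11 41 10
7 red 16 13 37 13", header = TRUE)

num_vars <- c("weight", "height", "length")
ref <- as.matrix(scale_df[num_vars])

# Centre on the reference means and divide by twice the reference sds
scaled <- scale(as.matrix(df[num_vars]),
                center = colMeans(ref, na.rm = TRUE),
                scale  = 2 * apply(ref, 2, sd, na.rm = TRUE))
df[num_vars] <- as.data.frame(scaled)
df
```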

Scaling a subset of columns of data.table in R

sx <- cbind(x[, -(2:4)], data.table(scale(x[, 2:4])))
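The cbind() version builds a new table with the scaled columns moved to the end. A sketch of an in-place alternative, assuming a hypothetical x whose columns 2 to 4 are numeric, overwrites those columns by reference and leaves column order unchanged:

```r
library(data.table)

# Hypothetical data.table: an id, three numeric columns, and a text column
x <- data.table(
  id = 1:6,
  a  = c(2, 4, 6, 8, 10, 12),
  b  = c(1, 3, 2, 5, 4, 6),
  c  = c(10, 20, 10, 20, 10, 20),
  d  = letters[1:6]
)

cols <- names(x)[2:4]

# Replace columns 2:4 with their scaled versions, by reference
x[, (cols) := lapply(.SD, function(v) as.vector(scale(v))), .SDcols = cols]
x
```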

I suspect it would be better for your workflow to melt your data.table to long format.
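A minimal sketch of that long-format workflow, assuming a hypothetical wide table: after melt(), scaling every measurement column becomes one grouped operation on a single value column, and dcast() brings the result back to wide format if needed.

```r
library(data.table)

# Hypothetical wide table: one id column and three measurement columns
wide <- data.table(id = 1:4,
                   a = c(1, 2, 3, 4),
                   b = c(10, 20, 30, 40),
                   c = c(5, 3, 8, 1))

# One row per (id, variable) pair
long <- melt(wide, id.vars = "id",
             variable.name = "variable", value.name = "value")

# Scale within each original column
long[, scaled := as.vector(scale(value)), by = variable]

# Back to wide format
dcast(long, id ~ variable, value.var = "scaled")
```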

How to scale segments of a column in an R data frame?

Apply the same function (scale) by group.

In base R

df$z <- with(df, ave(x, y, FUN = scale))
df

# x y z
#1 1 A -1.26491
#2 2 A -0.63246
#3 3 A 0.00000
#4 4 A 0.63246
#5 5 A 1.26491
#6 20 B -1.33242
#7 22 B -0.59219
#8 24 B 0.14805
#9 25 B 0.51816
#10 27 B 1.25840
#11 12 C -0.83028
#12 13 C -0.36901
#13 12 C -0.83028
#14 15 C 0.55352
#15 17 C 1.47605
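To run the base R version end to end, the sketch below rebuilds df from the x and y values in the printed output and confirms each group ends up with mean 0 and sd 1. ave() splits x by y, applies scale() to each piece, and writes the results back in the original row order.

```r
# Data reconstructed from the printed output above
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 20, 22, 24, 25, 27, 12, 13, 12, 15, 17),
  y = rep(c("A", "B", "C"), each = 5)
)

df$z <- with(df, ave(x, y, FUN = scale))

# Per-group checks
tapply(df$z, df$y, mean)  # approximately 0 for each group
tapply(df$z, df$y, sd)    # 1 for each group
```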

Using dplyr

library(dplyr)
df %>% group_by(y) %>% mutate(z = as.vector(scale(x)))

Or data.table

library(data.table)
setDT(df)[, z := as.vector(scale(x)), by = y]

R Normalize Many Columns

Try scale(), as below:

cbind(data, `colnames<-`(scale(data[normalize_these]), paste0(normalize_these, "NEW")))

If you would like to use data.table, the following is one option:

setDT(data)
data[, paste0(normalize_these, "NEW") := lapply(.SD, function(x) as.vector(scale(x))), .SDcols = normalize_these]
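A plain base R variant, sketched with a hypothetical data and normalize_these, adds the scaled copies as new columns with a "NEW" suffix and leaves the originals untouched:

```r
# Hypothetical inputs: a data frame and the names of the columns to normalise
data <- data.frame(id = 1:5,
                   a  = c(2, 4, 4, 5, 10),
                   b  = c(1, 1, 2, 3, 8),
                   c  = letters[1:5])
normalize_these <- c("a", "b")

# Assign a list of scaled vectors to the new column names
data[paste0(normalize_these, "NEW")] <-
  lapply(data[normalize_these], function(v) as.vector(scale(v)))

names(data)  # "id" "a" "b" "c" "aNEW" "bNEW"
```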

