Scale Only Certain Columns R

Scale only certain columns R

We can do this with lapply: subset the columns of interest, loop over them with lapply, and assign the output back to the same subset of the data. The c() wrapper is there because the output of scale is a matrix with a single column; c or as.vector converts it to a plain vector.

df[c(3, 6)] <- lapply(df[c(3, 6)], function(x) c(scale(x)))
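
To see why the c() wrapper matters, here is a minimal sketch on a made-up data frame. The name toy and its columns are assumptions for illustration, not data from the original question.

```r
# Hypothetical data: columns 3 and 6 are the ones to scale
toy <- data.frame(
  id  = 1:5,
  grp = c("a", "b", "a", "b", "a"),
  v3  = c(10, 20, 30, 40, 50),
  v4  = c(1, 1, 2, 2, 3),
  v5  = c(5, 4, 3, 2, 1),
  v6  = c(2.5, 3.5, 1.0, 4.0, 6.0)
)

# scale() returns a one-column matrix, even for a plain vector
is.matrix(scale(toy$v3))   # TRUE

# c() drops the matrix attributes, so each column stays an ordinary vector
toy[c(3, 6)] <- lapply(toy[c(3, 6)], function(x) c(scale(x)))

is.matrix(toy$v3)          # FALSE
```

The other columns (v4, v5) are not touched, because only the subset on the left-hand side is assigned to.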

Or another option is mutate_at from dplyr

library(dplyr)
# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
df %>%
  mutate_at(c(3, 6), ~ c(scale(.)))
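
In dplyr 1.0.0 and later, across() supersedes the scoped verbs such as mutate_at. A rough sketch of the same idea follows, assuming a small made-up data frame (toy is not from the original question) in which columns 2 and 3 are the ones to scale.

```r
library(dplyr)

# Hypothetical data; columns 2 and 3 will be scaled
toy <- data.frame(a = 1:4, b = c(2, 4, 6, 8), c = c(5, 1, 3, 7))

# as.numeric() strips the matrix attributes that scale() adds
scaled <- toy %>%
  mutate(across(c(2, 3), ~ as.numeric(scale(.x))))

scaled
```

Selecting by name, for example across(c(b, c), ...), is usually less fragile than selecting by position if the column order might change.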

Scaling data in R ignoring specific columns

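You can do partial assignment. The negative index drops the first column, so every column except the first is scaled and written back: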

trouble[, -c(1)] <- scale(trouble[, -c(1)])
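
If the columns to skip are easier to identify by name than by position, something like the following should work. This is only a sketch: the contents of trouble here are invented, and skip and keep are helper names introduced for the example.

```r
# Hypothetical stand-in for `trouble`: one label column, two numeric ones
trouble <- data.frame(
  name = c("a", "b", "c", "d"),
  x    = c(1, 2, 3, 4),
  y    = c(10, 30, 20, 40)
)

# Columns to leave untouched, selected by name rather than by position
skip <- "name"
keep <- setdiff(names(trouble), skip)

# scale() returns a matrix; assigning it into the column subset
# stores each matrix column back as a numeric column of the data frame
trouble[keep] <- scale(trouble[keep])

trouble
```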

selective scaling function in r using a different data frame to scale

One way with base R; the comments in the code explain each step. Thanks, Nelson, for the data (+1).

df <- read.table(text="color weight height length estimate
1 red 10 66 40 5
2 red 12 60 41 7
3 yellow 12 67 48 9
4 blue 15 55 36 10
5 yellow 21 54 48 7
6 red 12 54 43 5
7 red 11 38 36 6", header = TRUE)

scale_df <- read.table(text=" color weight height length estimate
1 red 11 55 41 7
2 red 13 67 39 9
3 yellow 12 67 46 11
4 blue 16 8 37 5
5 yellow 23 10 47 9
6 red 17 11 41 10
7 red 16 13 37 13", header = TRUE)

## add reference and scaling df as arguments
scale2sd <- function(ref, scale_by, variable) {
  (ref[[variable]] - mean(scale_by[[variable]], na.rm = TRUE)) /
    (2 * sd(scale_by[[variable]], na.rm = TRUE))
}
predictors <- c("color", "weight", "height", "length")
## this is to get all numeric columns that are part of your predictor variables
df_to_scale <- Filter(is.numeric, df[predictors])
## create a named vector. This is a bit awkward but it makes it easier to select
## the corresponding items in the two data frames,
## and then replace the original columns
num_vars <- setNames(names(df_to_scale), names(df_to_scale))

## this is the actual scaling job -
## use the named vector for looping over the selected columns
## then assign it back to the selected columns
df[num_vars] <- lapply(num_vars, function(x) scale2sd(df, scale_df, x))

df
#> color weight height length estimate
#> 1 red -0.67259271 0.58130793 -0.14222363 5
#> 2 red -0.42479540 0.47561558 -0.01777795 7
#> 3 yellow -0.42479540 0.59892332 0.85334176 9
#> 4 blue -0.05309942 0.38753862 -0.64000632 10
#> 5 yellow 0.69029252 0.36992323 0.85334176 7
#> 6 red -0.42479540 0.36992323 0.23111339 5
#> 7 red -0.54869405 0.08807696 -0.64000632 6
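
The same kind of result can probably be had from scale() itself, since it accepts explicit center and scale vectors. A minimal sketch, assuming two small made-up data frames (ref_df supplies the statistics, new_df gets scaled; neither name is from the original answer):

```r
# Hypothetical data: `ref_df` supplies the mean and sd, `new_df` gets scaled
ref_df <- data.frame(weight = c(11, 13, 12, 16), height = c(55, 67, 67, 8))
new_df <- data.frame(weight = c(10, 12, 15, 21), height = c(66, 60, 55, 54))

vars <- c("weight", "height")

# Per-column statistics taken from the reference data only
ref_mean <- colMeans(ref_df[vars], na.rm = TRUE)
ref_sd   <- sapply(ref_df[vars], sd, na.rm = TRUE)

# scale() subtracts `center` and divides by `scale`, column by column
scaled <- scale(new_df[vars], center = ref_mean, scale = 2 * ref_sd)
new_df[vars] <- as.data.frame(scaled)

new_df
```

This avoids the per-variable helper, at the cost of requiring the columns to be in the same order in both the data and the statistics vectors.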

Standardize data columns in R

I have to assume you meant to say that you wanted a mean of 0 and a standard deviation of 1. If your data is in a data frame and all the columns are numeric, you can simply call the scale function on it. Note that scale returns a matrix, not a data frame.

dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)

# check that we get mean of 0 and sd of 1
colMeans(scaled.dat) # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
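
When not all columns are numeric, calling scale on the whole data frame fails, so the numeric columns have to be picked out first. One possible way to do that, sketched on an invented data frame (mixed and num_cols are names made up for this example):

```r
# Hypothetical data frame mixing a character column with numeric ones
mixed <- data.frame(
  label = c("a", "b", "c", "d", "e"),
  x     = c(30.1, 29.8, 30.3, 30.0, 29.9),
  y     = c(3.2, 4.8, 4.1, 3.7, 4.4)
)

# Logical index of the numeric columns
num_cols <- vapply(mixed, is.numeric, logical(1))

# Scale only those columns; the label column is left as it was
mixed[num_cols] <- scale(mixed[num_cols])

colMeans(mixed[num_cols])     # numerically zero
sapply(mixed[num_cols], sd)   # both equal to 1
```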

scale columns based on vector of column names

library(tidyverse)
set.seed(123)
dat <-
  data.frame(year_ref = 2000:2004,
             www_val1 = sample(5),
             www_val2 = sample(5),
             www_val3 = sample(5),
             sat_val1 = sample(5),
             sat_val2 = sample(5),
             sat_val3 = sample(5),
             ds_val1 = sample(5),
             ds_val2 = sample(5),
             ds_val3 = sample(5))
var_names <- c("ds", "sat")
dat %>%
  dplyr::mutate_at(vars(starts_with(var_names)), ~scale(., center = TRUE, scale = TRUE))
# year_ref www_val1 www_val2 www_val3 sat_val1 sat_val2 sat_val3 ds_val1 ds_val2 ds_val3
# 1 2000 3 3 1 0.0000000 -0.6324555 -1.2649111 0.6324555 0.6324555 0.0000000
# 2 2001 5 5 3 -1.2649111 0.0000000 0.0000000 -0.6324555 -1.2649111 1.2649111
# 3 2002 2 2 2 0.6324555 0.6324555 0.6324555 0.0000000 1.2649111 -0.6324555
# 4 2003 4 4 5 -0.6324555 -1.2649111 -0.6324555 1.2649111 -0.6324555 -1.2649111
# 5 2004 1 1 4 1.2649111 1.2649111 1.2649111 -1.2649111 0.0000000 0.6324555
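
For comparison, a base R sketch of the same prefix-based selection, without the tidyverse. The data frame dat2 and the helper vector hit are assumptions made up for this example.

```r
# Hypothetical data with prefixed column names
dat2 <- data.frame(
  year_ref = 2000:2004,
  www_val1 = c(3, 5, 2, 4, 1),
  sat_val1 = c(3, 1, 4, 2, 5),
  ds_val1  = c(4, 2, 3, 5, 1)
)
prefixes <- c("ds", "sat")

# TRUE for every column whose name starts with any of the prefixes
hit <- Reduce(`|`, lapply(prefixes, function(p) startsWith(names(dat2), p)))

# Scale the matching columns in place; the rest are unchanged
dat2[hit] <- scale(dat2[hit])

dat2
```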

How to scale segments of a column in an R data frame?

Apply the same function (scale) by group.

In base R

df$z <- with(df, ave(x, y, FUN = scale))
df

# x y z
#1 1 A -1.26491
#2 2 A -0.63246
#3 3 A 0.00000
#4 4 A 0.63246
#5 5 A 1.26491
#6 20 B -1.33242
#7 22 B -0.59219
#8 24 B 0.14805
#9 25 B 0.51816
#10 27 B 1.25840
#11 12 C -0.83028
#12 13 C -0.36901
#13 12 C -0.83028
#14 15 C 0.55352
#15 17 C 1.47605

Using dplyr

library(dplyr)
df %>% group_by(y) %>% mutate(z = scale(x))

Or data.table

library(data.table)
setDT(df)[, z := scale(x), by = y]
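
The snippets above assume a df with a numeric column x and a grouping column y. To try the base R version end to end, the input can be reconstructed from the x and y columns of the printed result, and the group-wise mean and standard deviation checked afterwards:

```r
# Input reconstructed from the x and y columns of the printed result
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 20, 22, 24, 25, 27, 12, 13, 12, 15, 17),
  y = rep(c("A", "B", "C"), each = 5)
)

# ave() applies FUN within each level of y and returns the values
# in the original row order
df$z <- with(df, ave(x, y, FUN = scale))

# Each group should now have mean 0 and sd 1
tapply(df$z, df$y, mean)
tapply(df$z, df$y, sd)
```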

Rescaled certain columns to specific mean and standard deviation in R

  1. Please try to post a valid reprex next time. This will save others the trouble of having to manually reproduce your input data. Also, it is not immediately clear how your first code chunk referring to a df with columns v1 - v5 relates to the subsequent code chunk referring to df$mother.iq.
  2. The help file for psych::rescale() specifically states that the input, x, should be a matrix or data frame. I suspect this is why the output you get is not what you were expecting.
  3. While you can use psych::rescale(), a better alternative that offers more flexibility may be to forego the additional dependency on the {psych} package altogether and, instead, simply manually rescale the columns as required. The two approaches are illustrated in the reprex below:
# load libraries
library(tidyverse)

# define data as per OP
df <- data.frame(
v1 = c(65L, 98L, 85L, 83L, 115L, 98L),
v2 = c(1L, 1L, 1L, 1L, 1L, 0L),
v3 = c(121.12, 89.36, 115.44, 99.45, 92.75, 107.9),
v4 = c(4L, 4L, 4L, 3L, 4L, 1L),
v5 = c(27L, 25L, 27L, 25L, 27L, 18L)
)

# rescale via psych::rescale using entire data frame
df %>% psych::rescale(mean = 100, sd = 15)
#> v1 v2 v3 v4 v5
#> 1 77.38682 106.12372 119.90143 108.25723 109.31746
#> 2 106.46091 106.12372 82.24089 108.25723 100.71673
#> 3 95.00748 106.12372 113.16617 108.25723 109.31746
#> 4 93.24541 106.12372 94.20546 95.87139 100.71673
#> 5 121.43847 106.12372 86.26070 108.25723 109.31746
#> 6 106.46091 69.38138 104.22535 71.09970 70.61416

# if you only want to do this for specific columns, do it manually by targeting
# columns using dplyr::mutate_at(), an anonymous function, and scale (from base
# R):
df %>%
  mutate_at(vars(v4, v5), function(x) scale(x) * 15 + 100)
#> v1 v2 v3 v4 v5
#> 1 65 1 121.12 108.25723 109.31746
#> 2 98 1 89.36 108.25723 100.71673
#> 3 85 1 115.44 108.25723 109.31746
#> 4 83 1 99.45 95.87139 100.71673
#> 5 115 1 92.75 108.25723 109.31746
#> 6 98 0 107.90 71.09970 70.61416
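
The manual approach also works without dplyr. Below is a minimal base R sketch: rescale_to is a hypothetical helper written for this example (not part of any package), and df2 holds a subset of the columns from the data above.

```r
# Hypothetical helper: rescale a numeric vector to a target mean and sd
rescale_to <- function(x, target_mean = 100, target_sd = 15) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)  # standardize first
  z * target_sd + target_mean                              # then stretch and shift
}

# Subset of the columns from the data above
df2 <- data.frame(
  v1 = c(65, 98, 85, 83, 115, 98),
  v4 = c(4, 4, 4, 3, 4, 1),
  v5 = c(27, 25, 27, 25, 27, 18)
)

# Rescale only the chosen columns; v1 is left untouched
cols <- c("v4", "v5")
df2[cols] <- lapply(df2[cols], rescale_to)

sapply(df2[cols], mean)   # both 100
sapply(df2[cols], sd)     # both 15
```

Because the helper returns a plain vector rather than a matrix, no c() or as.numeric() wrapper is needed when assigning back.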

