Scale only certain columns R
We can do this with lapply(). Subset the columns of interest, loop through them with lapply(), and assign the output back to the same subset of the data. Here, we wrap scale() in c() because the output of scale() is a matrix with a single column. Using c() or as.vector() converts it to a vector.
df[c(3, 6)] <- lapply(df[c(3, 6)], function(x) c(scale(x)))
Or another option is mutate_at() from dplyr
library(dplyr)
df %>%
  mutate_at(c(3, 6), ~ c(scale(.)))
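To make the matrix-versus-vector point concrete, here is a minimal base R sketch. The data frame `toy` and its column names are made up for illustration; only scale(), lapply(), and c() come from the answer above.

```r
# Hypothetical data; columns 3 and 6 are the ones to scale
toy <- data.frame(
  id     = 1:5,
  grp    = c("a", "b", "a", "b", "a"),
  height = c(150, 160, 170, 180, 190),
  flag   = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  note   = c("v", "w", "x", "y", "z"),
  weight = c(50, 60, 70, 80, 90)
)

# scale() returns a one-column matrix, even for a plain vector
is.matrix(scale(toy$height))

# c() (or as.vector()) drops the matrix structure and its attributes
cols <- c(3, 6)
toy[cols] <- lapply(toy[cols], function(x) c(scale(x)))

# The scaled columns are ordinary numeric vectors again
sapply(toy[cols], is.matrix)
```

Without the c() wrapper, each assigned column would itself be a matrix, which tends to cause surprises later when printing or modelling.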
Scaling data in R ignoring specific columns
You can do partial assignment:
trouble[, -c(1)] <- scale(trouble[, -c(1)])
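A self-contained sketch of the same partial assignment, assuming a made-up `trouble` data frame whose first column is a label and whose remaining columns are numeric. The second half selects numeric columns by type instead of by position; that variant is an addition, not part of the original answer.

```r
# Hypothetical data: one label column followed by numeric columns
trouble <- data.frame(
  name = c("a", "b", "c", "d"),
  x    = c(1, 2, 3, 4),
  y    = c(10, 20, 40, 50)
)

# Scale everything except column 1; scale() works column-wise on the subset
trouble[, -1] <- scale(trouble[, -1])

# Variant: pick numeric columns by type rather than by position
trouble2 <- data.frame(
  name = c("a", "b", "c", "d"),
  x    = c(1, 2, 3, 4),
  y    = c(10, 20, 40, 50)
)
num <- vapply(trouble2, is.numeric, logical(1))
trouble2[num] <- scale(trouble2[num])
```

Selecting by type is a little more robust when the position of the non-numeric columns is not known in advance.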
selective scaling function in r using a different data frame to scale
One way with base R. Comments in the code. Thanks, Nelson, for the data +1
df <- read.table(text="color weight height length estimate
1 red 10 66 40 5
2 red 12 60 41 7
3 yellow 12 67 48 9
4 blue 15 55 36 10
5 yellow 21 54 48 7
6 red 12 54 43 5
7 red 11 38 36 6", header = TRUE)
scale_df <- read.table(text=" color weight height length estimate
1 red 11 55 41 7
2 red 13 67 39 9
3 yellow 12 67 46 11
4 blue 16 8 37 5
5 yellow 23 10 47 9
6 red 17 11 41 10
7 red 16 13 37 13", header = TRUE)
## add reference and scaling df as arguments
scale2sd <- function(ref, scale_by, variable) {
((ref[[variable]]) - mean(scale_by[[variable]], na.rm = TRUE)) / (2 * sd(scale_by[[variable]], na.rm = TRUE))
}
predictors <- c("color", "weight", "height", "length")
## this is to get all numeric columns that are part of your predictor variables
df_to_scale <- Filter(is.numeric, df[predictors])
## create a named vector. This is a bit awkward but it makes it easier to select
## the corresponding items in the two data frames,
## and then replace the original columns
num_vars <- setNames(names(df_to_scale), names(df_to_scale))
## this is the actual scaling job -
## use the named vector for looping over the selected columns
## then assign it back to the selected columns
df[num_vars] <- lapply(num_vars, function(x) scale2sd(df, scale_df, x))
df
#> color weight height length estimate
#> 1 red -0.67259271 0.58130793 -0.14222363 5
#> 2 red -0.42479540 0.47561558 -0.01777795 7
#> 3 yellow -0.42479540 0.59892332 0.85334176 9
#> 4 blue -0.05309942 0.38753862 -0.64000632 10
#> 5 yellow 0.69029252 0.36992323 0.85334176 7
#> 6 red -0.42479540 0.36992323 0.23111339 5
#> 7 red -0.54869405 0.08807696 -0.64000632 6
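The same idea can be sketched more compactly with scale() itself, passing the reference data frame's column means as `center` and twice its standard deviations as `scale`. This is an alternative to the scale2sd() helper above, not the answer's own method; the small `new_data` and `ref_data` frames are made up for illustration.

```r
# Hypothetical data: new_data is rescaled using statistics from ref_data
new_data <- data.frame(color  = c("red", "blue", "red"),
                       weight = c(10, 12, 15),
                       height = c(66, 60, 55))
ref_data <- data.frame(color  = c("red", "red", "blue"),
                       weight = c(11, 13, 16),
                       height = c(55, 67, 8))

num_cols <- names(Filter(is.numeric, new_data))

# center and scale are taken from ref_data, not from new_data
centers <- colMeans(ref_data[num_cols], na.rm = TRUE)
scales  <- 2 * sapply(ref_data[num_cols], sd, na.rm = TRUE)

new_data[num_cols] <- scale(new_data[num_cols], center = centers, scale = scales)
new_data
```

Passing `center` and `scale` explicitly is what makes the result depend on the reference data rather than on the data being transformed.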
Standardize data columns in R
I have to assume you meant to say that you wanted a mean of 0 and a standard deviation of 1. If your data is in a data frame and all the columns are numeric, you can simply call the scale() function on the data to do what you want.
dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)
# check that we get mean of 0 and sd of 1
colMeans(scaled.dat) # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
Using built-in functions is classy.
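One caveat worth sketching: scale() returns a matrix that carries the original means and standard deviations as attributes, and it fails on non-numeric columns. The example below, using a made-up data frame `mixed`, scales only the numeric columns and puts the result back into the data frame.

```r
# Hypothetical data with one non-numeric column
mixed <- data.frame(id = c("a", "b", "c", "d"),
                    x  = c(29.8, 30.1, 30.3, 29.9),
                    y  = c(3.2, 4.8, 4.1, 3.6))

is_num <- vapply(mixed, is.numeric, logical(1))
scaled <- scale(mixed[is_num])

# The centering and scaling values are kept as attributes
attr(scaled, "scaled:center")   # column means of the original data
attr(scaled, "scaled:scale")    # column standard deviations

# Put the scaled values back, keeping the data frame structure
mixed[is_num] <- as.data.frame(scaled)
colMeans(mixed[is_num])         # approximately 0
sapply(mixed[is_num], sd)       # approximately 1
```

Keeping the attributes around is handy if the same transformation later needs to be applied to new data or undone.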
scale columns based on vector of column names
library(tidyverse)
set.seed(123)
dat <-
data.frame(year_ref = 2000:2004,
www_val1 = sample(5),
www_val2 = sample(5),
www_val3 = sample(5),
sat_val1 = sample(5),
sat_val2 = sample(5),
sat_val3 = sample(5),
ds_val1 = sample(5),
ds_val2 = sample(5),
ds_val3 = sample(5))
var_names <- c("ds", "sat")
dat %>%
dplyr::mutate_at(vars(starts_with(var_names)), ~scale(., center = T, scale = T))
# year_ref www_val1 www_val2 www_val3 sat_val1 sat_val2 sat_val3 ds_val1 ds_val2 ds_val3
# 1 2000 3 3 1 0.0000000 -0.6324555 -1.2649111 0.6324555 0.6324555 0.0000000
# 2 2001 5 5 3 -1.2649111 0.0000000 0.0000000 -0.6324555 -1.2649111 1.2649111
# 3 2002 2 2 2 0.6324555 0.6324555 0.6324555 0.0000000 1.2649111 -0.6324555
# 4 2003 4 4 5 -0.6324555 -1.2649111 -0.6324555 1.2649111 -0.6324555 -1.2649111
# 5 2004 1 1 4 1.2649111 1.2649111 1.2649111 -1.2649111 0.0000000 0.6324555
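In current dplyr, mutate_at() is superseded by across(). A sketch of the equivalent call, assuming dplyr 1.0.0 or later and a cut-down version of the data above; wrapping scale() in as.numeric() is an added choice here so the columns stay plain numeric vectors rather than one-column matrices.

```r
library(dplyr)

set.seed(123)
dat <- data.frame(year_ref = 2000:2004,
                  www_val1 = sample(5),
                  sat_val1 = sample(5),
                  ds_val1  = sample(5))

# Prefixes of the columns that should be scaled
var_names <- c("ds", "sat")

scaled <- dat %>%
  mutate(across(starts_with(var_names), ~ as.numeric(scale(.x))))

scaled
```

starts_with() accepts a character vector and takes the union of the matches, so the vector of prefixes can be passed directly.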
How to scale segments of a column in an R data frame?
Apply the same function, scale(), by group.
In base R
df$z <- with(df, ave(x, y, FUN = scale))
df
# x y z
#1 1 A -1.26491
#2 2 A -0.63246
#3 3 A 0.00000
#4 4 A 0.63246
#5 5 A 1.26491
#6 20 B -1.33242
#7 22 B -0.59219
#8 24 B 0.14805
#9 25 B 0.51816
#10 27 B 1.25840
#11 12 C -0.83028
#12 13 C -0.36901
#13 12 C -0.83028
#14 15 C 0.55352
#15 17 C 1.47605
Using dplyr
library(dplyr)
df %>% group_by(y) %>% mutate(z = scale(x))
Or data.table
library(data.table)
setDT(df)[, z:= scale(x), y]
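For a runnable version, the input can be reconstructed from the printed output above (columns `x` and `y`); this reconstruction is an assumption, since the answer does not show how `df` was built. Wrapping scale() in as.numeric() is an added precaution so that `z` is a plain vector.

```r
# Input reconstructed from the printed output; not shown in the original answer
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 20, 22, 24, 25, 27, 12, 13, 12, 15, 17),
  y = rep(c("A", "B", "C"), each = 5)
)

# Scale x within each level of y
df$z <- ave(df$x, df$y, FUN = function(v) as.numeric(scale(v)))

# Each group now has mean 0 and standard deviation 1
tapply(df$z, df$y, mean)
tapply(df$z, df$y, sd)
```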
Rescaled certain columns to specific mean and standard deviation in R
- Please try to post a valid reprex next time. This will save others the trouble of having to manually reproduce your input data. Also, it is not immediately clear how your first code chunk, referring to a df with columns v1-v5, relates to the subsequent code chunk referring to df$mother.iq.
- The help file for psych::rescale() specifically states that the input, x, should be a matrix or data frame. I suspect this is why the output you get is not what you were expecting.
- While you can use psych::rescale(), a better alternative that offers more flexibility may be to forgo the additional dependency on the {psych} package altogether and, instead, simply rescale the columns manually as required. The two approaches are illustrated in the reprex below:
# load libraries
library(tidyverse)
# define data as per OP
df <- data.frame(
v1 = c(65L, 98L, 85L, 83L, 115L, 98L),
v2 = c(1L, 1L, 1L, 1L, 1L, 0L),
v3 = c(121.12, 89.36, 115.44, 99.45, 92.75, 107.9),
v4 = c(4L, 4L, 4L, 3L, 4L, 1L),
v5 = c(27L, 25L, 27L, 25L, 27L, 18L)
)
# rescale via psych::rescale using entire data frame
df %>% psych::rescale(mean = 100, sd = 15)
#> v1 v2 v3 v4 v5
#> 1 77.38682 106.12372 119.90143 108.25723 109.31746
#> 2 106.46091 106.12372 82.24089 108.25723 100.71673
#> 3 95.00748 106.12372 113.16617 108.25723 109.31746
#> 4 93.24541 106.12372 94.20546 95.87139 100.71673
#> 5 121.43847 106.12372 86.26070 108.25723 109.31746
#> 6 106.46091 69.38138 104.22535 71.09970 70.61416
# if you only want to do this for specific columns, do it manually by targeting
# columns using dplyr::mutate_at(), an anonymous function, and scale (from base
# R):
df %>%
mutate_at(vars(v4, v5), function(x) scale(x)*15 + 100)
#> v1 v2 v3 v4 v5
#> 1 65 1 121.12 108.25723 109.31746
#> 2 98 1 89.36 108.25723 100.71673
#> 3 85 1 115.44 108.25723 109.31746
#> 4 83 1 99.45 95.87139 100.71673
#> 5 115 1 92.75 108.25723 109.31746
#> 6 98 0 107.90 71.09970 70.61416
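The manual rescaling can also be wrapped in a small base R helper with no dplyr dependency. The function name rescale_to() and its arguments `new_mean` and `new_sd` are hypothetical; the helper standardizes a column and then shifts it to the target mean and standard deviation. Only three of the original columns are reproduced here to keep the sketch short.

```r
# Hypothetical helper: rescale x to a chosen mean and standard deviation
rescale_to <- function(x, new_mean = 100, new_sd = 15) {
  as.numeric(scale(x)) * new_sd + new_mean
}

df <- data.frame(
  v1 = c(65L, 98L, 85L, 83L, 115L, 98L),
  v4 = c(4L, 4L, 4L, 3L, 4L, 1L),
  v5 = c(27L, 25L, 27L, 25L, 27L, 18L)
)

# Rescale only the selected columns; v1 is left untouched
cols <- c("v4", "v5")
df[cols] <- lapply(df[cols], rescale_to, new_mean = 100, new_sd = 15)

sapply(df[cols], mean)  # 100 for both columns
sapply(df[cols], sd)    # 15 for both columns
```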