How to Solve prcomp.default(): Cannot Rescale a Constant/Zero Column to Unit Variance

PCA and Constant-Zero Column Error

PCA uses only complete observations. With your second definition of df, PCA drops the last row because it contains a missing value, and column c is constant within the remaining rows.

Note: my answer is about PCA in general and is not specific to the caret package.
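To illustrate how dropping incomplete rows can leave a constant column, here is a small pandas sketch. The data frame is hypothetical (it is not the df from the question), and dropna() stands in for the complete-case filtering described above.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the df from the question): column c varies only
# because of its last value, and that row has a missing value in column b.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [5.0, 3.0, 8.0, np.nan],
    "c": [7.0, 7.0, 7.0, 9.0],
})

# Over all rows, c is not constant
full_var_c = df["c"].var()

# Keep complete observations only (stand-in for PCA's complete-case handling)
complete = df.dropna()

# Zero-variance columns among the remaining rows
zero_var_cols = complete.columns[complete.var() == 0].tolist()

print(full_var_c)     # 1.0
print(len(complete))  # 3
print(zero_var_cols)  # ['c']
```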

Removal of constant columns in R

The problem is that one of your columns has zero variance. You can check which columns of a data frame are constant like this:

df <- data.frame(x=1:5, y=rep(1,5))
df
# x y
# 1 1 1
# 2 2 1
# 3 3 1
# 4 4 1
# 5 5 1

# Names of columns that have zero variance
names(df)[sapply(df, function(v) var(v, na.rm=TRUE) == 0)]
# [1] "y"
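The error message comes from the scaling step. Rescaling to unit variance divides each centered column by its standard deviation, and for a constant column that is 0/0. prcomp stops with the error rather than doing this division. The NumPy sketch below shows what the division would produce. It is an illustration of the arithmetic, not R's implementation.

```python
import numpy as np

# One varying column and one constant column (like x and y in the R example)
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0],
              [5.0, 1.0]])

centered = X - X.mean(axis=0)
sd = X.std(axis=0, ddof=1)  # sample standard deviation, as R's sd() uses

with np.errstate(invalid="ignore"):  # 0/0 would otherwise emit a RuntimeWarning
    scaled = centered / sd

print(sd[1])                         # 0.0
print(np.isnan(scaled[:, 1]).all())  # True: the constant column becomes all NaN
```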

To exclude these columns, use the following (drop = FALSE keeps the result a data frame even when only one column remains):

df[, sapply(df, function(v) var(v, na.rm=TRUE) != 0), drop = FALSE]

EDIT: In fact, it is simpler to use apply instead, for example:

df[, apply(df, 2, var, na.rm=TRUE) != 0, drop = FALSE]
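For pandas users, a minimal sketch of the same filter is below, using the toy data from the R example. The second variant counts distinct values with nunique() instead of computing a variance. That is a swapped-in technique, shown because it does not rely on a floating-point variance coming out exactly zero.

```python
import pandas as pd

# Same toy data as the R example: y is constant
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 1, 1, 1, 1]})

# Variance-based filter, mirroring var(v, na.rm=TRUE) != 0 in R
# (pandas skips NaN by default, similar to na.rm=TRUE)
variances = df.var()
kept_by_var = df.loc[:, variances != 0]

# Alternative: keep columns with more than one distinct non-missing value
kept_by_unique = df.loc[:, df.nunique() > 1]

print(kept_by_var.columns.tolist())     # ['x']
print(kept_by_unique.columns.tolist())  # ['x']
```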

R command which(apply(data, 2, var)==0) in Python

In R, apply(data, 2, var) works on two-dimensional structures such as matrices or data frames (though it is not advised for data frames, because apply first coerces them to a matrix) and computes the variance of every column:

Data frame

set.seed(73120)

random_df <- data.frame(
  num1 = runif(500, 1, 100),
  num2 = runif(500, 1, 100),
  num3 = runif(500, 1, 100),
  num4 = runif(500, 1, 100),
  num5 = runif(500, 1, 100)
)

apply(random_df, 2, var)
# num1 num2 num3 num4 num5
# 822.9465 902.5558 782.4820 804.1448 830.1097

Wrapping the result in which returns the indices, with their names, of the elements of this named vector (a 1-D array) for which the condition is TRUE:

which(apply(random_df, 2, var) > 900)
# num2
# 2

Matrix

set.seed(73120)

random_mat <- replicate(5, runif(500, 1, 100))

apply(random_mat, 2, var)
# [1] 822.9465 902.5558 782.4820 804.1448 830.1097

which(apply(random_mat, 2, var) > 900)
# [1] 2

Pandas

In Python, the pandas (data analysis library) equivalent is also apply: DataFrame.apply with axis='index' runs a function on each column. Alternatively, you can use DataFrame.aggregate. Both return a pandas Series, which is similar to R's named vector (a 1-D array). Like R's var, pandas' var computes the sample variance (ddof=1) by default.

import numpy as np
import pandas as pd

np.random.seed(7312020)

random_df = pd.DataFrame({'num1': np.random.uniform(1, 100, 500),
'num2': np.random.uniform(1, 100, 500),
'num3': np.random.uniform(1, 100, 500),
'num4': np.random.uniform(1, 100, 500),
'num5': np.random.uniform(1, 100, 500)
})

agg1 = random_df.apply('var', axis='index')
print(agg1)
# num1 828.538378
# num2 810.755215
# num3 820.480400
# num4 811.728108
# num5 885.514924
# dtype: float64

agg2 = random_df.aggregate('var')
print(agg2)
# num1 828.538378
# num2 810.755215
# num3 820.480400
# num4 811.728108
# num5 885.514924
# dtype: float64

R's which can be emulated with plain boolean indexing [...] (which also works in R), with .loc, or with where (which keeps the original length and fills non-matching entries with NaN):

agg1[agg1 > 850]
# num5 885.514924
# dtype: float64

agg1.loc[agg1 > 850]
# num5 885.514924
# dtype: float64

agg1.where(agg1 > 850)
# num1 NaN
# num2 NaN
# num3 NaN
# num4 NaN
# num5 885.514924
# dtype: float64
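The examples above filter on var > 850. For the literal translation of which(apply(data, 2, var) == 0), a sketch follows. The data frame is hypothetical, with num3 constant. One difference matters: NumPy positions are 0-based, so the position printed here is one less than what R's which would report.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one constant column (num3)
data = pd.DataFrame({
    "num1": [1.0, 4.0, 2.0, 8.0],
    "num2": [3.0, 3.5, 9.0, 0.5],
    "num3": [6.0, 6.0, 6.0, 6.0],
})

variances = data.var()  # sample variance per column (ddof=1), like R's var

# Names of zero-variance columns, like names(which(apply(data, 2, var) == 0))
zero_names = variances.index[variances == 0].tolist()

# 0-based positions of those columns (R's which would report 3 here, not 2)
zero_pos = np.flatnonzero(variances.to_numpy() == 0)

print(zero_names)  # ['num3']
print(zero_pos)    # [2]
```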

NumPy

Alternatively, with NumPy (Python's numerical computing library for arrays), you can use numpy.apply_along_axis. To match pandas' var, set ddof=1, because np.var defaults to ddof=0 (population variance):

random_arry = random_df.to_numpy()

agg = np.apply_along_axis(lambda x: np.var(x, ddof=1), 0, random_arry)
print(agg)
# [828.53837793 810.75521479 820.48039962 811.72810753 885.51492378]

print(agg[agg > 850])
# [885.51492378]
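apply_along_axis calls the Python function once per column. For a built-in reduction such as variance, passing axis=0 to np.var directly gives the same result in a single vectorized call. The sketch below uses its own random array with an arbitrary seed, so its values differ from the outputs above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
random_arry = rng.uniform(1, 100, size=(500, 5))

# Calls the lambda once per column
looped = np.apply_along_axis(lambda x: np.var(x, ddof=1), 0, random_arry)

# Vectorized equivalent: one call with axis=0
vectorized = np.var(random_arry, axis=0, ddof=1)

print(np.allclose(looped, vectorized))  # True
print(np.flatnonzero(vectorized == 0))  # positions of zero-variance columns
```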

Principal Component Analysis throws constant/zero column Error

After running your code, I found that this line in particular:

log10(training1[, -13] + 1)

returns NaN values in some columns (specifically IL_1alpha and IL_3):

Warning messages:
1: In lapply(X = x, FUN = .Generic, ...) : NaNs produced

That seems to be the source of the error. You shouldn't take logs of negative numbers (here, any value below -1 becomes negative after adding 1), so consider a different transformation instead, or whether a transformation is necessary at all.
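To locate the affected columns before running PCA, you can check which columns contain NaN after the transform. Below is a pandas sketch of that check. The column values are hypothetical. Only the names IL_1alpha and IL_3 come from the answer, and IL_6 is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for training1[, -13]: two columns contain values < -1
training = pd.DataFrame({
    "IL_1alpha": [-3.0, 0.5, 2.0],
    "IL_3": [1.0, -2.5, 4.0],
    "IL_6": [0.0, 9.0, 99.0],
})

with np.errstate(invalid="ignore"):  # silence the "invalid value" RuntimeWarning
    logged = np.log10(training + 1)

# Columns where the transform produced NaN (the analogue of R's "NaNs produced")
nan_cols = logged.columns[logged.isna().any()].tolist()
print(nan_cols)  # ['IL_1alpha', 'IL_3']
```

A value of exactly -1 gives log10(0), which is -Inf rather than NaN, so it would not be caught by isna().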
