Removal of Constant Columns in R

The problem here is that the variance of one of your columns is equal to zero. You can check which columns of a data frame are constant this way, for example:

df <- data.frame(x=1:5, y=rep(1,5))
df
# x y
# 1 1 1
# 2 2 1
# 3 3 1
# 4 4 1
# 5 5 1

# Names of columns that have zero variance
# (index names(df) directly: df[, idx] drops to a plain vector when one column matches)
names(df)[sapply(df, function(v) var(v, na.rm=TRUE)==0)]
# [1] "y"

So if you want to exclude these columns, you can use (drop = FALSE keeps the result a data frame even if only one column is left):

df[, sapply(df, function(v) var(v, na.rm=TRUE)!=0), drop = FALSE]

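EDIT: If every column is numeric, it is shorter to use apply instead (apply converts the data frame to a matrix first, so mixed column types will not work). Something like this:

df[, apply(df, 2, var, na.rm=TRUE) != 0, drop = FALSE]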
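If some columns are not numeric, a variance check is not the right tool. Here is a minimal sketch of a helper that counts distinct non-missing values instead. The name drop_constant_cols and the toy data are made up for illustration, they are not from the original answer.

```r
# Hypothetical helper: keep only columns with more than one distinct non-NA value.
drop_constant_cols <- function(d) {
  keep <- vapply(
    d,
    function(v) length(unique(v[!is.na(v)])) > 1,
    logical(1)
  )
  # drop = FALSE keeps a data frame even when a single column survives
  d[, keep, drop = FALSE]
}

# Toy data (assumed): one varying numeric, one constant numeric, one constant character
toy <- data.frame(x = 1:5, y = rep(1, 5), z = rep("a", 5))
res <- drop_constant_cols(toy)
names(res)
```

Using vapply with logical(1) rather than sapply guarantees a logical vector back, even for a data frame with zero columns.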

Remove constant columns with or without NAs

Because you have a data.table, you may use uniqueN and its na.rm argument:

df[ , lapply(.SD, function(v) if(uniqueN(v, na.rm = TRUE) > 1) v)]
# x
# 1: 1
# 2: 2
# 3: 3
# 4: NA
# 5: 5

A base alternative could be:

Filter(function(x) length(unique(x[!is.na(x)])) > 1, df)
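To see how the base version treats missing values, here is a small sketch on made-up data (the data frame toy_na is an assumption, the original data is not shown). Note that a column that is entirely NA has zero distinct non-missing values, so it gets removed as well.

```r
# Assumed toy data: x varies, y is constant apart from an NA, z is all NA
toy_na <- data.frame(
  x = c(1, 2, 3, NA, 5),
  y = c(1, NA, 1, 1, 1),
  z = c(NA, NA, NA, NA, NA)
)

# Keep columns with more than one distinct non-missing value
kept <- Filter(function(col) length(unique(col[!is.na(col)])) > 1, toy_na)
names(kept)
```

If you would rather keep all-NA columns, you would have to change the condition, for example to `!= 1`.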

How to add a step to remove a column with constant value?

There are two steps for you to consider:

  • step_zv() will remove variables that all have the same value (zero variance)
  • step_nzv() will remove variables that almost all have the same value (highly sparse and unbalanced)
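To give a feel for what "almost all the same value" means, here is a rough base R sketch of a near-zero-variance check. It is not the recipes implementation. The function name is hypothetical, and the thresholds (a frequency ratio above 95/5 and fewer than 10 percent unique values) are what I understand the step_nzv() defaults to be, so treat them as an assumption and check the documentation.

```r
# Hypothetical helper, not part of recipes: flags a column as near-zero variance
# when the most common value dominates AND there are few distinct values.
is_near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  x <- x[!is.na(x)]
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) <= 1) return(TRUE)           # constant (or all NA)
  freq_ratio <- counts[[1]] / counts[[2]]         # most common vs. second most common
  pct_unique <- 100 * length(counts) / length(x)  # distinct values as % of rows
  freq_ratio > freq_cut && pct_unique < unique_cut
}

# Made-up data with 100 rows
toy_nzv <- data.frame(
  balanced = rep(c(0, 1), each = 50),
  sparse   = c(rep(0, 98), 1, 1),
  constant = rep(7, 100)
)
flags <- vapply(toy_nzv, is_near_zero_var, logical(1))
flags
```

In this toy data only balanced is kept: sparse has a 98 to 2 split, and constant has a single value.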

Remove columns from data set in R having constant non-missing values

Try

data[sapply(data, function(x) length(unique(na.omit(x)))) > 1]

# b
# 1 2
# 2 4
# 3 3

Find the names of constant columns in an R data.frame

We need to pass na.action = na.pass to keep the NA elements. Otherwise the formula method of aggregate uses its default, na.omit, and removes every row that contains an NA.

names(Filter(all, aggregate(.~study.name, DATA, is_constant, 
na.action = na.pass)[-1]))
#[1] "setting" "prof" "random"
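The snippet above uses is_constant and DATA from the question, neither of which is shown here. As a rough sketch of why na.action matters, assume is_constant checks for a single distinct non-missing value (my assumption, not necessarily the original definition) and take some made-up data:

```r
# Assumed definition, for illustration only
is_constant <- function(x) length(unique(x[!is.na(x)])) == 1

# Made-up data: 'dose' varies within study A, 'prof' has one NA
DATA <- data.frame(
  study.name = c("A", "A", "B", "B"),
  setting    = c(1, 1, 2, 2),
  prof       = c(NA, 4, 3, 3),
  dose       = c(1, 2, 3, 3)
)

# na.pass keeps the row with the NA, so 'dose' is seen to vary within study A
with_pass <- names(Filter(all, aggregate(. ~ study.name, DATA, is_constant,
                                         na.action = na.pass)[-1]))

# The default (na.omit) drops the first row entirely, hiding that variation
with_default <- names(Filter(all, aggregate(. ~ study.name, DATA, is_constant)[-1]))

with_pass
with_default
```

With the default, the NA in prof removes the whole first row, so dose looks constant within study A when it is not.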

R - Error using sapply to remove constant columns in matrix

If you need to keep your data in matrix format, then try this:

X[, apply(X, 2, function(x) length(unique(x)) != 1), drop = FALSE]

Output:

   dday0_10 dday10_30 dday30C  prec prec_sq
1    143.30    561.24   25.97  4.74   22.47
2    152.37    633.21   30.05 14.32  205.10
3    138.74    529.73   17.14  4.39   19.29
4    149.87    621.18   37.70  5.10   25.96
5    103.21    319.53    9.70  5.46   29.80
6    130.98    476.16   15.90  4.87   23.74
7    151.21    620.08   24.95  7.21   52.04
8    103.34    279.21   -1.84  4.31   18.60
9    126.50    416.97    7.50  3.77   14.18
10    86.87    184.58   -9.95  4.32   18.66
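One thing to watch with matrix subsetting: if only one column survives the filter, `[` silently returns a plain vector unless drop = FALSE is passed. A small sketch on a made-up matrix (X_toy is not the matrix from the question):

```r
# Made-up matrix: only column 'a' varies
X_toy <- cbind(a = c(1, 2, 3), b = c(5, 5, 5), c = c(0, 0, 0))

keep <- apply(X_toy, 2, function(col) length(unique(col)) != 1)

as_vector <- X_toy[, keep]                # dimensions are dropped
as_matrix <- X_toy[, keep, drop = FALSE]  # still a 3 x 1 matrix

dim(as_vector)
dim(as_matrix)
```

If later code expects a matrix (for example a model-fitting function), the drop = FALSE form is the safer one.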

R - remove column when the values are all the same

We can use Filter

Filter(var, df1)

Or

Filter(function(x) length(unique(x)) > 1, df1)

Delete columns in a database if the values within the columns are similar

Here is a way you can get rid of constant columns:

##### Removing constant columns
cat("\n## Removing the constant columns.\n")
for (f in names(My_data_frame)) {
  if (length(unique(My_data_frame[[f]])) == 1) {
    cat(f, "is constant in my data frame. We delete it.\n")
    My_data_frame[[f]] <- NULL
  }
}

And here is the same solution, this time ignoring NA values when checking whether a column is constant :)

##### Removing constant columns (considering NA's)
cat("\n## Removing the constant columns.\n")
for (f in names(My_data_frame)) {
  # Compare only the non-missing values of the column
  if (length(unique(My_data_frame[[f]][!is.na(My_data_frame[[f]])])) == 1) {
    cat(f, "is constant in my data frame. We delete it.\n")
    My_data_frame[[f]] <- NULL
  }
}

How to deal with a column with only one value?

This error comes from step_dummy() because the variable X33 has only one factor level, "TRUE". The easiest way to deal with this in your problem is to use step_zv() on the nominal predictors before step_dummy().

This would make your recipe look like

lr_recipe <- 
  recipe(afiliasi ~ ., data = school_train) %>%
  step_date(date_est, date_ops) %>%
  step_rm(date_est, date_ops) %>%
  textrecipes::step_clean_levels(village) %>%
  step_zv(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

Reprex showing what is happening:

library(recipes)

mtcars$fac1 <- "h"
mtcars$fac2 <- rep(c("a", "b"), length.out = nrow(mtcars))

recipe(mpg ~ ., data = mtcars) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()
#> Error in `bake()`:
#> ! Only one factor level in fac1: h

recipe(mpg ~ ., data = mtcars) %>%
  step_zv(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()
#> Recipe
#>
#> Inputs:
#>
#> role #variables
#> outcome 1
#> predictor 12
#>
#> Training data contained 32 data points and no missing data.
#>
#> Operations:
#>
#> Zero variance filter removed fac1 [trained]
#> Dummy variables from fac2 [trained]

