Removal of Constant Columns in R

Removal of constant columns in R

The problem here is that one of your columns has zero variance, i.e. it is constant. You can check which columns of a data frame are constant like this, for example:

df <- data.frame(x=1:5, y=rep(1,5))
df
#   x y
# 1 1 1
# 2 2 1
# 3 3 1
# 4 4 1
# 5 5 1

# Names of columns that have zero variance
names(df)[sapply(df, function(v) var(v, na.rm=TRUE)==0)]
# [1] "y"

So if you want to exclude these columns, you can use the following (drop = FALSE keeps a data frame even when only one column remains):

df[, sapply(df, function(v) var(v, na.rm=TRUE)!=0), drop = FALSE]

EDIT: Alternatively, you can use apply(). Note that apply() first converts the data frame to a matrix, so this only works when all columns are numeric:

df[, apply(df, 2, var, na.rm=TRUE) != 0, drop = FALSE]
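Here is a minimal sketch of the variance check on the same toy df. It assumes all columns are numeric, because var() needs numeric input. It also shows a pitfall: when only one column survives, [ simplifies the result to a plain vector unless drop = FALSE is given.

```r
# Same toy data as above: x varies, y is constant
df <- data.frame(x = 1:5, y = rep(1, 5))

# TRUE for columns whose variance (ignoring NAs) is non-zero
keep <- sapply(df, function(v) var(v, na.rm = TRUE) != 0)

# Without drop = FALSE, selecting a single column returns a plain vector
class(df[, keep])                 # "integer"

# With drop = FALSE the result stays a data frame
res <- df[, keep, drop = FALSE]
names(res)                        # "x"
```

Also keep in mind that var() returns NA for a column with fewer than two non-missing values. Such a column gives an NA in keep, so it is worth checking for that case on real data.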

Remove constant columns with or without NAs

Because you have a data.table, you can use uniqueN() and its na.rm argument:

library(data.table)
df[ , lapply(.SD, function(v) if(uniqueN(v, na.rm = TRUE) > 1) v)]
#     x
# 1:  1
# 2:  2
# 3:  3
# 4: NA
# 5:  5

A base R alternative is:

Filter(function(x) length(unique(x[!is.na(x)])) > 1, df)
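To see how NA handling changes the result, here is a small sketch on hypothetical data. The data frame and its columns are made up for illustration. If NA counts as a distinct value, a column that is otherwise constant is kept. If NAs are filtered out first, that column is dropped, along with any column that is entirely NA.

```r
# Hypothetical data: x varies, y is constant apart from NAs, z is entirely NA
df <- data.frame(x = c(1, 2, 3, NA, 5),
                 y = c(1, NA, 1, 1, NA),
                 z = NA)

# Counting NA as a value: y has two distinct values (1 and NA), so it is kept
names(Filter(function(x) length(unique(x)) > 1, df))
# [1] "x" "y"

# Ignoring NAs: only x has more than one distinct non-missing value
names(Filter(function(x) length(unique(x[!is.na(x)])) > 1, df))
# [1] "x"
```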

How to add a step to remove a column with constant value?

The recipes package has two steps for you to consider:

step_zv() removes variables that contain only a single value (zero variance).
step_nzv() removes variables in which almost all values are the same (highly sparse and unbalanced).
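Here is a minimal sketch of both steps. It assumes the recipes package is installed, and the training data below is hypothetical. step_zv() drops the column that never changes. step_nzv() drops the column that is almost always 0, based on its default thresholds (freq_cut = 95/5, unique_cut = 10).

```r
library(recipes)

# Hypothetical training data: const never changes, rare is almost always 0
set.seed(1)
train <- data.frame(
  y     = rnorm(100),
  x     = rnorm(100),
  const = 1,
  rare  = c(1, rep(0, 99))
)

rec <- recipe(y ~ ., data = train) %>%
  step_zv(all_predictors()) %>%   # removes const (a single unique value)
  step_nzv(all_predictors()) %>%  # removes rare (99:1 frequency ratio, 2% unique values)
  prep()

# Processed training data: only x and the outcome y remain
processed <- bake(rec, new_data = NULL)
names(processed)
```

step_zv() is usually the safer default because it only removes truly constant columns. step_nzv() is more aggressive and may remove rare but informative indicators.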

Remove columns from data set in R having constant non-missing values

Try

data[sapply(data, function(x) length(unique(na.omit(x)))) > 1]

#   b
# 1 2
# 2 4
# 3 3

Find the names of constant columns in an R data.frame

We need to pass na.action = na.pass to handle the NA elements. Otherwise, aggregate() drops every row that contains an NA:

names(Filter(all, aggregate(.~study.name, DATA, is_constant, 
            na.action = na.pass)[-1]))
#[1] "setting" "prof"    "random"
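The helper is_constant is not defined here. The sketch below therefore assumes a hypothetical definition that returns TRUE when all non-missing values in a group are identical. DATA is also made-up example data. aggregate() applies is_constant to every column within each study, and Filter(all, ...) keeps only the columns that are constant in every study.

```r
# Hypothetical helper: TRUE when all non-missing values are identical
is_constant <- function(x) length(unique(x[!is.na(x)])) <= 1

# Hypothetical example data: two studies, several variables
DATA <- data.frame(
  study.name = c("A", "A", "B", "B"),
  setting    = c(1, 1, 2, 2),   # constant within each study
  prof       = c(3, NA, 5, 5),  # constant within each study once NAs are ignored
  dose       = c(1, 2, 3, 3)    # varies within study A
)

# One row per study, one logical column per variable
const_by_study <- aggregate(. ~ study.name, DATA, is_constant,
                            na.action = na.pass)

# Drop the grouping column, keep variables constant in every study
names(Filter(all, const_by_study[-1]))
# [1] "setting" "prof"
```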

R - Error using sapply to remove constant columns in matrix

If you need to keep your data in matrix format, then try this:

X[, apply(X, 2, function(x) length(unique(x)) != 1), drop = FALSE]

Output:

   dday0_10 dday10_30 dday30C  prec prec_sq
1    143.30    561.24   25.97  4.74   22.47
2    152.37    633.21   30.05 14.32  205.10
3    138.74    529.73   17.14  4.39   19.29
4    149.87    621.18   37.70  5.10   25.96
5    103.21    319.53    9.70  5.46   29.80
6    130.98    476.16   15.90  4.87   23.74
7    151.21    620.08   24.95  7.21   52.04
8    103.34    279.21   -1.84  4.31   18.60
9    126.50    416.97    7.50  3.77   14.18
10    86.87    184.58   -9.95  4.32   18.66
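The error in the question title arises because sapply() treats a matrix as a vector of its individual cells. It returns one value per cell instead of one per column, whereas apply(X, 2, ...) iterates over columns. Here is a small sketch with a hypothetical matrix. It also uses drop = FALSE so that a single surviving column stays a matrix.

```r
# Hypothetical matrix: column b is constant
X <- cbind(a = c(1, 2, 3), b = c(5, 5, 5), c = c(7, 8, 7))

# sapply() visits each of the 9 cells, not each of the 3 columns
length(sapply(X, function(x) length(unique(x)) != 1))  # 9

# apply() over margin 2 gives one TRUE/FALSE per column
keep <- apply(X, 2, function(x) length(unique(x)) != 1)
X2 <- X[, keep, drop = FALSE]
colnames(X2)  # "a" "c"
```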

R - remove column when the values are all the same

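We can use Filter(). For numeric columns, var() returns 0 for a constant column, which Filter() treats as FALSE:

Filter(var, df1)

or, for columns of any type:

Filter(function(x) length(unique(x)) > 1, df1)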
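Here is a short sketch on a hypothetical df1 that contrasts the two versions. The unique()-based filter works for any column type. Filter(var, df1) needs numeric columns. It also drops a numeric column that contains an NA, because var() then returns NA, which Filter() treats as not selected.

```r
# Hypothetical data: b is constant, c has an NA, d is character
df1 <- data.frame(a = c(1, 2, 3),
                  b = c(4, 4, 4),
                  c = c(1, NA, 3),
                  d = c("u", "v", "w"))

# Works for any column type; only the constant column b is dropped
names(Filter(function(x) length(unique(x)) > 1, df1))
# [1] "a" "c" "d"

# var() needs numeric input, so restrict to numeric columns first
num <- df1[sapply(df1, is.numeric)]
names(Filter(var, num))
# [1] "a"   (c is dropped as well, because var(c) is NA)
```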

Delete columns in a database if the values within the columns are similar

Here is a way you can get rid of constant columns:

##### Removing constant columns
cat("\n## Removing the constant columns.\n")
for (f in names(My_data_frame)) {
  if (length(unique(My_data_frame[[f]])) == 1) {
    cat(f, "is constant in my data frame. We delete it.\n")
    My_data_frame[[f]] <- NULL
  }
}

And here is the same solution, ignoring NA values when checking whether a column is constant:

##### Removing constant columns (ignoring NAs)
cat("\n## Removing the constant columns.\n")
for (f in names(My_data_frame)) {
  if (length(unique(My_data_frame[[f]][!is.na(My_data_frame[[f]])])) == 1) {
    cat(f, "is constant in my data frame. We delete it.\n")
    My_data_frame[[f]] <- NULL
  }
}

How to deal with a column with only one value?

This error comes from step_dummy() because the variable X33 has only one factor level, "TRUE". The easiest fix in your case is to use step_zv() on the nominal predictors before step_dummy().

This would make your recipe look like:

lr_recipe <- 
  recipe(afiliasi ~ ., data = school_train) %>%  
  step_date(date_est, date_ops) %>% 
  step_rm(date_est, date_ops) %>%
  textrecipes::step_clean_levels(village) %>%
  step_zv(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>% 
  step_normalize(all_predictors())

Reprex showing what is happening:

library(recipes)

mtcars$fac1 <- "h"
mtcars$fac2 <- rep(c("a", "b"), length.out = nrow(mtcars))

recipe(mpg ~ ., data = mtcars) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()
#> Error in `bake()`:
#> ! Only one factor level in fac1: h

recipe(mpg ~ ., data = mtcars) %>%
  step_zv(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor         12
#> 
#> Training data contained 32 data points and no missing data.
#> 
#> Operations:
#> 
#> Zero variance filter removed fac1 [trained]
#> Dummy variables from fac2 [trained]
