Removal of constant columns in R
The problem here is that your column variance is equal to zero. You can check which column of a data frame is constant this way, for example:
df <- data.frame(x=1:5, y=rep(1,5))
df
# x y
# 1 1 1
# 2 2 1
# 3 3 1
# 4 4 1
# 5 5 1
# Get the names of the columns that have zero variance
names(df)[sapply(df, function(v) var(v, na.rm=TRUE) == 0)]
# [1] "y"
So if you want to exclude these columns, you can use:
df[, sapply(df, function(v) var(v, na.rm=TRUE) != 0), drop = FALSE]
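EDIT: In fact, it is simpler to use apply() instead (note that apply() converts the data frame to a matrix first, so this suits all-numeric data). Something like this:
df[, apply(df, 2, var, na.rm=TRUE) != 0, drop = FALSE]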
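Both variance-based versions can trip over missing data: var() returns NA for a column that is entirely NA or has fewer than two non-missing values, and an NA in the logical column index can make the subsetting fail. A minimal sketch that treats such columns as constant, assuming all columns are numeric; drop_zero_var() is a hypothetical helper name, not part of the original answer:

```r
# Hypothetical helper: keep only numeric columns whose variance,
# ignoring NAs, is a real non-zero number.
drop_zero_var <- function(df) {
  keep <- vapply(df, function(v) {
    s <- var(v, na.rm = TRUE)
    !is.na(s) && s != 0  # an NA variance (e.g. all-NA column) counts as constant
  }, logical(1))
  df[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even if one column survives
}

df <- data.frame(x = 1:5, y = rep(1, 5), z = NA_real_)
drop_zero_var(df)  # keeps only column x
```

Using vapply() with logical(1) guarantees a plain logical vector, so the column index never contains NA.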
Remove constant columns with or without NAs
Because you have a data.table, you may use uniqueN() and its na.rm argument:
df[ , lapply(.SD, function(v) if(uniqueN(v, na.rm = TRUE) > 1) v)]
# x
# 1: 1
# 2: 2
# 3: 3
# 4: NA
# 5: 5
A base R alternative could be:
Filter(function(x) length(unique(x[!is.na(x)])) > 1, df)
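Applied to a small data frame shaped like the example above (the values are made up for illustration), the base R version keeps only x, because y has a single distinct non-missing value:

```r
df <- data.frame(x = c(1, 2, 3, NA, 5), y = c(1, 1, NA, 1, 1))
# Keep columns with more than one distinct non-NA value
res <- Filter(function(x) length(unique(x[!is.na(x)])) > 1, df)
res  # a data frame with column x only; the NA in x is kept
```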
How to add a step to remove a column with constant value?
There are two steps in the recipes package for you to consider:
step_zv() will remove variables that all have the same value (zero variance).
step_nzv() will remove variables that almost all have the same value (highly sparse and unbalanced).
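To make the difference concrete: as far as the recipes documentation describes it, step_nzv() flags a variable when the ratio of the most common value's frequency to the second most common's is above freq_cut (default 95/5) and the percentage of distinct values is below unique_cut (default 10); check the documentation for your version. The sketch below re-implements that rule in base R for a single vector. is_near_zero_var() is a hypothetical, simplified helper, not the recipes source code, and it assumes NAs should simply be ignored:

```r
# Hypothetical helper: a simplified version of the near-zero-variance rule.
is_near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  x <- x[!is.na(x)]                               # assumption: ignore missing values
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) <= 1L) return(TRUE)          # zero variance: what step_zv() targets
  freq_ratio <- counts[[1]] / counts[[2]]         # most common vs. runner-up
  pct_unique <- 100 * length(counts) / length(x)  # distinct values as % of samples
  freq_ratio > freq_cut && pct_unique < unique_cut
}

is_near_zero_var(rep(1, 100))       # TRUE  (constant)
is_near_zero_var(c(rep(0, 99), 1))  # TRUE  (99:1 ratio, 2% unique)
is_near_zero_var(1:100)             # FALSE
```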
Remove columns from data set in R having constant non-missing values
Try:
data[sapply(data, function(x) length(unique(na.omit(x)))) > 1]
# b
# 1 2
# 2 4
# 3 3
Find the names of constant columns in an R data.frame
We need to pass na.action = na.pass to take care of the NA elements; otherwise, aggregate() drops every row that contains an NA:
names(Filter(all, aggregate(.~study.name, DATA, is_constant,
na.action = na.pass)[-1]))
#[1] "setting" "prof" "random"
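The is_constant() helper and DATA are not shown in this answer, so the sketch below defines hypothetical versions of both (is_constant() here ignores NAs, which is an assumption) to show why na.pass matters. With the default na.omit, the row holding the NA is dropped, which hides the variation in score within study "a":

```r
# Hypothetical helper: TRUE when all non-NA values in x are identical.
is_constant <- function(x) length(unique(x[!is.na(x)])) <= 1L

# Hypothetical data: two studies with a missing value in study "a".
DATA <- data.frame(
  study.name = c("a", "a", "b", "b"),
  setting    = c(1, 1, 2, 2),
  prof       = c(5, NA, 7, 7),
  score      = c(1, 2, 3, 3)
)

# na.pass keeps row 2, so score is correctly seen to vary in study "a"
with_pass <- names(Filter(all, aggregate(. ~ study.name, DATA, is_constant,
                                         na.action = na.pass)[-1]))
# The default na.omit drops row 2, so score wrongly looks constant
with_omit <- names(Filter(all, aggregate(. ~ study.name, DATA, is_constant)[-1]))

with_pass  # "setting" "prof"
with_omit  # "setting" "prof" "score"
```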
R - Error using sapply to remove constant columns in matrix
If you need to keep your data in matrix format, then try this:
X[,apply(X,2,function(x) length(unique(x))!=1)]
Output:
dday0_10 dday10_30 dday30C prec prec_sq
1 143.30 561.24 25.97 4.74 22.47
2 152.37 633.21 30.05 14.32 205.10
3 138.74 529.73 17.14 4.39 19.29
4 149.87 621.18 37.70 5.10 25.96
5 103.21 319.53 9.70 5.46 29.80
6 130.98 476.16 15.90 4.87 23.74
7 151.21 620.08 24.95 7.21 52.04
8 103.34 279.21 -1.84 4.31 18.60
9 126.50 416.97 7.50 3.77 14.18
10 86.87 184.58 -9.95 4.32 18.66
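Two details are worth noting for the matrix version: length(unique(x)) counts NA as a distinct value, and when only one column survives, X[, keep] silently returns a plain vector. A sketch on a small made-up matrix that ignores NAs and keeps the matrix shape with drop = FALSE:

```r
X <- cbind(a = c(1, 2, 3), b = c(4, 4, 4), c = c(NA, 6, 6))
# TRUE for columns with more than one distinct non-NA value
keep <- apply(X, 2, function(x) length(unique(x[!is.na(x)])) > 1)
X_kept <- X[, keep, drop = FALSE]  # still a 3 x 1 matrix, not a vector
X_kept
```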
R - remove column when the values are all the same
We can use Filter:
Filter(var, df1)
Or:
Filter(function(x) length(unique(x)) != 1, df1)
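A quick check on a small made-up data frame (df1 here is hypothetical). Filter(var, df1) relies on a variance of 0 being treated as FALSE, so it only makes sense for numeric columns; the unique() form also works for character columns:

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), d = c("u", "u", "u"))
# Keep columns that have more than one distinct value
kept <- Filter(function(x) length(unique(x)) != 1, df1)
names(kept)  # "a"
# Numeric columns only: a variance of 0 is treated as FALSE
names(Filter(var, df1[c("a", "b")]))  # "a"
```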
Delete columns in a database if the values within the columns are similar
Here is a way you can get rid of constant columns:
##### Removing constant columns
cat("\n## Removing the constant columns.\n")
for (f in names(My_data_frame)) {
if (length(unique(My_data_frame[[f]])) == 1) {
cat(f, "is constant in my data frame. We delete it.\n")
My_data_frame[[f]] <- NULL
}
}
And here is the same solution taking the new NA rule into account (NA values are ignored when checking whether a column is constant):
##### Removing constant columns (ignoring NA's)
cat("\n## Removing the constant columns.\n")
for (f in names(My_data_frame)) {
non_na <- My_data_frame[[f]][!is.na(My_data_frame[[f]])]
if (length(unique(non_na)) == 1) {
cat(f, "is constant in my data frame. We delete it.\n")
My_data_frame[[f]] <- NULL
}
}
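The two loops can also be folded into one function that does not modify the data frame while looping over it. A minimal sketch; remove_constant_cols() and its ignore_na argument are hypothetical names, not an existing package function. Note that with ignore_na = TRUE an all-NA column has zero distinct values and is also treated as constant:

```r
remove_constant_cols <- function(df, ignore_na = TRUE) {
  # TRUE for each column with at most one distinct value
  is_const <- vapply(df, function(v) {
    if (ignore_na) v <- v[!is.na(v)]
    length(unique(v)) <= 1L
  }, logical(1))
  if (any(is_const)) {
    message("Removing constant columns: ",
            paste(names(df)[is_const], collapse = ", "))
  }
  df[!is_const]  # list-style indexing always returns a data frame
}

my_df <- data.frame(a = 1:3, b = c(2, 2, NA), c = "k")
remove_constant_cols(my_df)                     # drops b and c
remove_constant_cols(my_df, ignore_na = FALSE)  # drops only c
```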
How to deal with a column with only one value?
This error comes from step_dummy() because the variable X33 only has one factor level, "TRUE". The easiest way to deal with this in your problem is to use step_zv() on the nominal predictors before step_dummy().
This would make your recipe look like:
lr_recipe <-
recipe(afiliasi ~ ., data = school_train) %>%
step_date(date_est, date_ops) %>%
step_rm(date_est, date_ops) %>%
textrecipes::step_clean_levels(village) %>%
step_zv(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors())
Reprex showing what is happening:
library(recipes)
mtcars$fac1 <- "h"
mtcars$fac2 <- rep(c("a", "b"), length.out = nrow(mtcars))
recipe(mpg ~ ., data = mtcars) %>%
step_dummy(all_nominal_predictors()) %>%
prep()
#> Error in `bake()`:
#> ! Only one factor level in fac1: h
recipe(mpg ~ ., data = mtcars) %>%
step_zv(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
prep()
#> Recipe
#>
#> Inputs:
#>
#> role #variables
#> outcome 1
#> predictor 12
#>
#> Training data contained 32 data points and no missing data.
#>
#> Operations:
#>
#> Zero variance filter removed fac1 [trained]
#> Dummy variables from fac2 [trained]
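The root cause is not specific to recipes: dummy (contrast) coding needs a factor with at least two levels. As an analogy only (recipes reports its own error message), the same failure can be reproduced in base R with model.matrix(); the data frame d is made up for illustration:

```r
d <- data.frame(y = c(1, 2, 3), f = factor(c("h", "h", "h")))
# A single-level factor cannot be contrast-coded
msg <- tryCatch(model.matrix(y ~ f, data = d),
                error = function(e) conditionMessage(e))
msg  # "contrasts can be applied only to factors with 2 or more levels"

# A factor with two levels encodes fine
d$g <- factor(c("a", "b", "a"))
mm <- model.matrix(y ~ g, data = d)
colnames(mm)  # "(Intercept)" "gb"
```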