How to remove columns with same value in R
Just use vapply to go through and check how many unique values there are in each column:
Sample data:
mydf <- data.frame(v1 = 1:4, v2 = 5:8,
v3 = 2, v4 = 9:12, v5 = 1)
mydf
## v1 v2 v3 v4 v5
## 1 1 5 2 9 1
## 2 2 6 2 10 1
## 3 3 7 2 11 1
## 4 4 8 2 12 1
What we will be doing with vapply:
vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))
# v1 v2 v3 v4 v5
# TRUE TRUE FALSE TRUE FALSE
Keep the columns you want:
mydf[vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))]
# v1 v2 v4
# 1 1 5 9
# 2 2 6 10
# 3 3 7 11
# 4 4 8 12
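If you need this check in more than one place, it can be wrapped in a small helper. The sketch below is one possible way to do that. The name drop_constant_cols and its na_rm argument are made up for illustration and are not part of base R. With na_rm = TRUE, missing values are ignored when counting distinct values, so a column such as c(1, NA, 1) is treated as constant.

```r
# Hypothetical helper (not part of base R): drop columns that hold a
# single distinct value. `na_rm` is an illustrative argument name.
drop_constant_cols <- function(df, na_rm = FALSE) {
  keep <- vapply(df, function(x) {
    if (na_rm) x <- x[!is.na(x)]   # optionally ignore missing values
    length(unique(x)) > 1          # TRUE when the column varies
  }, logical(1L))
  df[keep]                         # list-style indexing keeps a data frame
}

mydf <- data.frame(v1 = 1:4, v2 = 5:8, v3 = 2, v4 = 9:12, v5 = 1)
names(drop_constant_cols(mydf))

# NA counts as its own value unless na_rm = TRUE
with_na <- data.frame(a = c(1, NA, 1), b = 1:3)
names(drop_constant_cols(with_na))
names(drop_constant_cols(with_na, na_rm = TRUE))
```

Indexing with df[keep] rather than df[, keep] means the result stays a data frame even when only one column survives.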
How to remove columns with same values (dplyr::select)?
You can use select() with where() to keep only the columns whose number of unique values is greater than 1.
library(dplyr)
df %>% select(where(~n_distinct(.) > 1))
# x y
#1 col s
#2 <NA> <NA>
#3 1 3
How to remove data frame column with a single value
Filter is a useful function here. I will filter only for those where there is more than 1 unique value, i.e.
Filter(function(x)(length(unique(x))>1), df1)
## Item_Name D_1 D_3
## 1 test1 1 11
## 2 test2 0 3
## 3 test3 1 1
Remove columns with same value from a dataframe
To select columns with more than one value regardless of type:
uniquelength <- sapply(d,function(x) length(unique(x)))
d <- subset(d, select=uniquelength>1)
?
(Oops, Roman's question is right -- this could knock out your column 5 as well)
Maybe (edit: thanks to comments!)
isfac <- sapply(d,inherits,"factor")
d <- subset(d,select=!isfac | uniquelength>1)
or
d <- d[,!isfac | uniquelength>1]
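To see how the two conditions differ, here is a small worked example. The data frame d below is invented for illustration and is not the data from the original question. The first condition drops every constant column, while the second only drops factor columns with a single value and leaves constant numeric columns alone.

```r
# Illustrative data (an assumption, not the asker's data):
# b is a one-level factor, c is a constant numeric column.
d <- data.frame(
  a = 1:3,
  b = factor(c("x", "x", "x")),
  c = c(2, 2, 2),
  e = factor(c("p", "q", "p"))
)

uniquelength <- sapply(d, function(x) length(unique(x)))
isfac <- sapply(d, inherits, "factor")

# Drops every constant column, numeric or factor
all_types <- subset(d, select = uniquelength > 1)
names(all_types)

# Drops only factor columns with a single value
factors_only <- subset(d, select = !isfac | uniquelength > 1)
names(factors_only)
```

Here names(all_types) gives "a" and "e", while names(factors_only) gives "a", "c" and "e".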
Removing all the columns of the data frame that have same values across all the rows
dataf[sapply(dataf, function(x) length(unique(x))>1)]
R - remove column when the values are all the same
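We can use Filter. For numeric columns, a constant column has a variance of 0, which Filter treats as FALSE, so it is dropped:
Filter(var, df1)
Or, to keep the columns with more than one unique value:
Filter(function(x) length(unique(x)) > 1, df1)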
Removing columns from dataframe that have value greater than -1
You can also use keep() and discard() from purrr (which is in the tidyverse). You would use these in conjunction with any() and all().
My example uses mtcars, but this would translate to any dataset.
library(purrr)
# keep all columns with any value less than or equal to 10
mtcars %>%
keep(~ any(. <= 10))
# remove all columns with all values greater than 10
mtcars %>%
discard(~ all(. > 10))
You can make the function as advanced as you'd like. This will keep columns where a certain percentage of values meets a criterion.
# keep all columns where more than 90% of the values are less than or equal to 10
mtcars %>%
keep(~ (sum(. <= 10) / length(.)) > 0.9)
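If you would rather not load purrr, the same proportion test can be sketched in base R. This is an equivalent written for illustration, not part of the original answer. It uses mean() on a logical vector, which gives the share of TRUE values.

```r
# Base R sketch of the same filter: keep columns where more than 90%
# of the values are less than or equal to 10.
keep_col <- vapply(mtcars, function(x) mean(x <= 10) > 0.9, logical(1L))
small_cols <- mtcars[keep_col]
names(small_cols)
```

For mtcars this keeps cyl, drat, wt, vs, am, gear and carb, since every value in those columns is at most 10.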
Remove columns that have only one unique value
You can use select(where()).
Suppose I have a data frame like this:
df <- data.frame(A = LETTERS[1:5], B = 1:5, C = 2)
df
#> A B C
#> 1 A 1 2
#> 2 B 2 2
#> 3 C 3 2
#> 4 D 4 2
#> 5 E 5 2
Then I can do:
df %>% select(where(~ n_distinct(.) > 1))
#> A B
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
#> 5 E 5