How to Remove Columns with Same Value in R

Just use vapply to go through and check how many unique values there are in each column:

Sample data:

mydf <- data.frame(v1 = 1:4, v2 = 5:8,
                   v3 = 2, v4 = 9:12, v5 = 1)
mydf
##   v1 v2 v3 v4 v5
## 1  1  5  2  9  1
## 2  2  6  2 10  1
## 3  3  7  2 11  1
## 4  4  8  2 12  1

What we will be doing with vapply:

vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))
#    v1    v2    v3    v4    v5
#  TRUE  TRUE FALSE  TRUE FALSE

Keep the columns you want:

mydf[vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))]
#   v1 v2 v4
# 1  1  5  9
# 2  2  6 10
# 3  3  7 11
# 4  4  8 12
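
One caveat worth noting: unique() treats NA as a value of its own, so a column that is constant apart from some missing entries counts as having two unique values and is kept. A minimal sketch of an NA-aware variant, wrapped in a helper (the name drop_constant_cols, its na.rm argument, and the mydf2 data are made up for this example, not part of base R):

```r
# Hypothetical helper: drop columns with at most one distinct value.
# With na.rm = TRUE, NA is ignored when counting distinct values.
drop_constant_cols <- function(df, na.rm = FALSE) {
  keep <- vapply(df, function(x) {
    if (na.rm) x <- x[!is.na(x)]
    length(unique(x)) > 1
  }, logical(1L))
  df[keep]
}

# Illustrative data: v2 is constant except for one NA, v3 is fully constant.
mydf2 <- data.frame(v1 = 1:4, v2 = c(2, NA, 2, 2), v3 = 7)

names(drop_constant_cols(mydf2))                # v2 is kept: 2 and NA count as two values
names(drop_constant_cols(mydf2, na.rm = TRUE))  # v2 is dropped: only the value 2 remains
```

Whether NA should count as a distinct value depends on the data, so the default here matches the behavior of the one-liner above.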

How to remove columns with the same values (dplyr::select)?

You can use select() with where() to keep only the columns whose number of unique values is greater than 1.

library(dplyr)
df %>% select(where(~n_distinct(.) > 1))

# x y
#1 col s
#2 <NA> <NA>
#3 1 3

How to remove data frame column with a single value

Filter() is a useful function here. It keeps only the columns that have more than one unique value:

Filter(function(x) length(unique(x)) > 1, df1)

##   Item_Name D_1 D_3
## 1     test1   1  11
## 2     test2   0   3
## 3     test3   1   1
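
The input df1 is not shown in this answer. A self-contained sketch, assuming a hypothetical df1 that matches the printed result plus one constant column (the D_2 column and its values are invented for illustration):

```r
# Assumed input: D_2 is a made-up constant column.
df1 <- data.frame(
  Item_Name = c("test1", "test2", "test3"),
  D_1 = c(1, 0, 1),
  D_2 = c(1, 1, 1),
  D_3 = c(11, 3, 1)
)

# A data frame is a list of columns, so Filter() applies the predicate
# to each column and keeps those for which it returns TRUE.
res <- Filter(function(x) length(unique(x)) > 1, df1)
names(res)
```

Because Filter() subsets the data frame by column, the result is still a data frame, even if only one column survives.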

Remove columns with same value from a dataframe

To select columns with more than one value regardless of type:

uniquelength <- sapply(d,function(x) length(unique(x)))
d <- subset(d, select=uniquelength>1)

(Oops, Roman's question is right: this could knock out your column 5 as well.)

To drop only the factor columns that have a single value, and keep every non-factor column (edit: thanks to comments!):

isfac <- sapply(d,inherits,"factor")
d <- subset(d,select=!isfac | uniquelength>1)

or

d <- d[, !isfac | uniquelength > 1, drop = FALSE]
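
The difference between the two filters is easier to see on a small example. A sketch with a hypothetical data frame d (all column names and values are invented here), containing both a single-level factor and a constant numeric column:

```r
# Hypothetical data: f1 is a single-level factor, n1 a constant numeric column.
d <- data.frame(
  id = 1:3,
  f1 = factor(c("a", "a", "a")),
  f2 = factor(c("a", "b", "a")),
  n1 = c(5, 5, 5)
)

uniquelength <- sapply(d, function(x) length(unique(x)))
isfac <- sapply(d, inherits, "factor")

# Type-agnostic filter: drops every constant column, f1 and n1 alike.
all_types <- d[, uniquelength > 1, drop = FALSE]
names(all_types)

# Factor-aware filter: drops only the single-level factor f1.
factors_only <- d[, !isfac | uniquelength > 1, drop = FALSE]
names(factors_only)
```

The drop = FALSE argument keeps the result a data frame even when only one column is left.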

Removing all the columns of the data frame that have same values across all the rows

dataf[sapply(dataf, function(x) length(unique(x))>1)]

R - remove column when the values are all the same

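We can use Filter. With var, this assumes every column is numeric: a constant column has a variance of 0, which Filter treats as FALSE, so that column is dropped.

Filter(var, df1)

Or, for columns of any type, keep those with more than one unique value:

Filter(function(x) length(unique(x)) > 1, df1)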

Removing columns from dataframe that have value greater than -1

You can also use keep() and discard() from purrr (which is in the tidyverse). You would use these in conjunction with any() and all().

My example uses mtcars, but this would translate to any dataset.

library(purrr)

# keep all columns with any value less than or equal to 10
mtcars %>%
  keep(~ any(. <= 10))

# remove all columns with all values greater than 10
mtcars %>%
  discard(~ all(. > 10))

You can make the function as advanced as you'd like. This will keep columns where a certain percentage of values meets a criterion.

# keep all columns where more than 90% of the values are less than or equal to 10
mtcars %>%
  keep(~ (sum(. <= 10) / length(.)) > 0.9)
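
Since the mean of a logical vector is the proportion of TRUE values, the same rule can be written more compactly with mean(). A base R sketch using Filter(), so it runs without purrr (the variable name mostly_small is just for this example):

```r
# mean() of a logical vector gives the share of TRUE values,
# so this is equivalent to sum(x <= 10) / length(x).
mostly_small <- Filter(function(x) mean(x <= 10) > 0.9, mtcars)
names(mostly_small)
```

If the columns can contain NA, the comparison yields NA as well, so mean(x <= 10, na.rm = TRUE) may be the safer choice.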

Remove columns that have only a unique value

You can use select(where()).

Suppose I have a data frame like this:

df <- data.frame(A = LETTERS[1:5], B = 1:5, C = 2)

df
#>   A B C
#> 1 A 1 2
#> 2 B 2 2
#> 3 C 3 2
#> 4 D 4 2
#> 5 E 5 2

Then I can do:

library(dplyr)

df %>% select(where(~ n_distinct(.) > 1))

#>   A B
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
#> 5 E 5

