Remove Columns of Dataframe Based on Conditions in R

Remove columns of dataframe based on conditions in R

I feel like this is all over-complicated. Condition 2 already implies the other conditions: if a column has at least two non-NA values, the whole column obviously isn't NA, and if it has at least two consecutive values, it obviously contains more than one value. So the three conditions reduce to a single one. (I prefer not to run many functions per column; instead, run diff once per column and then vectorize the rest):

cond <- colSums(is.na(sapply(df, diff))) < nrow(df) - 1

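This works because diff returns NA for every pair of adjacent rows in which at least one value is NA. If a column has no two consecutive non-NA values, all nrow(df) - 1 differences are NA, so the NA count equals nrow(df) - 1 and the condition is FALSE for that column.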
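To make the counting concrete, here is a minimal sketch on a made-up data frame (the name toy and its values are assumptions for illustration, not the asker's data). It only uses base R and shows the NA count per column next to the resulting condition.

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(
  A = c(1, 2, NA, 4),          # 1 and 2 are consecutive non-NA values
  B = c(1, NA, 3, NA),         # no two consecutive non-NA values
  C = c(NA, NA, NA, NA_real_)  # all NA
)

d <- sapply(toy, diff)           # matrix of lagged differences, nrow(toy) - 1 rows
na_per_col <- colSums(is.na(d))  # A = 2, B = 3, C = 3
toy_cond <- na_per_col < nrow(toy) - 1

toy_cond
#     A     B     C
#  TRUE FALSE FALSE
```

Only A survives, because it is the only column with at least one non-NA difference.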

Then, just

df[, cond, drop = FALSE]
#        A     E
# 1  0.018    NA
# 2  0.017    NA
# 3  0.019    NA
# 4  0.018    NA
# 5  0.018    NA
# 6  0.015 0.037
# 7  0.016 0.031
# 8  0.019 0.025
# 9  0.016 0.035
# 10 0.018 0.035
# 11 0.017 0.043
# 12 0.023 0.040
# 13 0.022 0.042

Per your edit, it seems like you have a data.table object and you also have a Date column so the code would need some modifications.

cond <- df[, lapply(.SD, function(x) sum(is.na(diff(x)))) < .N - 1, .SDcols = -1] 
df[, c(TRUE, cond), with = FALSE]

Some explanations:

We want to ignore the first column in our calculations, so we specify .SDcols = -1 when operating on .SD (which stands for Subset of Data in data.table)
.N is just the row count (similar to nrow(df))
The next step is to subset by the condition. We must not forget to keep the first column too, so we start with c(TRUE, ...
Finally, data.table uses non-standard evaluation by default, so if you want to select columns as you would in a data.frame, you need to specify with = FALSE
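For comparison, the same first-column-skipping logic can be sketched on a plain data.frame in base R. This is an analogue, not the data.table code itself: toy[-1] plays the role of .SDcols = -1, nrow(toy) plays the role of .N, and ordinary logical indexing needs no with = FALSE. The data frame toy and its values are made up.

```r
# Hypothetical data with a Date column in first position
toy <- data.frame(
  Date = as.Date("2020-01-01") + 0:3,
  A = c(1, 2, NA, 4),
  B = c(1, NA, 3, NA)
)

vals <- toy[-1]                                        # skip the Date column
n_na <- sapply(vals, function(x) sum(is.na(diff(x))))  # NA differences per column
cond <- n_na < nrow(toy) - 1                           # TRUE = keep the column

kept <- toy[, c(TRUE, cond), drop = FALSE]             # leading TRUE keeps Date
names(kept)
# [1] "Date" "A"
```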

A better way, though, would be to remove the columns by reference using := NULL

cond <- c(FALSE, df[, lapply(.SD, function(x) sum(is.na(diff(x)))) == .N - 1, .SDcols = -1])
df[, which(cond) := NULL]

R: delete columns from data.frame if condition fulfilled

That should be quite easily accomplished with the following command:

df[colMeans(df)==1] <- NULL
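The question's data are not shown here, so the following is only a sketch under an assumption: a data frame of 0/1 indicator columns without NAs, where a column mean of 1 means every entry is 1. The name toy and its values are made up.

```r
# Hypothetical 0/1 data; x and z consist only of ones
toy <- data.frame(
  x = c(1, 1, 1),
  y = c(0, 1, 1),
  z = c(1, 1, 1)
)

all_ones <- colMeans(toy) == 1  # TRUE for x and z
toy[all_ones] <- NULL           # drop those columns

names(toy)
# [1] "y"
```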

Removing columns from a data.table in R based on conditions

library(data.table)
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c", 
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)

not0 = function(x) is.numeric(x) && !anyNA(x) && all(x!=0)
dt[, .(
    ## your categorical columns
    col1, col2, col3, col4, col5,
    ## new column pasted from non-0 numeric columns
    new = as.numeric(paste0(unlist(.SD), collapse=""))
  ),
  ## this filters columns to be provided in .SD column subset
  .SDcols = not0,
  ## we group by each row so it will handle input of multiple rows
  by = .(row=seq_len(nrow(dt)))
  ][, row:=NULL ## this removes extra grouping column
    ][] ## this prints
#   col1 col2 col3 col4 col5  new
#1:    a    b    c    d    e 9799

Alternatively, if you want to update the existing table in place:

is0 = function(x) is.numeric(x) && !anyNA(x) && all(x==0)
## remove numeric columns that contain only 0
dt[, which(sapply(dt, is0)) := NULL]

## add new column
dt[, new := as.numeric(
    paste0(unlist(.SD), collapse="")
  ), .SDcols=is.numeric, by=.(row=seq_len(nrow(dt)))
  ][]
#   col1 col2 col3 col4 col5 col6 col8 col10  new
#1:    a    b    c    d    e    9    7    99 9799
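If the table is a plain data.frame rather than a data.table, the same predicate idea can be sketched in base R. This is an analogue under that assumption, not the answer's code; toy mirrors part of the example values above and kept and new_value are illustrative names.

```r
# Predicate from the answer: numeric column, no NA, all zeros
is0 <- function(x) is.numeric(x) && !anyNA(x) && all(x == 0)

# Hypothetical one-row data frame reusing some of the values above
toy <- data.frame(col1 = "a", col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99)

kept <- toy[!sapply(toy, is0)]            # drop the all-zero numeric columns
num_cols <- sapply(kept, is.numeric)      # which remaining columns are numeric
new_value <- as.numeric(paste0(unlist(kept[num_cols]), collapse = ""))

names(kept)
# [1] "col1"  "col6"  "col8"  "col10"
new_value
# [1] 9799
```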

Remove a column in dataframe if a particular value meets a condition in R

You probably want an if clause.

df1 <- if (df1[nrow(df1), 2] < 4) {
  df1[, -2, drop=FALSE]
} else {
  df1
}
df1
#   V1
# 1  1
# 2  2

Using column names:

n <- 'V2'
df1 <- if (df1[nrow(df1), n] < 4) {
  df1[, setdiff(names(df1), n), drop=FALSE]
} else {
  df1
}
df1
#   V1
# 1  1
# 2  2
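The same check can be wrapped in a small helper so the column name and threshold are parameters. This is a sketch: drop_if_last_below is a hypothetical name, the V2 values are made up, and the extra NA guard is an assumption about how a missing last value should be treated (the column is kept).

```r
# Hypothetical helper: drop `col` when its last value is below `threshold`
drop_if_last_below <- function(df, col, threshold) {
  last_value <- df[nrow(df), col]
  if (!is.na(last_value) && last_value < threshold) {
    df[, setdiff(names(df), col), drop = FALSE]
  } else {
    df
  }
}

df1 <- data.frame(V1 = c(1, 2), V2 = c(5, 3))  # made-up values

dropped <- drop_if_last_below(df1, "V2", 4)    # last V2 value is 3 < 4
names(dropped)
# [1] "V1"

unchanged <- drop_if_last_below(df1, "V1", 1)  # last V1 value is 2, not < 1
names(unchanged)
# [1] "V1" "V2"
```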

Drop multiple columns based on a condition

We can use colSums and keep the columns that have at least 2 values greater than 0. We use [-1] here to ignore the Date column and check the greater-than-0 condition for the remaining columns.

cbind(df[1], df[-1][colSums(df[-1] > 0) >= 2])

#      Date  Item2 Item3
#1 10/10/12      1     1
#2 10/11/12      5     2
#3 10/12/12      3     0
#4 10/13/12      2     0
#5 10/14/12      2     0

Item1 and Item4 columns are removed since both of them have only one observation greater than 0.
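For a reproducible run, here is a sketch with a made-up input. The Date, Item2 and Item3 values are taken from the output above; the Item1 and Item4 values are assumptions chosen so that each has exactly one observation greater than 0.

```r
toy <- data.frame(
  Date  = c("10/10/12", "10/11/12", "10/12/12", "10/13/12", "10/14/12"),
  Item1 = c(0, 0, 4, 0, 0),  # hypothetical values
  Item2 = c(1, 5, 3, 2, 2),
  Item3 = c(1, 2, 0, 0, 0),
  Item4 = c(0, 0, 0, 0, 7)   # hypothetical values
)

positives <- colSums(toy[-1] > 0)  # Item1 = 1, Item2 = 5, Item3 = 2, Item4 = 1
keep <- positives >= 2

result <- cbind(toy[1], toy[-1][keep])
names(result)
# [1] "Date"  "Item2" "Item3"
```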

Another option is select_if from dplyr using the same logic

library(dplyr)
# funs() is deprecated in current dplyr; a formula lambda expresses the same predicate
bind_cols(df[1], df[-1] %>% select_if(~ sum(. > 0) >= 2))

Remove multiple columns and replace values of columns of dataframe based on condition in R

Here's a similar approach (perhaps more vectorized?)

is.na(df[-1]) <- df[-1] < 1 # Convert all values < 1 to NAs.
df[colSums(is.na(df)) != nrow(df)] # Select only the columns that have values.
#         Date  A  C
# 1 01/01/2000 NA NA
# 2 02/01/2000 NA NA
# 3 03/01/2000 NA NA
# 4 04/01/2000 NA NA
# 5 05/01/2000  5 NA
# 6 06/01/2000  6  1
# 7 07/01/2000  7  1
# 8 08/01/2000  8 NA
# 9 09/01/2000  9 NA

Alternatively, the second step could be

df[c(TRUE, colSums(df[-1], na.rm = TRUE) > 0)]
## OR 
## df[c(TRUE, sapply(df[-1], sum, na.rm = TRUE) > 0)] # as already suggested
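Since the input data are not shown, here is a sketch of both steps on a small made-up data frame (toy and all of its values are assumptions). It shows that the two variants of the second step select the same columns.

```r
# Hypothetical data: B never reaches 1, so it should be dropped
toy <- data.frame(
  Date = c("01/01/2000", "02/01/2000", "03/01/2000"),
  A = c(0, 5, 6),
  B = c(0, 0, 0),
  C = c(0.5, 1, 0)
)

is.na(toy[-1]) <- toy[-1] < 1  # values below 1 become NA

by_na_count <- toy[colSums(is.na(toy)) != nrow(toy)]            # first variant
by_col_sum  <- toy[c(TRUE, colSums(toy[-1], na.rm = TRUE) > 0)]  # second variant

names(by_na_count)
# [1] "Date" "A"    "C"
identical(by_na_count, by_col_sum)
# [1] TRUE
```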
