Subset Data to Contain Only Columns Whose Names Match a Condition

Subset data to contain only columns whose names match a condition

Try grepl on the names of your data.frame. grepl matches a regular expression against a target and returns TRUE if a match is found and FALSE otherwise. The function is vectorised, so you can pass it a vector of strings and get a vector of logical values back.

Example

#  Data
df <- data.frame( ABC_1 = runif(3),
                  ABC_2 = runif(3),
                  XYZ_1 = runif(3),
                  XYZ_2 = runif(3) )

#      ABC_1     ABC_2     XYZ_1     XYZ_2
#1 0.3792645 0.3614199 0.9793573 0.7139381
#2 0.1313246 0.9746691 0.7276705 0.0126057
#3 0.7282680 0.6518444 0.9531389 0.9673290

# Use grepl
df[ , grepl( "ABC" , names( df ) ) ]
#      ABC_1     ABC_2
#1 0.3792645 0.3614199
#2 0.1313246 0.9746691
#3 0.7282680 0.6518444

# grepl returns a logical vector like this, which is what we use to subset the columns
grepl( "ABC" , names( df ) )
#[1] TRUE TRUE FALSE FALSE

To answer the second part, I'd make the subset data.frame and then make a vector that indexes the rows to keep (a logical vector) like this...

set.seed(1)
df <- data.frame( ABC_1 = sample(0:1, 3, repl = TRUE),
                  ABC_2 = sample(0:1, 3, repl = TRUE),
                  XYZ_1 = sample(0:1, 3, repl = TRUE),
                  XYZ_2 = sample(0:1, 3, repl = TRUE) )

# We will want to discard the second row because 'all' ABC values are 0:
#  ABC_1 ABC_2 XYZ_1 XYZ_2
#1     0     1     1     0
#2     0     0     1     0
#3     1     1     1     0


df1 <- df[ , grepl( "ABC" , names( df ) ) ]

ind <- apply( df1 , 1 , function(x) any( x > 0 ) )

df1[ ind , ]
#  ABC_1 ABC_2
#1     0     1
#3     1     1

Subset data to contain only columns whose names match multiple conditions using data.table

You can select multiple columns that match certain patterns in data.table by using patterns() in the .SDcols argument:

# turn df into data.table
setDT(df)

# select columns that contain ABC or XYZ
df[, .SD, .SDcols=patterns("ABC|XYZ")]

# or
df[, grep("ABC|XYZ", names(df)), with=FALSE]

Subsetting rows and columns at the same time

cols = grep("ABC|XYZ",  names(df))

df[rowSums(df[, ..cols]>0)>0, .SD, .SDcols=cols]

Select only columns whose names match a condition, together with a fixed column

Try this:

output$fault_template <- renderDataTable({
  fau <- fau[ , c(1, grep(input$su, names(fau))) ]
  datatable(fau[ , -1:-1], class = 'cell-border stripe')
})

By changing grepl to grep you get column indexes instead of a logical vector. Assuming that column A has index 1, add it to the selection with c(1, ...).

If column A has a column index that may change, try:

c(grep("A", names(fau)), grep(input$su, names(fau)))

Input pattern

If input$su is a character string like "ASD GHG BVG JJJ", you need to convert it into a usable regex.

Try changing:

grep(input$su, names(fau))

to

grep( gsub(" +", "|", input$su), names(fau))

This results in the pattern "ASD|GHG|BVG|JJJ". I am assuming that each three-letter group is a column name.
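As a quick check of just the conversion step (using the example string from above):

gsub(" +", "|", "ASD GHG BVG JJJ")
#[1] "ASD|GHG|BVG|JJJ"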

Subset data based on partial match of column names

You mentioned you may be looking for symbols, so for this particular example we can use [[:punct:]] as our regular expression. This will find all the strings with punctuation symbols in the column names.

d <- data.frame(1:3, 3:1, 11:13, 13:11, rep(1, 3))
names(d) <- c("FullColName1", "FullColName2", "FullColName3",
"PartString1()","PartString2()")

d[grepl("[[:punct:]]", names(d))]
#   PartString1() PartString2()
# 1            13             1
# 2            12             1
# 3            11             1

This last part just illustrates another way to do this, using string-processing functions from stringr:

library(stringr)
d[str_detect(names(d), "[[:punct:]]")]
#   PartString1() PartString2()
# 1            13             1
# 2            12             1
# 3            11             1

Added per the OP's comment:

d[grepl("ring[12()]", names(d))]

to get either of the substrings ring1() or ring2() from the names vector.

Subset data table by all unique entries in columns whose name contain a certain substring, fill with NA for other entries
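The answers below assume a data.table along these lines (a hypothetical dt, reconstructed to be consistent with the output shown):

library(data.table)

# hypothetical dt: the pattern "_ID$" should pick up a_ID and b_ID only
dt <- data.table(a_ID  = c(1, 1, 2, 11),
                 b_ID  = c("XY", "XY", "XY", "XY"),
                 value = c(0.5, 0.2, 0.7, 0.9))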

We may get the unique elements and then replace the duplicated ones with NA:

library(data.table)
dt[, lapply(.SD, unique), .SDcols = patterns("_ID$")][,
lapply(.SD, \(x) replace(x, duplicated(x), NA))]

-output

    a_ID   b_ID
   <num> <char>
1:     1     XY
2:     2   <NA>
3:    11   <NA>

Or another option with unique

unique(dt[, .(a_ID, b_ID)])[, lapply(.SD, \(x) fcase(!duplicated(x), x))]
    a_ID   b_ID
   <num> <char>
1:     1     XY
2:     2   <NA>
3:    11   <NA>

Another option is to wrap the code in a block: take the unique values, check their lengths, and pad the shorter ones with NA to equalise the lengths.

dt[, {lst1 <- lapply(.SD, unique)
      mx <- max(lengths(lst1))
      lapply(lst1, `length<-`, mx)}, .SDcols = patterns("_ID$")]
    a_ID   b_ID
   <num> <char>
1:     1     XY
2:     2   <NA>
3:    11   <NA>

We may also use collapse: select the columns (gvr), get the unique rows (funique), loop over the columns with dapply, and replace the duplicates with NA.

library(collapse)
dapply(funique(gvr(dt, "_ID$")), MARGIN = 2,
       FUN = \(x) replace(x, duplicated(x), NA))
    a_ID   b_ID
   <num> <char>
1:     1     XY
2:     2   <NA>
3:    11   <NA>

Subset column names with specific string

To specify "ABC_" followed by a one or more digits (i.e. \\d+ or [0-9]+), you can use

df1 <- df[ , grepl("ABC_\\d+", names( df ), perl = TRUE ) ]
# df1 <- df[ , grepl("ABC_[0-9]+", names( df ), perl = TRUE ) ] # another option

To force the column names to start with "ABC_" you can add ^ to the regex to match only when "ABC_\d+" occurs at the start of the string as opposed to occurring anywhere within it.

df1 <- df[ , grepl("^ABC_\\d+", names( df ), perl = TRUE ) ]

If dplyr is more to your liking, you might try

library(dplyr)
select(df, matches("^ABC_\\d+"))

Is it possible to select columns in R based on any value in the column?

We can pass a function inside where() within select(): check whether the column is numeric and, if it is, whether any value equals 9. Alternatively, any(.x == 9) can be changed to 9 %in% .x.
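The example below assumes a df along these lines (the apple and banana values are taken from the output shown; cherry and grape are hypothetical columns added to illustrate what gets dropped):

df <- data.frame(
  apple  = c(1, 4, 6, 8, 9, 9, 2, 4, 7, 4),
  banana = c(9, 9, 4, 8, 1, 3, 6, 7, 5, 9),
  cherry = c(1, 2, 3, 4, 5, 6, 7, 8, 1, 2),  # numeric, but contains no 9
  grape  = letters[1:10]                     # not numeric
)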

library(dplyr)
df %>%
  select(where(~ is.numeric(.x) && any(.x == 9)))

-output

   apple banana
1      1      9
2      4      9
3      6      4
4      8      8
5      9      1
6      9      3
7      2      6
8      4      7
9      7      5
10     4      9

Reduce columns whose names match a pattern

The reason is that grep with value = TRUE returns only the column names, whereas we need the column values. Use .SD to subset the columns by those names:
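A hypothetical testing, reconstructed to be consistent with the printed output below:

library(data.table)

testing <- data.table(
  first_column = c("Alpha", "Beta", "Charlie", "Tango",
                   "Alpha, Beta,Alpha", "Alpha,Beta,Charlie",
                   "Tango,Tango,Tango,Tango", "Tango,Tango,Tango, Tango",
                   "Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha"),
  number_1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),    # doubles, hence <num>
  number_2 = c(11, 12, 13, 14, 15, 16, 17, 18, 19),
  number_3 = 2:10,                             # integers, hence <int>
  number_4 = 12:20
)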

library(data.table)
testing[, `:=` (
  "Total 1" = Reduce(`+`, .SD[, grep("number_1|number_2", names(.SD),
                                     value = TRUE), with = FALSE]),
  "Total 2" = Reduce(`+`, .SD[, grep("number_3|number_4", names(.SD),
                                     value = TRUE), with = FALSE]))]

-output

> testing
first_column number_1 number_2 number_3 number_4 Total 1 Total 2
<char> <num> <num> <int> <int> <num> <int>
1: Alpha 1 11 2 12 12 14
2: Beta 2 12 3 13 14 16
3: Charlie 3 13 4 14 16 18
4: Tango 4 14 5 15 18 20
5: Alpha, Beta,Alpha 5 15 6 16 20 22
6: Alpha,Beta,Charlie 6 16 7 17 22 24
7: Tango,Tango,Tango,Tango 7 17 8 18 24 26
8: Tango,Tango,Tango, Tango 8 18 9 19 26 28
9: Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha 9 19 10 20 28 30

If there are multiple sets, we may also create a named list and Filter the list elements based on the occurrence of their names in the data:

lst_names <- list(c("number_1", "number_2"),
                  c("number_3", "number_4"),
                  c("number_5", "number_6"))
names(lst_names) <- paste("Total", seq_along(lst_names))
lst_names_sub <- Filter(length, lapply(lst_names, function(x)
  intersect(x, names(testing))))
testing[, names(lst_names_sub) := lapply(lst_names_sub, function(x)
  Reduce(`+`, .SD[, x, with = FALSE]))]

Subset Columns based on partial matching of column names in the same data frame
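The answer below assumes eatable looks something like this (hypothetical, reconstructed from the output shown):

eatable <- data.frame(
  fruits_area          = c(12, 33, 660),
  fruits_production    = c(100, 250, 510),
  vegetables_area      = c(26, 40, 43),
  vegetable_production = c(324, 580, 581)
)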

You could try:

v <- unique(substr(names(eatable), 0, 5))
lapply(v, function(x) eatable[grepl(x, names(eatable))])

Or using map() + select_()

library(tidyverse)
map(v, ~select_(eatable, ~matches(.)))

Which gives:

#[[1]]
#  fruits_area fruits_production
#1          12               100
#2          33               250
#3         660               510
#
#[[2]]
#  vegetables_area vegetable_production
#1              26                  324
#2              40                  580
#3              43                  581

Should you want to make it into a function:

checkExpression <- function(df, l = 5) {
  v <- unique(substr(names(df), 0, l))
  lapply(v, function(x) df[grepl(x, names(df))])
}

Then simply use:

checkExpression(eatable, 5)

