Select Columns Based on Multiple Strings with Dplyr Contains()


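You can use matches(), which takes a regular expression, so several strings can be combined with the alternation operator (|):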

library(dplyr)

mtcars %>%
  select(matches('m|ar')) %>%
  head(2)
#              mpg am gear carb
#Mazda RX4      21  1    4    4
#Mazda RX4 Wag  21  1    4    4

According to the ?select documentation:

‘matches(x, ignore.case = TRUE)’: selects all variables whose
name matches the regular expression ‘x’

contains(), by contrast, matches a literal string rather than a regular expression:

mtcars %>% 
       select(contains('m'))
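
In recent versions of dplyr, which delegate column selection to tidyselect, contains() also accepts a character vector and takes the union of the matches. The sketch below assumes such a version is installed and compares the vector form with the matches() call shown earlier:

```r
library(dplyr)

# Each element is matched as a literal substring; the matches are combined
by_contains <- mtcars %>%
  select(contains(c("m", "ar")))

# Regular-expression version from the answer above
by_matches <- mtcars %>%
  select(matches("m|ar"))

names(by_contains)
setequal(names(by_contains), names(by_matches))
```

The vector form avoids having to escape characters that are special in regular expressions, such as a dot in a column name.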

Dplyr select based on multiple strings in a column

To select variables whose names contain both a and c, in either order, we could do:

library(dplyr)

df %>% 
  select(matches("(a.*c)|(c.*a)"))

  a_b_c c_b_a
1     1     1
2     2     2
3     3     3
4     4     4

Note that a_a_e is not selected because it doesn't contain c, and c_f_g is not selected because it doesn't contain a. Repeating one of the letters is not enough: a_a_e has two a's but still lacks a c.

We could also use str_subset:

library(dplyr)
library(stringr)

df %>% 
  select(str_subset(names(df), "(a.*c)|(c.*a)"))

Data:

df <- data.frame(
  a_b_c = 1:4,
  a_a_e = 1:4,
  c_f_g = 1:4,
  c_b_a = 1:4
)
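
An alternative that avoids writing both orderings into the regular expression is to combine two selection helpers with &, which tidyselect treats as a set intersection. This is a minimal sketch using the same data, not the approach from the answer above:

```r
library(dplyr)

df <- data.frame(
  a_b_c = 1:4,
  a_a_e = 1:4,
  c_f_g = 1:4,
  c_b_a = 1:4
)

# Keep only the columns whose names contain both "a" and "c"
both <- df %>%
  select(contains("a") & contains("c"))

names(both)
```

The same operators extend to more than two strings, for example contains("a") & contains("c") & contains("b").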

Select columns based on string match - dplyr::select

Within the dplyr world, try:

select(iris, contains("Sepal"))

See the Selection section in ?select for numerous other helpers like starts_with, ends_with, etc.
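
As a brief illustration of two of those helpers on the same built-in data set, the following sketch selects by prefix and by suffix:

```r
library(dplyr)

# Columns whose names begin with "Petal"
petal_cols <- iris %>% select(starts_with("Petal"))

# Columns whose names end with "Width"
width_cols <- iris %>% select(ends_with("Width"))

names(petal_cols)
names(width_cols)
```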

how to choose columns based on specific names of the columns in a dataframe

If your data frame is called df, you can use grep/grepl to match column names against a pattern:

df[grepl('mean|std', names(df))]

Or in dplyr you can use select with matches:

library(dplyr)
df %>% select(matches('mean|std'))
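
The two approaches differ in one detail worth knowing: grepl is case-sensitive by default, whereas matches() ignores case by default. The sketch below uses a hypothetical data frame (the column names are invented for illustration) to show the difference:

```r
library(dplyr)

# Hypothetical data frame, invented for illustration
df <- data.frame(id = 1:2, x_mean = c(0.1, 0.2), x_std = c(1, 2),
                 x_max = c(5, 6), Mean_y = c(3, 4))

# Base R: case-sensitive, so Mean_y is skipped
base_cols <- names(df[grepl("mean|std", names(df))])

# dplyr: case-insensitive by default, so Mean_y is included
dplyr_cols <- names(df %>% select(matches("mean|std")))

# ignore.case = FALSE makes matches() agree with grepl
strict_cols <- names(df %>% select(matches("mean|std", ignore.case = FALSE)))

base_cols
dplyr_cols
strict_cols
```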

dplyr select column based on string match

You can construct the columns in the order that you want with outer.

order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
cols <- c(t(outer(order1, order2, paste, sep = '_')))
cols
#[1] "start_f"  "start_a"  "middle_f" "middle_a" "end_f"    "end_a" 

data[cols]
#  start_f start_a middle_f middle_a end_f end_a
#1       3       1       11        9     7     5

If not all combinations of order1 and order2 are present in the data, we can use any_of, which selects only the columns that exist in data and does not raise an error for the missing ones.

library(dplyr)
data %>% select(any_of(cols))
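
To make the behaviour concrete, here is a sketch with a hypothetical one-row data frame in which end_a is deliberately missing. It also shows all_of, the strict counterpart of any_of, which raises an error instead:

```r
library(dplyr)

# Hypothetical data, invented for illustration; 'end_a' is deliberately absent
data <- data.frame(start_a = 1, start_f = 3, end_f = 7,
                   middle_a = 9, middle_f = 11)

order1 <- c("start", "middle", "end")
order2 <- c("f", "a")
cols <- c(t(outer(order1, order2, paste, sep = "_")))

# any_of() skips names that are not in the data and keeps the order of cols
picked <- data %>% select(any_of(cols))
names(picked)

# all_of() fails because 'end_a' does not exist; the error is caught here
strict <- tryCatch(data %>% select(all_of(cols)), error = function(e) NULL)
is.null(strict)
```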

To select based on a pattern in the column names instead:

order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
pattern <- c(t(outer(order1, order2, function(x, y) sprintf('^%s_%s.*', x, y))))
pattern
#[1] "^start_f.*"  "^start_a.*"  "^middle_f.*" "^middle_a.*" "^end_f.*" "^end_a.*" 
cols <- names(data)

data[sapply(pattern, function(x) grep(x, cols))]

#  start_f start_a middle_f middle_a end_f end_a
#1       3       1       11        9     7     5
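
The sapply call above relies on every pattern matching exactly one column; if a pattern matches several columns or none, sapply returns a list and the indexing fails. A possible variant for that case, sketched with hypothetical column names, flattens the matches with unlist(lapply()):

```r
# Hypothetical data, invented for illustration:
# two columns share the prefix 'start_f'
data <- data.frame(end_a = 5, start_f_1 = 3, start_a = 1, start_f_2 = 4)

pattern <- c("^start_f.*", "^start_a.*", "^end_a.*")
cols <- names(data)

# lapply() returns one integer vector of positions per pattern;
# unlist() flattens them while keeping the pattern order
idx <- unlist(lapply(pattern, function(p) grep(p, cols)))

names(data[idx])
```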

Filtering multiple string columns based on 2 different criteria - questions about grepl and starts_with

We can use filter together with rowwise() and c_across(). Within each row, c_across() collects the columns picked by a selection helper (starts_with), grepl returns a logical vector that is TRUE for values containing "C18" or (|) starting with (^) 153, and any() keeps the row if at least one value matches.

library(dplyr) # >= 1.0.0
df %>%
    # group row-wise so the condition is evaluated once per row
    rowwise() %>%
    # take the columns that start with 'DGN' via c_across,
    # apply grepl to them, and keep the row if any value matches
    filter(any(grepl("C18|^153", c_across(starts_with("DGN"))))) %>%
    # drop the row-wise grouping again
    ungroup()

Or with filter_at (superseded since dplyr 1.0.0, but still available):

df %>% 
  # apply grepl to every DGN column and keep rows where any of them matches
  filter_at(vars(starts_with("DGN")), any_vars(grepl("C18|^153", .)))
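
In dplyr 1.0.4 and later, if_any() offers the same row filter without row-wise grouping. The sketch below assumes that version or newer and adds a hypothetical fourth row (ID 4, not in the original data) so that the filter visibly drops something:

```r
library(dplyr)  # if_any() requires dplyr >= 1.0.4

# The original three rows plus a hypothetical non-matching fourth row
df <- data.frame(ID = 1:4,
                 DGN1 = c("2_C18", "32", "1532", "77"),
                 DGN2 = c("24", "C18_2", "23", "88"))

# Keep rows where at least one DGN column contains "C18" or starts with 153
kept <- df %>%
  filter(if_any(starts_with("DGN"), ~ grepl("C18|^153", .x)))

kept$ID
```

Because grepl is vectorised over each column, this avoids the per-row evaluation that rowwise() introduces.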

Data:

df <- data.frame(ID = 1:3,
                 DGN1 = c("2_C18", "32", "1532"),
                 DGN2 = c("24", "C18_2", "23"))

Subsetting strings from a column if they match multiple strings in a different column

We need to group by species and then check, with all(), that both values occur within each group:

library(dplyr)
df1 %>%
   group_by(species) %>% 
   filter(all(c('warmed', 'ambient') %in% state)) %>%
   ungroup()

Output:

# A tibble: 4 x 2
#  species state  
#  <chr>   <chr>  
#1 Rufl    warmed 
#2 Rufl    ambient
#3 Assp    warmed 
#4 Assp    ambient

Combining the two conditions with & doesn't work because & compares element by element, and the two values never occur in the same row.

Or using subset

subset(df1, species %in% names(which(rowSums(table(df1) > 0) == 2)))
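
Since df1 is not shown, the following sketch uses a hypothetical input (the third species, "Brin", and its single state are invented for illustration) to contrast the grouped all() filter with the element-wise & condition:

```r
library(dplyr)

# Hypothetical input, invented for illustration
df1 <- data.frame(
  species = c("Rufl", "Rufl", "Assp", "Assp", "Brin"),
  state   = c("warmed", "ambient", "warmed", "ambient", "warmed")
)

# Keep species for which both states occur somewhere in the group
both_states <- df1 %>%
  group_by(species) %>%
  filter(all(c("warmed", "ambient") %in% state)) %>%
  ungroup()

# Element-wise &: no single row is both 'warmed' and 'ambient'
none <- df1 %>%
  filter(state == "warmed" & state == "ambient")

nrow(both_states)
nrow(none)
```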
