Select Columns Based on String Match - dplyr::select

Within the dplyr world, try:

library(dplyr)
select(iris, contains("Sepal"))

See the Selection section in ?select for numerous other helpers like starts_with, ends_with, etc.
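As a small illustration of two of those helpers, here is a sketch on the built-in iris data. The variable names petal_cols and width_cols are only for illustration.

```r
library(dplyr)

# Columns whose names start with "Petal"
petal_cols <- iris %>% select(starts_with("Petal"))
names(petal_cols)
#[1] "Petal.Length" "Petal.Width"

# Columns whose names end with "Width"
width_cols <- iris %>% select(ends_with("Width"))
names(width_cols)
#[1] "Sepal.Width" "Petal.Width"
```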

dplyr select column based on string match

You can construct the column names in the order you want with outer().

order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
cols <- c(t(outer(order1, order2, paste, sep = '_')))
cols
#[1] "start_f" "start_a" "middle_f" "middle_a" "end_f" "end_a"

data[cols]
#  start_f start_a middle_f middle_a end_f end_a
#1       3       1       11        9     7     5

If not all combinations of order1 and order2 are present in the data, use any_of(), which selects only the columns that exist in data and does not raise an error for the missing ones.

library(dplyr)
data %>% select(any_of(cols))
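To see the effect of any_of(), here is a minimal sketch with a made-up one-row data frame in which the end_a column is deliberately missing. The data values are hypothetical and chosen only to resemble the output shown earlier.

```r
library(dplyr)

# Hypothetical data; "end_a" is deliberately absent
data <- data.frame(start_a = 1, start_f = 3, middle_a = 9,
                   middle_f = 11, end_f = 7)

order1 <- c("start", "middle", "end")
order2 <- c("f", "a")
cols <- c(t(outer(order1, order2, paste, sep = "_")))

# any_of() keeps the requested order and silently skips "end_a"
present <- data %>% select(any_of(cols))
names(present)
#[1] "start_f"  "start_a"  "middle_f" "middle_a" "end_f"
```

By contrast, data[cols] and select(all_of(cols)) would both stop with an error here, because end_a does not exist.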

To select based on a pattern in the column names:

order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
pattern <- c(t(outer(order1, order2, function(x, y) sprintf('^%s_%s.*', x, y))))
pattern
#[1] "^start_f.*" "^start_a.*" "^middle_f.*" "^middle_a.*" "^end_f.*" "^end_a.*"
cols <- names(data)

# unlist(lapply(...)) also copes with patterns that match zero or several columns
data[unlist(lapply(pattern, grep, x = cols))]

#  start_f start_a middle_f middle_a end_f end_a
#1       3       1       11        9     7     5

How to select columns in an R dataframe based on string matching

Base R:

df[colSums(sapply(df, grepl, pattern = 'No')) > 0]

#         v1      v3
#1         1 Nothing
#2         8       4
#3         7       2
#4 No number       9

Using dplyr:

library(dplyr)
df %>% select(where(~any(grepl('No', .))))
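The snippets above do not show how df was built. The following sketch uses a hypothetical data frame (column v2 in particular is invented) to show both approaches selecting columns by their contents.

```r
library(dplyr)

# Hypothetical data; v2 is made up so that one column has no match
df <- data.frame(
  v1 = c("1", "8", "7", "No number"),
  v2 = c("a", "b", "c", "d"),
  v3 = c("Nothing", "4", "2", "9")
)

# Base R: one logical per column, TRUE if any cell contains "No"
keep <- vapply(df, function(col) any(grepl("No", col)), logical(1))
base_names <- names(df)[keep]

# dplyr: where() applies the predicate to each column
dplyr_names <- names(df %>% select(where(~ any(grepl("No", .x)))))

base_names
#[1] "v1" "v3"
```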

Select columns based on multiple strings with dplyr contains()

You can use matches(), which takes a regular expression, so alternatives can be combined with |:

mtcars %>%
  select(matches('m|ar')) %>%
  head(2)
#              mpg am gear carb
#Mazda RX4      21  1    4    4
#Mazda RX4 Wag  21  1    4    4

According to the ?select documentation

‘matches(x, ignore.case = TRUE)’: selects all variables whose
name matches the regular expression ‘x’

contains(), by contrast, matches a literal string rather than a regular expression:

mtcars %>%
  select(contains('m'))
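For comparison, a short sketch of both helpers on mtcars. It assumes a reasonably recent tidyselect, in which contains() accepts a character vector and takes the union of the matches; on older versions only the matches() call may work.

```r
library(dplyr)

# Regular-expression alternation with matches()
by_regex <- names(mtcars %>% select(matches("m|ar")))
by_regex
#[1] "mpg"  "am"   "gear" "carb"

# Literal substrings with contains(); the vector form selects the same columns
by_literal <- names(mtcars %>% select(contains(c("m", "ar"))))
```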

Select columns based on string values in column content

Solution (select_if() has been superseded; select(where(...)) is the current equivalent):

dat %>% select(where(~ any(grepl("\\.png|\\.mp3|\\.mp4", .))))
# A tibble: 4 x 3
  picture  sound     video
  <chr>    <chr>     <chr>
1 cat.png  meow.mp3  cat.mp4
2 dog.png  woof.mp3  dog.mp4
3 NA       NA        NA
4 bird.png tjirp.mp3 tjirp.mp4

How to choose columns based on specific names of the columns in a dataframe

You can use grep/grepl to match column names against a pattern. If your data frame is called df:

df[grepl('mean|std', names(df))]

Or in dplyr you can use select:

library(dplyr)
df %>% select(matches('mean|std'))

selecting columns based on exact string

I would also use the matches selection helper proposed by @Chris, but if you are interested in alternatives:

# dplyr
dplyr::select(df1, grep("_high|low", colnames(df1)))

# base R
df1[, grep("_high|low", colnames(df1))]

Both result in

 x1_low_2020 x2_low_2030 x1_high_2020 x2_high_2030
1 1 1 1
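One caveat with an alternation such as "_high|low": the second branch has no leading underscore, so it matches "low" anywhere in a name. The sketch below uses hypothetical column names (yellow_2020 is invented to trigger the over-match) and shows a stricter pattern.

```r
# Hypothetical column names; "yellow_2020" is there to show the over-match
cols <- c("x1_low_2020", "x2_low_2030", "x1_high_2020",
          "x2_high_2030", "yellow_2020")

# Unanchored: also picks up "yellow_2020"
loose <- grep("_high|low", cols, value = TRUE)

# Delimited on both sides: only the _low_ and _high_ columns
strict <- grep("_(high|low)_", cols, value = TRUE)
strict
#[1] "x1_low_2020"  "x2_low_2030"  "x1_high_2020" "x2_high_2030"
```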

How to select columns based on string using dplyr

We can use select_, the standard-evaluation variant of select that takes column names as strings:

iris %>%
  select_(sepal_ln = paste0(wanted, ".Length"), paste0(wanted, ".Width"))

Also, select has helper functions that make this easier, e.g. one_of, contains, and matches, to select the required columns from the data:

iris %>%
  select(setNames(one_of(paste0(wanted, c(".Length", ".Width"))),
                  c("sepal_ln", "sepal_wd"))) %>%
  head(2)
# A tibble: 2 × 2
# sepal_ln sepal_wd
# <dbl> <dbl>
#1 5.1 3.5
#2 4.9 3.0

NOTE: select_() and the other underscore-suffixed verbs have been deprecated since dplyr 0.7.0, and one_of() has been superseded by any_of() and all_of(). Prefer select() with those helpers in new code.
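A minimal sketch of the same select-and-rename with all_of() and a named character vector. It assumes wanted is "Sepal" (the original snippet does not define it), and lookup is a name introduced here for illustration.

```r
library(dplyr)

wanted <- "Sepal"  # assumed value; not defined in the original snippet

# Names are the new column names, values are the existing ones
lookup <- c(sepal_ln = paste0(wanted, ".Length"),
            sepal_wd = paste0(wanted, ".Width"))

res <- iris %>%
  select(all_of(lookup)) %>%
  head(2)
res
```

Unlike any_of(), all_of() fails if one of the requested columns is missing, which is usually what you want when the names are built programmatically.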

Populate a column based on a pattern in another column

You can use str_detect() to test whether a string contains a certain pattern; combining it with ifelse is then straightforward:

library(dplyr)
tibble(A = c(
  "E3Y12",
  "E3Y45",
  "E3Y56",
  "c1234",
  "c56534",
  "c3456")) %>%
  mutate(B = ifelse(stringr::str_detect(A, "E3Y"), "This one", "That one"))
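If stringr is not available, the same logic can be sketched in base R with grepl(). The shortened vector here is only for illustration.

```r
A <- c("E3Y12", "c1234")

# fixed = TRUE treats "E3Y" as a literal substring, not a regular expression
B <- ifelse(grepl("E3Y", A, fixed = TRUE), "This one", "That one")
B
#[1] "This one" "That one"
```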

