select columns based on multiple strings with dplyr contains()
You can use matches, which takes a regular expression, so alternatives can be separated with |:
mtcars %>%
  select(matches('m|ar')) %>%
  head(2)
#              mpg am gear carb
#Mazda RX4      21  1    4    4
#Mazda RX4 Wag  21  1    4    4
According to the ?select documentation:
‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’
contains, by contrast, matches a literal string rather than a regular expression, so a pattern like 'm|ar' would not work there:
mtcars %>%
  select(contains('m'))
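As a side note, in recent versions of dplyr (1.0.0 and later, which rely on tidyselect) contains also accepts a character vector and takes the union of the matches. A minimal sketch, assuming such a version is installed; the object names by_contains and by_matches are only for illustration:

```r
library(dplyr)

# contains() does literal matching; a character vector gives the union of matches
by_contains <- mtcars %>% select(contains(c("m", "ar")))

# matches() treats its argument as a regular expression
by_matches <- mtcars %>% select(matches("m|ar"))

# Both calls should pick the same set of columns
setequal(names(by_contains), names(by_matches))
```

If you are on an older dplyr, stick with matches and a regular expression.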
Dplyr select based on multiple strings in a column
To select variables whose names contain both a and c, we could do:
library(dplyr)
df %>%
select(matches("(a.*c)|(c.*a)"))
  a_b_c c_b_a
1     1     1
2     2     2
3     3     3
4     4     4
Note that a_a_e is not selected because it doesn't contain c, and c_f_g is not selected because it doesn't contain a. A name with two a's but no c (or two c's but no a) is not selected either, as a_a_e shows.
We could also use str_subset from stringr:
library(dplyr)
library(stringr)
df %>%
select(str_subset(names(df), "(a.*c)|(c.*a)"))
Data:
df <- data.frame(
a_b_c = 1:4,
a_a_e = 1:4,
c_f_g = 1:4,
c_b_a = 1:4
)
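The "contains both a and c" condition can also be written without a regular expression, since tidyselect (dplyr 1.0.0 and later) lets you combine selection helpers with &. This is a sketch under that version assumption, reusing the data above; the name both is just illustrative:

```r
library(dplyr)

df <- data.frame(
  a_b_c = 1:4,
  a_a_e = 1:4,
  c_f_g = 1:4,
  c_b_a = 1:4
)

# Keep only columns whose names contain "a" AND contain "c"
both <- df %>%
  select(contains("a") & contains("c"))

names(both)
```

This avoids having to spell out both orderings (a.*c and c.*a) by hand.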
Select columns based on string match - dplyr::select
Within the dplyr world, try:
select(iris,contains("Sepal"))
See the Selection section in ?select for numerous other helpers like starts_with, ends_with, etc.
how to choose columns based on specific names of the columns in a dataframe
You can use grep/grepl to match column names by a pattern. If your data frame is called df:
df[grepl('mean|std', names(df))]
Or in dplyr you can use select:
library(dplyr)
df %>% select(matches('mean|std'))
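To see the two approaches side by side, here is a small sketch on a made-up data frame (the column names are hypothetical, chosen only to exercise the pattern):

```r
library(dplyr)

# Hypothetical data frame; the column names are invented for illustration
df <- data.frame(
  id = 1:3,
  acc_mean_x = c(0.1, 0.2, 0.3),
  acc_std_x = c(1, 2, 3),
  gyro_max_x = c(5, 6, 7)
)

# Base R: logical index over the column names
base_sel <- df[grepl("mean|std", names(df))]

# dplyr: same regular expression through matches()
dplyr_sel <- df %>% select(matches("mean|std"))

identical(names(base_sel), names(dplyr_sel))
```

One difference to keep in mind: matches ignores case by default (ignore.case = TRUE), while grepl is case-sensitive unless you pass ignore.case = TRUE.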
dplyr select column based on string match
You can construct the column names in the order that you want with outer.
order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
cols <- c(t(outer(order1, order2, paste, sep = '_')))
cols
#[1] "start_f" "start_a" "middle_f" "middle_a" "end_f" "end_a"
data[cols]
# start_f start_a middle_f middle_a end_f end_a
#1 3 1 11 9 7 5
If not all combinations of order1 and order2 are present in the data, we can use any_of, which selects only the columns that exist in data and does not raise an error for the missing ones.
library(dplyr)
data %>% select(any_of(cols))
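As a quick illustration of the any_of behaviour, here is a sketch with a hypothetical data that has no middle_* columns (the values are made up):

```r
library(dplyr)

# Hypothetical data without the "middle" columns
data <- data.frame(start_a = 1, start_f = 3, end_a = 5, end_f = 7)

order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
cols <- c(t(outer(order1, order2, paste, sep = '_')))

# any_of() silently skips "middle_f" and "middle_a", which are not in data
picked <- data %>% select(any_of(cols))
names(picked)
```

With all_of(cols) or data[cols] the same call would fail, because the middle columns do not exist.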
To select based on a pattern in the names instead of exact names:
order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
pattern <- c(t(outer(order1, order2, function(x, y) sprintf('^%s_%s.*', x, y))))
pattern
#[1] "^start_f.*" "^start_a.*" "^middle_f.*" "^middle_a.*" "^end_f.*" "^end_a.*"
cols <- names(data)
data[sapply(pattern, function(x) grep(x, cols))]
# start_f start_a middle_f middle_a end_f end_a
#1 3 1 11 9 7 5
Filtering multiple string columns based on 2 different criteria - questions about grepl and starts_with
We can use filter with rowwise and c_across. After grouping row-wise, c_across picks the columns whose names match a selection helper (starts_with), and grepl returns a logical vector checking for either "C18" or (|) a value that starts with (^) 153. Wrapping it in any keeps a row when at least one of those columns matches.
library(dplyr) #1.0.0
df %>%
  # do a row-wise grouping
  rowwise() %>%
  # subset the columns that start with 'DGN' within c_across,
  # apply the grepl condition on that subset,
  # and wrap with any() so one matching column per row is enough
  filter(any(grepl("C18|^153", c_across(starts_with("DGN"))))) %>%
  ungroup()
Or with filter_at (superseded since dplyr 1.0.0, but still available):
df %>%
  # apply any_vars along with grepl in filter_at
  filter_at(vars(starts_with("DGN")), any_vars(grepl("C18|^153", .)))
Data:
df <- data.frame(ID = 1:3, DGN1 = c("2_C18", 32, "1532"),
DGN2 = c("24", "C18_2", "23"))
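In newer dplyr (1.0.4 and later) the same filter can be sketched with if_any, which avoids the row-wise grouping. The data below is a variation of the one above with an extra, made-up fourth row that should not match, so the effect of the filter is visible:

```r
library(dplyr)  # if_any() requires dplyr >= 1.0.4

# Illustrative data: row 4 is a hypothetical non-matching row
df <- data.frame(
  ID = 1:4,
  DGN1 = c("2_C18", "32", "1532", "7153"),
  DGN2 = c("24", "C18_2", "23", "99")
)

# Keep rows where any DGN column contains "C18" or starts with "153"
kept <- df %>%
  filter(if_any(starts_with("DGN"), ~ grepl("C18|^153", .x)))

kept$ID
```

Note that "7153" is dropped because ^ anchors 153 to the start of the string.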
Subsetting strings from a column if they match multiple strings in a different column
We need to group by species and then test the condition with all:
library(dplyr)
df1 %>%
group_by(species) %>%
filter(all(c('warmed', 'ambient') %in% state)) %>%
ungroup
Output:
# A tibble: 4 x 2
# species state
# <chr> <chr>
#1 Rufl warmed
#2 Rufl ambient
#3 Assp warmed
#4 Assp ambient
The & operation doesn't work because 'warmed' and 'ambient' are never in the same row, so no single element of state can equal both values.
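Or using subset from base R (this assumes df1 has only the species and state columns, and that state takes only these two values):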
subset(df1, species %in% names(which(rowSums(table(df1) > 0) == 2)))
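Since df1 itself is not shown, here is a sketch with a made-up df1 (the species Popr and its single row are hypothetical) that contrasts the grouped all check with the row-wise & attempt:

```r
library(dplyr)

# Hypothetical input: "Popr" only has the 'warmed' state
df1 <- data.frame(
  species = c("Rufl", "Rufl", "Assp", "Assp", "Popr"),
  state = c("warmed", "ambient", "warmed", "ambient", "warmed")
)

# Group-wise check: keep species that have both states
both_states <- df1 %>%
  group_by(species) %>%
  filter(all(c('warmed', 'ambient') %in% state)) %>%
  ungroup()

# Row-wise & check: a single value can never equal both strings
naive <- df1 %>%
  filter(state == "warmed" & state == "ambient")

nrow(both_states)
nrow(naive)
```

The grouped version evaluates the condition once per species, which is why it can see both values at the same time.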