Select columns based on string match - dplyr::select
Within the dplyr world, try:
select(iris,contains("Sepal"))
See the Selection section in ?select for numerous other helpers such as starts_with, ends_with, etc.
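As a quick illustration of those helpers on the built-in iris data (a minimal sketch; the object names petal_cols and width_cols are only for illustration):

```r
library(dplyr)

# Columns whose names begin with "Petal"
petal_cols <- iris %>% select(starts_with("Petal"))
names(petal_cols)

# Columns whose names end with "Width"
width_cols <- iris %>% select(ends_with("Width"))
names(width_cols)
```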
dplyr select column based on string match
You can construct the columns in the order that you want with outer.
order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
cols <- c(t(outer(order1, order2, paste, sep = '_')))
cols
#[1] "start_f" "start_a" "middle_f" "middle_a" "end_f" "end_a"
data[cols]
# start_f start_a middle_f middle_a end_f end_a
#1 3 1 11 9 7 5
If not all combinations of order1 and order2 are present in the data, we can use any_of, which selects only the columns that exist in data without raising an error.
library(dplyr)
data %>% select(any_of(cols))
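To see the difference this makes, here is a small sketch with a made-up one-row data frame in which end_a is deliberately missing (the values are hypothetical, not the original poster's data):

```r
library(dplyr)

# Hypothetical data: every combination except "end_a" is present
data <- data.frame(start_a = 1, start_f = 3, middle_a = 9,
                   middle_f = 11, end_f = 7)

order1 <- c("start", "middle", "end")
order2 <- c("f", "a")
cols <- c(t(outer(order1, order2, paste, sep = "_")))

# any_of() skips the missing "end_a"; all_of(cols) would raise an error here
res <- data %>% select(any_of(cols))
names(res)
```

The remaining columns still come out in the order given by cols.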
To select columns based on a pattern in their names:
order1 <- c('start', 'middle', 'end')
order2 <- c('f', 'a')
pattern <- c(t(outer(order1, order2, function(x, y) sprintf('^%s_%s.*', x, y))))
pattern
#[1] "^start_f.*" "^start_a.*" "^middle_f.*" "^middle_a.*" "^end_f.*" "^end_a.*"
cols <- names(data)
data[sapply(pattern, function(x) grep(x, cols))]
# start_f start_a middle_f middle_a end_f end_a
#1 3 1 11 9 7 5
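The sapply call above relies on every pattern matching exactly one column. If a pattern can match several columns, or none, sapply returns a list, which is not a valid column index. A hedged variant, using made-up column names, is to flatten the matches with unlist:

```r
# Hypothetical data: two columns match "^start_f", none match "^middle_f" or "^end_f"
data <- data.frame(end_a_x = 5, start_f_1 = 3, middle_a = 9,
                   start_f_2 = 4, start_a = 1)

order1 <- c("start", "middle", "end")
order2 <- c("f", "a")
pattern <- c(t(outer(order1, order2, function(x, y) sprintf("^%s_%s.*", x, y))))

cols <- names(data)
# One integer vector of matching positions per pattern, flattened in pattern order
idx <- unlist(lapply(pattern, grep, x = cols))
res <- data[idx]
names(res)
```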
How to select columns in an R dataframe based on string matching
Base R:
df[colSums(sapply(df, grepl, pattern = 'No')) > 0]
# v1 v3
#1 1 Nothing
#2 8 4
#3 7 2
#4 No number 9
Using dplyr:
library(dplyr)
df %>% select(where(~any(grepl('No', .))))
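For a reproducible run of both approaches, here is a sketch with a stand-in df. The v1 and v3 values follow the output shown above, while v2 is an invented column with no match:

```r
library(dplyr)

# Stand-in data: v2 is hypothetical and contains no "No"
df <- data.frame(v1 = c("1", "8", "7", "No number"),
                 v2 = c("a", "b", "c", "d"),
                 v3 = c("Nothing", "4", "2", "9"))

# Base R: keep columns where at least one value contains "No"
base_res <- df[colSums(sapply(df, grepl, pattern = "No")) > 0]

# dplyr: the same test expressed as a predicate on each column
dplyr_res <- df %>% select(where(~ any(grepl("No", .x))))

names(base_res)
names(dplyr_res)
```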
select columns based on multiple strings with dplyr contains()
You can use matches:
mtcars %>%
select(matches('m|ar')) %>%
head(2)
# mpg am gear carb
#Mazda RX4 21 1 4 4
#Mazda RX4 Wag 21 1 4 4
According to the ?select documentation:
‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’
contains, by contrast, matches a literal string rather than a regular expression:
mtcars %>%
select(contains('m'))
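The practical difference shows up when the pattern contains a regex metacharacter. In this sketch, contains treats "m|ar" as literal text and finds nothing, while passing a character vector to contains (supported in recent tidyselect versions, as far as I know) takes the union of the matches:

```r
library(dplyr)

# Regular expression: names containing "m" or "ar"
regex_names <- mtcars %>% select(matches("m|ar")) %>% names()

# Literal string "m|ar": no column name contains it
literal_names <- mtcars %>% select(contains("m|ar")) %>% names()

# Several literal strings: union of the individual matches
vector_names <- mtcars %>% select(contains(c("m", "ar"))) %>% names()

regex_names
literal_names
vector_names
```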
Select columns based on string values in column content
Solution:
> dat %>% select_if(~any(grepl("\\.png|\\.mp3|\\.mp4", .)))
# A tibble: 4 x 3
picture sound video
<chr> <chr> <chr>
1 cat.png meow.mp3 cat.mp4
2 dog.png woof.mp3 dog.mp4
3 NA NA NA
4 bird.png tjirp.mp3 tjirp.mp4
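select_if belongs to the scoped verbs that dplyr 1.0.0 superseded. A sketch of the same selection with select(where(...)), using a stand-in dat that mirrors the output above plus an invented id column:

```r
library(dplyr)

# Stand-in data; the id column is hypothetical and should be dropped
dat <- tibble(
  id      = 1:4,
  picture = c("cat.png", "dog.png", NA, "bird.png"),
  sound   = c("meow.mp3", "woof.mp3", NA, "tjirp.mp3"),
  video   = c("cat.mp4", "dog.mp4", NA, "tjirp.mp4")
)

# Keep columns where at least one value contains one of the file extensions;
# grepl() returns FALSE for NA, so missing values do not break the test
res <- dat %>% select(where(~ any(grepl("\\.png|\\.mp3|\\.mp4", .x))))
names(res)
```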
how to choose columns based on specific names of the columns in a dataframe
You can use grep/grepl to match column names by a pattern. If your dataframe is called df:
df[grepl('mean|std', names(df))]
Or in dplyr you can use select:
library(dplyr)
df %>% select(matches('mean|std'))
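Both forms can be checked against each other on a small made-up data frame (the column names here are hypothetical):

```r
library(dplyr)

# Hypothetical data with two matching and two non-matching columns
df <- data.frame(id = 1:2, x_mean = c(0.1, 0.2),
                 x_std = c(1, 2), x_max = c(5, 6))

# Base R: logical index over the column names
base_res <- df[grepl("mean|std", names(df))]

# dplyr: regex selection helper
dplyr_res <- df %>% select(matches("mean|std"))

names(base_res)
names(dplyr_res)
```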
selecting columns based on exact string
I would also use the matches selection helper proposed by @Chris, but if you are interested in alternatives:
# dplyr
dplyr::select(df1, grep("_high|low", colnames(df1)))
# base R
df1[, grep("_high|low", colnames(df1))]
Both result in
x1_low_2020 x2_low_2030 x1_high_2020 x2_high_2030
1 1 1 1
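For comparison, the matches approach mentioned above could look like the sketch below. The df1 here is a stand-in with one extra, hypothetical x1_mid_2020 column, and the pattern is tightened to require an underscore on both sides of high or low:

```r
library(dplyr)

# Stand-in data; x1_mid_2020 is invented to show a column being excluded
df1 <- data.frame(x1_low_2020 = 1, x2_low_2030 = 1,
                  x1_high_2020 = 1, x2_high_2030 = 1,
                  x1_mid_2020 = 1)

# "_(high|low)_" matches the whole name segment, not any "low" substring
res <- df1 %>% select(matches("_(high|low)_"))
names(res)
```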
How to select columns based on string using dplyr
We can use select_:
iris %>%
select_(sepal_ln = paste0(wanted, ".Length"), paste0(wanted, ".Width"))
Also, there are wrappers within select to do this more easily, i.e. one_of, contains, matches etc., to select the required columns from the data:
iris %>%
select(setNames(one_of(paste0(wanted, c(".Length", ".Width"))),
c("sepal_ln", "sepal_wd"))) %>%
head(2)
# A tibble: 2 × 2
# sepal_ln sepal_wd
# <dbl> <dbl>
#1 5.1 3.5
#2 4.9 3.0
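NOTE: When this answer was written it was not clear whether the select_ methods would be deprecated in the next dplyr release (0.6.0). They have since been deprecated (as of dplyr 0.7.0), so prefer select with the selection helpers.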
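With current dplyr, the same renaming selection can be written without select_. A minimal sketch, assuming wanted holds the string "Sepal" (the answer never defines it) and relying on all_of accepting a named character vector, where the names become the new column names:

```r
library(dplyr)

wanted <- "Sepal"  # assumed value; not defined in the original answer

# Names are the new column names, values are the existing ones
new_cols <- c(sepal_ln = paste0(wanted, ".Length"),
              sepal_wd = paste0(wanted, ".Width"))

res <- iris %>%
  select(all_of(new_cols)) %>%
  head(2)
res
```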
Populate a column based on a pattern in another column
You can use str_detect() to evaluate whether a string contains a certain pattern; using an ifelse is then straightforward:
library(dplyr)
tibble( A = c(
"E3Y12",
"E3Y45",
"E3Y56",
"c1234",
"c56534",
"c3456")) %>%
mutate(B = ifelse(stringr::str_detect(A, "E3Y"), "This one", "That one"))
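If you would rather avoid the stringr dependency, a base R sketch of the same idea uses grepl with fixed = TRUE, so the pattern is treated as literal text:

```r
# Same input values as above, in a plain data frame
res <- data.frame(A = c("E3Y12", "E3Y45", "E3Y56",
                        "c1234", "c56534", "c3456"))

# grepl() returns TRUE where A contains the literal text "E3Y"
res$B <- ifelse(grepl("E3Y", res$A, fixed = TRUE), "This one", "That one")
res
```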