How do I select variables in an R dataframe whose names contain a particular string?
If you just want the variable names:
grep("^[Bb]", names(df), value=TRUE)
grep("3", names(df), value=TRUE)
If you want to select those columns, use either of:
df[,grep("^[Bb]", names(df), value=TRUE)]
df[,grep("^[Bb]", names(df))]
The first selects columns by name; the second selects them by a vector of column positions.
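A minimal sketch of both forms on a made-up data frame (the column names below are hypothetical, not from the question). It also illustrates one caveat: when only a single column matches, df[, ...] collapses to a plain vector unless drop = FALSE is given.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(bar1 = 1:3, Baz3 = 4:6, foo = 7:9, qux3 = 10:12)

# Names that start with "B" or "b"
b_names <- grep("^[Bb]", names(df), value = TRUE)

# Positions of the same columns
b_pos <- grep("^[Bb]", names(df))

# Selecting by name and by position gives the same columns
by_name <- df[, b_names]
by_pos  <- df[, b_pos]

# With a single match, drop = FALSE keeps the result a data frame
one_col <- df[, grep("foo", names(df)), drop = FALSE]

# ignore.case = TRUE is an alternative to writing [Bb] in the pattern
b_names_ic <- grep("^b", names(df), ignore.case = TRUE, value = TRUE)
```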
Subset data to contain only columns whose names match a condition
Try grepl on the names of your data.frame. grepl matches a regular expression against a target and returns TRUE if a match is found and FALSE otherwise. The function is vectorised, so you can pass a vector of strings to match and you will get a logical vector back.
Example
# Data
df <- data.frame( ABC_1 = runif(3),
ABC_2 = runif(3),
XYZ_1 = runif(3),
XYZ_2 = runif(3) )
# ABC_1 ABC_2 XYZ_1 XYZ_2
#1 0.3792645 0.3614199 0.9793573 0.7139381
#2 0.1313246 0.9746691 0.7276705 0.0126057
#3 0.7282680 0.6518444 0.9531389 0.9673290
# Use grepl
df[ , grepl( "ABC" , names( df ) ) ]
# ABC_1 ABC_2
#1 0.3792645 0.3614199
#2 0.1313246 0.9746691
#3 0.7282680 0.6518444
# grepl returns logical vector like this which is what we use to subset columns
grepl( "ABC" , names( df ) )
#[1] TRUE TRUE FALSE FALSE
To answer the second part, I'd first build the column subset and then a logical vector that marks the rows to keep, like this:
set.seed(1)
df <- data.frame( ABC_1 = sample(0:1,3,replace = TRUE),
ABC_2 = sample(0:1,3,replace = TRUE),
XYZ_1 = sample(0:1,3,replace = TRUE),
XYZ_2 = sample(0:1,3,replace = TRUE) )
# We will want to discard the second row because 'all' ABC values are 0:
# ABC_1 ABC_2 XYZ_1 XYZ_2
#1 0 1 1 0
#2 0 0 1 0
#3 1 1 1 0
df1 <- df[ , grepl( "ABC" , names( df ) ) ]
ind <- apply( df1 , 1 , function(x) any( x > 0 ) )
df1[ ind , ]
# ABC_1 ABC_2
#1 0 1
#3 1 1
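As an alternative sketch (not the answer's own method), the row filter can be written with rowSums instead of apply. The values below are typed in by hand to match the table shown above, so the example does not depend on the random number generator.

```r
# Values entered by hand to mirror the example table above
df <- data.frame(ABC_1 = c(0, 0, 1),
                 ABC_2 = c(1, 0, 1),
                 XYZ_1 = c(1, 1, 1),
                 XYZ_2 = c(0, 0, 0))

# Keep only the ABC columns; drop = FALSE guards against a single match
abc <- df[, grepl("ABC", names(df)), drop = FALSE]

# A row is kept if at least one ABC value is greater than 0
keep <- rowSums(abc > 0) > 0

abc_kept <- abc[keep, ]   # ABC columns only, rows 1 and 3
all_kept <- df[keep, ]    # same rows, all columns
```

Indexing the original df with keep, as in the last line, is useful when the XYZ columns should be retained alongside the filtered rows.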
Extract Variable Names whose Values contain a specific string (R)
If every value in Mark1 and Mark2 contains a %, we can check only the first row:
colnames(df)[grepl('%', df[1,])]
[1] "Mark1" "Mark2"
Otherwise, you can use apply with MARGIN = 2 to apply this function to each column and return a named logical vector:
apply(df, 2, function(x) any(grepl('%', x)))
Name Mark1 Mark2 Mark3
FALSE TRUE TRUE FALSE
If you just want the variable names, use this logical vector to subset colnames(df):
colnames(df)[apply(df, 2, function(x) any(grepl('%', x)))]
[1] "Mark1" "Mark2"
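The question's df is not shown here, so the following is a self-contained sketch on invented data (the names and marks are assumptions). It uses sapply over columns rather than apply, which avoids converting the whole data frame to a character matrix, and it shows a case where checking only the first row would miss a column.

```r
# Invented data: Mark2 has no "%" in its first row
df <- data.frame(
  Name  = c("Ann", "Bob"),
  Mark1 = c("85%", "70%"),
  Mark2 = c("n/a", "60%"),
  Mark3 = c(12, 15),
  stringsAsFactors = FALSE
)

# Checking only the first row finds Mark1 but misses Mark2
first_row_only <- names(df)[grepl("%", df[1, ], fixed = TRUE)]

# Checking every value in each column finds both
has_pct <- sapply(df, function(x) any(grepl("%", x, fixed = TRUE)))
pct_cols <- names(df)[has_pct]
```

fixed = TRUE treats "%" as a literal string rather than a regular expression; it makes no difference for "%" itself but is a safe habit for patterns containing characters such as "." or "(".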
Using a string from a list to select a column in R
Try this:
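# Column names stored in a list (called vars so it does not mask base::list)
vars <- list("Var1", "Var2", "Var3")
df1 <- data.frame(Var1 = 1:2, Var2 = c(21, 15), Var3 = c(10, 9))
df2 <- data.frame(Var1 = 1, Var2 = 16, Var3 = 8)
# Sum: [[1]] extracts the first name as a string, which then selects the column
df1$Var4 <- df1[, vars[[1]]] + df2[, vars[[1]]]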
Var1 Var2 Var3 Var4
1 1 21 10 2
2 2 15 9 3
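If the same operation is needed for every name in the list, a loop with [[ ]] is one possible sketch. The sums data frame and the loop are additions for illustration, not part of the original answer.

```r
vars <- list("Var1", "Var2", "Var3")
df1 <- data.frame(Var1 = 1:2, Var2 = c(21, 15), Var3 = c(10, 9))
df2 <- data.frame(Var1 = 1, Var2 = 16, Var3 = 8)

# Start from a copy of df1 and overwrite each column with the sum
sums <- df1
for (v in vars) {
  # v is a single string such as "Var1"; [[v]] selects the column by name.
  # df2 has one row, so its value is recycled across the rows of df1.
  sums[[v]] <- df1[[v]] + df2[[v]]
}
```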
Search dataframe for columns with values that contains certain string and output new dataframe
I actually used the idea you had and just used a pivot, or more precisely gather() from tidyr. There are three steps. First, I convert any factor columns to character (otherwise, at least for me, it throws a warning). Second, I gather all columns except PATIENT_ID and EVENT_NAME. Third, I filter to only the rows whose value contains pdf or jpg. I'm not sure this is precisely what you need, but it might work:
library(tidyr)
library(dplyr)
mydata%>%
mutate_if(is.factor, as.character)%>%
gather("var_name", "file_name", -PATIENT_ID,-EVENT_NAME)%>%
filter(grepl("pdf|jpg", file_name))
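For readers without tidyr and dplyr, the same three steps can be roughly sketched in base R. The contents of mydata below are invented (only the PATIENT_ID and EVENT_NAME column names come from the answer), and this is an alternative, not the answer's own code.

```r
# Invented example data; scan and photo are hypothetical column names
mydata <- data.frame(
  PATIENT_ID = c(1, 2),
  EVENT_NAME = c("baseline", "followup"),
  scan       = c("a.pdf", "none"),
  photo      = c("b.jpg", "c.png"),
  stringsAsFactors = FALSE
)

id_cols    <- c("PATIENT_ID", "EVENT_NAME")
value_cols <- setdiff(names(mydata), id_cols)

# Steps 1 and 2: stack every non-id column into var_name / file_name pairs,
# converting values to character along the way
long <- do.call(rbind, lapply(value_cols, function(v) {
  data.frame(
    mydata[id_cols],
    var_name  = v,
    file_name = as.character(mydata[[v]]),
    stringsAsFactors = FALSE
  )
}))

# Step 3: keep only rows whose value mentions pdf or jpg
hits <- long[grepl("pdf|jpg", long$file_name), ]
```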
Best of luck to you, I hope this helps!
Select columns based on string match - dplyr::select
Within the dplyr world, try:
select(iris,contains("Sepal"))
See the Selection section in ?select for numerous other helpers like starts_with, ends_with, etc.
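If dplyr is not available, roughly equivalent selections can be sketched in base R on the built-in iris data. Note one difference: as far as I know the dplyr helpers ignore case by default, whereas the base versions below are case-sensitive.

```r
# Roughly like select(iris, contains("Sepal"))
sepal_cols <- iris[, grepl("Sepal", names(iris), fixed = TRUE), drop = FALSE]

# Roughly like select(iris, starts_with("Petal"))
petal_cols <- iris[, startsWith(names(iris), "Petal"), drop = FALSE]

# Roughly like select(iris, ends_with("Width"))
width_cols <- iris[, endsWith(names(iris), "Width"), drop = FALSE]
```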