Using Grep to Help Subset a Data Frame

Using grep to help subset a data frame

This is fairly straightforward using [ to extract the rows:

grep will give you the position in which it matched your search pattern (unless you use value = TRUE).

grep("^G45", My.Data$x)
# [1] 2

Since you're searching within the values of a single column, that actually corresponds to the row index. So, use that with [ (where you would use My.Data[rows, cols] to get specific rows and columns).

My.Data[grep("^G45", My.Data$x), ]
# x y
# 2 G459 2
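
To make the difference between the return types concrete, here is a minimal sketch. The contents of My.Data are an assumption (the original data is not shown); they were chosen so that only row 2 matches, as in the output above.

```r
# Hypothetical data; only row 2 starts with "G45".
My.Data <- data.frame(
  x = c("G123", "G459", "H450"),
  y = c(1, 2, 3),
  stringsAsFactors = FALSE
)

pos  <- grep("^G45", My.Data$x)                # integer positions of the matches
vals <- grep("^G45", My.Data$x, value = TRUE)  # the matching strings themselves
hits <- grepl("^G45", My.Data$x)               # one TRUE/FALSE per element

# Integer positions and a logical vector select the same rows.
by_position <- My.Data[pos, ]
by_logical  <- My.Data[hits, ]
```

One practical reason to prefer the logical form: negating it with !hits still works when nothing matches, whereas a negative index built from an empty grep result selects zero rows.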

The help page for subset shows how you can use grep and grepl with subset if you prefer that function over [. Here's an example (inside subset, columns can be referred to by name, so x is enough):

subset(My.Data, grepl("^G45", x))
# x y
# 2 G459 2

As of R 3.3, there is also the startsWith function, which you can use with subset (or with any of the approaches above). According to its help page, it is considerably faster than substring or grepl for this kind of prefix test.

subset(My.Data, startsWith(as.character(x), "G45"))
# x y
# 2 G459 2
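
The as.character() call above matters when the column is a factor. The sketch below uses a hypothetical My.Data with a factor column (an assumption, since the original data is not shown) and checks that the prefix test agrees with the anchored regular expression.

```r
# Hypothetical data; x is deliberately a factor.
My.Data <- data.frame(x = factor(c("G123", "G459", "H450")), y = 1:3)

# startsWith() only accepts character vectors, so convert the factor first.
by_prefix <- startsWith(as.character(My.Data$x), "G45")

# grepl() coerces factors to character itself.
by_regex <- grepl("^G45", My.Data$x)

# Both approaches flag the same rows here.
same_rows <- identical(by_prefix, by_regex)
```

Note that startsWith takes the prefix literally, so characters such as . or * have no special meaning there.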

How to subset multiple columns from df including grep match

Assuming I understood what you would like to do, a possible solution that may not be useful and/or may be redundant:

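my_selector <- function(df, partial_name, ...) {
  # Exact column names passed through ... are looked up by position
  positional_names <- match(c(...), names(df))
  # Append every column whose name matches partial_name
  df[, c(positional_names, grep(partial_name, names(df)))]
}
my_selector(iris, partial_name = "Petal", "Species")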

A "simpler" option would be to use grep and the like to match the target names at once:

iris[grep("Spec.*|Peta.*", names(iris))]

Or even simpler, as suggested by @akrun:

iris[grep("(Spec|Peta).*", names(iris))]

For more columns, we could do something like:

my_selector(iris, partial_name = "Petal", c("Species", "Sepal.Length"))
#   Species Sepal.Length Petal.Length Petal.Width
# 1  setosa          5.1          1.4         0.2
# 2  setosa          4.9          1.4         0.2

Note, however, that the function above orders its columns counter-intuitively: the names supplied last (through ...) come first in the result, ahead of the columns matched by partial_name.

Result for the first call (truncated):

  Species Petal.Length Petal.Width
1  setosa          1.4         0.2
2  setosa          1.4         0.2
3  setosa          1.3         0.2
4  setosa          1.5         0.2
5  setosa          1.4         0.2
6  setosa          1.7         0.4
7  setosa          1.4         0.3
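
If the column ordering noted above is a problem, one possible workaround is to return the columns in the order they appear in the data frame. The function below is a hypothetical variant (select_in_order and its exact_names argument are names made up for this sketch), not part of the original answer.

```r
# Hypothetical variant of my_selector(): columns come back in data-frame order.
select_in_order <- function(df, partial_name, exact_names = character()) {
  exact_idx   <- match(exact_names, names(df))   # positions of the exact names
  partial_idx <- grep(partial_name, names(df))   # positions of the regex matches
  # union() removes duplicates; sort() restores the original column order
  # (and drops NA for any exact name that was not found).
  idx <- sort(union(exact_idx, partial_idx))
  df[, idx, drop = FALSE]
}

selected <- select_in_order(iris, "Petal", c("Species", "Sepal.Length"))
names(selected)
# [1] "Sepal.Length" "Petal.Length" "Petal.Width"  "Species"
```

Using drop = FALSE here means the function returns a data frame even when only a single column is selected.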

Using grep to subset columns of a dataframe by names(dataframe)

Try this:

sensordata.common <- sensordata[, grep("IAT|OAT|IAH", names(sensordata)), drop = FALSE]
sensordata.common
#    IAT
# 1 72.5
names(sensordata.common)
# [1] "IAT"

The option drop = FALSE prevents [ from reducing the output to a vector when only one column is selected. See ?`[` (the backticks around [ are required).

Alternatively, you could use dplyr::select, as in select(sensordata, contains("your_names_here")). select always returns a data frame, even when only one column is selected.
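
The effect of drop is easiest to see side by side. This base-R sketch assumes a hypothetical one-row sensordata in which only the IAT column matches; the Humidity and Pressure columns are made up for illustration.

```r
# Hypothetical sensor data; only "IAT" matches the pattern.
sensordata <- data.frame(IAT = 72.5, Humidity = 40, Pressure = 1013)

cols <- grep("IAT|OAT|IAH", names(sensordata))

dropped <- sensordata[, cols]                # single column collapses to a vector
kept    <- sensordata[, cols, drop = FALSE]  # stays a data frame
also_df <- sensordata[cols]                  # list-style indexing never drops

class(dropped)  # "numeric"
class(kept)     # "data.frame"
names(also_df)  # "IAT"
```

The list-style form sensordata[cols] is a common shortcut when only columns are being selected, since it keeps the data frame without needing drop = FALSE.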

How to use grep in Data frame in R


# Your data frame
df <- data.frame(Text = c("Mary had a little lamb and she is sweet.",
                          "Robin is a great superhero.",
                          "batman dark knight is wonderful movie.",
                          "Superman series has been a disappointment."))

# Get the rows whose text contains "batman"
df[grep("batman", df$Text), ]

Outputs (a vector rather than a data frame, because df has only one column and [ drops it):

[1] "batman dark knight is wonderful movie."

With dplyr, use grepl, which returns a logical vector rather than positions:

library(dplyr)
df %>% filter(grepl("batman", Text))

Outputs

                                    Text
1 batman dark knight is wonderful movie.
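
For a base-R result that stays a data frame, grepl can be combined with drop = FALSE. The sketch below reuses the example data; the second search, with ignore.case = TRUE, is an added illustration and not part of the original answer.

```r
df <- data.frame(
  Text = c("Mary had a little lamb and she is sweet.",
           "Robin is a great superhero.",
           "batman dark knight is wonderful movie.",
           "Superman series has been a disappointment."),
  stringsAsFactors = FALSE
)

# drop = FALSE keeps the one-column result as a data frame.
batman_rows <- df[grepl("batman", df$Text), , drop = FALSE]

# ignore.case = TRUE matches "superhero" as well as "Superman".
super_rows <- df[grepl("super", df$Text, ignore.case = TRUE), , drop = FALSE]
```

Unlike dplyr's filter, base-R indexing keeps the original row names, so the batman row is still labelled 3 here.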

how to select an exact match using grep in R to subset a dataframe

You can use \\b in your regexp to detect word boundaries.

For example:

data <- data.frame(field=c(14,1144,"test14test","test 14 test"))
grep("\\b14\\b",data$field)
#[1] 1 4

If data$field contains only numbers, @Pierre Lafortune's solution might be more appropriate.
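
To show what "exact" can mean here, this sketch compares the word-boundary pattern with two stricter alternatives. The ^14$ pattern and the == comparison are illustrative assumptions, not necessarily the solution referred to above.

```r
field <- c("14", "1144", "test14test", "test 14 test")

word_match  <- grep("\\b14\\b", field)  # "14" as a standalone word anywhere
whole_match <- grep("^14$", field)      # the entire value must be "14"
equal_match <- which(field == "14")     # plain equality, no regex involved

word_match   # 1 4
whole_match  # 1
equal_match  # 1
```

Which one is right depends on whether values such as "test 14 test" should count as a match.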


