Using grep to help subset a data frame
It's pretty straightforward using [ to extract rows.
grep will give you the position at which it matched your search pattern (unless you use value = TRUE, in which case it returns the matching values themselves).
grep("^G45", My.Data$x)
# [1] 2
Since you're searching within the values of a single column, that position corresponds to the row index. So, use it with [ (where you would use My.Data[rows, cols] to get specific rows and columns).
My.Data[grep("^G45", My.Data$x), ]
# x y
# 2 G459 2
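To make the difference between these return values concrete, here is a minimal sketch. The contents of My.Data are an assumption (the original data isn't shown in the question); only the matching row is taken from the output above.

```r
# Hypothetical stand-in for My.Data; only row 2 is known from the output above
My.Data <- data.frame(x = c("A123", "G459", "XG45"), y = 1:3,
                      stringsAsFactors = FALSE)

pos  <- grep("^G45", My.Data$x)                # integer positions of the matches
vals <- grep("^G45", My.Data$x, value = TRUE)  # the matching strings themselves
lgl  <- grepl("^G45", My.Data$x)               # one TRUE/FALSE per element

pos   # 2
vals  # "G459"
lgl   # FALSE TRUE FALSE

# Either the positions or the logical vector can be used as the row index
My.Data[pos, ]
My.Data[lgl, ]
```

Note that "XG45" is not matched, because ^ anchors the pattern to the start of the string.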
The help page for subset shows how you can use grep and grepl with subset if you prefer that function over [. Here's an example.
subset(My.Data, grepl("^G45", My.Data$x))
# x y
# 2 G459 2
As of R 3.3, there's also the startsWith function, which you can again use with subset (or with any of the other approaches above). According to the help page for the function, it's considerably faster than using substring or grepl.
subset(My.Data, startsWith(as.character(x), "G45"))
# x y
# 2 G459 2
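The as.character() call above matters when the column is a factor. A small sketch of that, again with made-up data (the factor column is an assumption about how My.Data was created):

```r
# Hypothetical data; x is stored as a factor to show why as.character() is used
My.Data <- data.frame(x = factor(c("A123", "G459", "XG45")), y = 1:3)

# startsWith() does a literal prefix test and only accepts character vectors
by_prefix <- startsWith(as.character(My.Data$x), "G45")
by_regex  <- grepl("^G45", My.Data$x)   # grepl() coerces factors itself

identical(by_prefix, by_regex)

# Without the conversion, startsWith() signals an error on a factor
factor_result <- tryCatch(startsWith(My.Data$x, "G45"),
                          error = function(e) "error")
factor_result
```

Also keep in mind that startsWith takes a plain prefix, not a regular expression, so it only replaces grepl for patterns of the form "^something".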
How to subset multiple columns from df including grep match
Assuming I understood what you would like to do, a possible solution that may not be useful and/or may be redundant:
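my_selector <- function(df, partial_name, ...) {
  # Positions of the columns named exactly in ... (c() allows several names)
  positional_names <- match(c(...), names(df))
  # Exact-name columns first, then columns whose names match the pattern
  df[, c(positional_names, grep(partial_name, names(df)))]
}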
my_selector(iris, partial_name = "Petal","Species")
A "simpler" option would be to use grep and the like to match the target names at once:
iris[grep("Spec.*|Peta.*", names(iris))]
Or even simpler, as suggested by @akrun, we can simply do:
iris[grep("(Spec|Peta).*", names(iris))]
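As a side note, the trailing .* is optional here, since grep already matches anywhere in the name. A quick check on the built-in iris data (the ^ variant at the end is my own addition, not part of the suggestions above):

```r
# Column positions matched by each pattern in the built-in iris data
with_wildcard <- grep("(Spec|Peta).*", names(iris))
without       <- grep("Spec|Peta", names(iris))

identical(with_wildcard, without)

# Columns come back in their original order, regardless of pattern order
names(iris)[without]   # "Petal.Length" "Petal.Width" "Species"

# Anchoring with ^ avoids accidental matches in the middle of a name
anchored <- names(iris[grep("^(Spec|Peta)", names(iris))])
anchored
```

So if the column order matters, the pattern alone won't control it; you would have to reorder afterwards.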
For more columns, we could do something like:
my_selector(iris, partial_name = "Petal",c("Species","Sepal.Length"))
Species Sepal.Length Petal.Length Petal.Width
1 setosa 5.1 1.4 0.2
2 setosa 4.9 1.4 0.2
Note, however, that the function orders columns counter-intuitively: the names supplied last (through ...) come first in the result, ahead of the partial_name matches.
Result for the first call (truncated):
Species Petal.Length Petal.Width
1 setosa 1.4 0.2
2 setosa 1.4 0.2
3 setosa 1.3 0.2
4 setosa 1.5 0.2
5 setosa 1.4 0.2
6 setosa 1.7 0.4
7 setosa 1.4 0.3
Using grep to subset columns of a dataframe by names(dataframe)
Try this:
sensordata.common <- sensordata[, grep("IAT|OAT|IAH", names(sensordata)), drop = FALSE]
sensordata.common
IAT
1 72.5
names(sensordata.common)
[1] "IAT"
The option drop = FALSE prevents [ from reducing the output to a vector when only one column matches. See ?`[` (the backticks around [ are required).
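Here is a small sketch of what drop changes. The sensordata below is invented for illustration (the real one isn't shown in the question); it just has one column that matches the pattern and one that doesn't.

```r
# Hypothetical sensordata: one column matches the pattern, one does not
sensordata <- data.frame(IAT = c(72.5, 71.9), Time = c(1, 2))
cols <- grep("IAT|OAT|IAH", names(sensordata))

dropped <- sensordata[, cols]                # simplified to a numeric vector
kept    <- sensordata[, cols, drop = FALSE]  # still a one-column data frame

is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE
names(kept)             # "IAT"

# Indexing with a single argument also keeps the data frame
is.data.frame(sensordata[cols])
```

With two or more matching columns the result is a data frame either way, which is why the problem only shows up when a single column matches.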
Alternatively, you could use dplyr::select, as in select(sensordata.common, contains("your_names_here")). dplyr's default is to never change the output class.
How to use grep in Data frame in R
# Your dataframe
df = data.frame( Text = c("Mary had a little lamb and she is sweet."
,"Robin is a great superhero."
,"batman dark knight is wonderful movie."
,"Superman series has been a disappointment."))
# get the rows whose Text contains "batman"
df[grep("batman", df$Text),]
Outputs
[1] "batman dark knight is wonderful movie."
In dplyr, with grepl (which returns a logical vector rather than positions):
library(dplyr)
df %>% filter(grepl("batman", Text))
Outputs
Text
1 batman dark knight is wonderful movie.
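For comparison, roughly the same thing can be done in base R with grepl as a logical row index. This is only a sketch; stringsAsFactors = FALSE and the ignore.case search are my additions, not part of the answer above.

```r
df <- data.frame(Text = c("Mary had a little lamb and she is sweet.",
                          "Robin is a great superhero.",
                          "batman dark knight is wonderful movie.",
                          "Superman series has been a disappointment."),
                 stringsAsFactors = FALSE)

# Logical indexing; drop = FALSE keeps the one-column result a data frame
hits <- df[grepl("batman", df$Text), , drop = FALSE]
nrow(hits)   # 1

# Case-insensitive search for "super" (an extra illustration)
super_rows <- grep("super", df$Text, ignore.case = TRUE)
super_rows   # 2 4
```

Without drop = FALSE, a data frame with a single column is simplified to a plain vector, which is why the first output above prints as [1] "batman ..." instead of a table.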
How to select an exact match using grep in R to subset a dataframe
You can use \\b in your regexp to detect word boundaries.
For example:
data <- data.frame(field=c(14,1144,"test14test","test 14 test"))
grep("\\b14\\b",data$field)
#[1] 1 4
If data$field contains just numbers, @Pierre Lafortune's solution might be more appropriate.
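To see how the word-boundary pattern differs from a plain or a fully anchored one, here is a short comparison on the same data. The ^14$ pattern and the == comparison are my own additions for illustration.

```r
data <- data.frame(field = c(14, 1144, "test14test", "test 14 test"),
                   stringsAsFactors = FALSE)

anywhere   <- grep("14", data$field)        # substring match
whole_word <- grep("\\b14\\b", data$field)  # 14 as a separate word
whole_cell <- grep("^14$", data$field)      # the entire value is 14

anywhere    # 1 2 3 4
whole_word  # 1 4
whole_cell  # 1

# For a literal, whole-value comparison no regex is needed
data[data$field == "14", , drop = FALSE]
```

Which one counts as an "exact match" depends on whether you mean the whole cell or a whole word inside it.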