How to select R data.table rows based on substring match (a la SQL like)
data.table has a like function.
Months[like(Name,"mb")]
Name Number
1: September 9
2: November 11
3: December 12
Or, %like% looks nicer:
> Months[Name %like% "mb"]
Name Number
1: September 9
2: November 11
3: December 12
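To try this without the asker's data, here is a minimal sketch that assumes Months is simply the twelve built-in month.name values numbered 1 to 12 (a hypothetical reconstruction). It checks that the function form and the operator form return the same rows:

```r
library(data.table)

# Hypothetical reconstruction of the Months table: one row per month
Months <- data.table(Name = month.name, Number = 1:12)

# Function form and infix form of the same substring match
res_like    <- Months[like(Name, "mb")]
res_percent <- Months[Name %like% "mb"]

print(res_percent)
```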
Note that %like% and like() use grepl (which returns a logical vector) rather than grep (which returns integer locations). That's so they can be combined with other logical conditions:
> Months[Number<12 & Name %like% "mb"]
Name Number
1: September 9
2: November 11
And you get the power of regular-expression search (not just % or * wildcards), too.
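A small sketch of both points, again assuming the hypothetical Months table built from month.name: grepl() yields one logical per row (so it combines with &), grep() yields positions, and the pattern is a regular expression, so anchors such as ^ and $ work:

```r
library(data.table)

Months <- data.table(Name = month.name, Number = 1:12)

# grepl(): one TRUE/FALSE per row
hits_logical <- grepl("mb", Months$Name)
# grep(): integer row positions of the matches
hits_index <- grep("mb", Months$Name)

# Logical results combine element-wise with other conditions
before_dec <- Months[Number < 12 & Name %like% "mb"]

# Regular-expression anchors: names starting with "J" / ending in "ber"
starts_with_j <- Months[Name %like% "^J"]
ends_in_ber   <- Months[Name %like% "ber$"]
```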
Selecting rows in data.table on the basis of a substring match to any of multiple columns
We can specify the columns to compare in .SDcols, loop through them with lapply, convert each to logical using %like%, check whether there is at least one TRUE per row using Reduce, and use that to subset the elements of 'DetailCol1'.
the_dt[the_dt[, Reduce(`|`, lapply(.SD, `%like%`, "ARP")),
.SDcols= DataCol1:DataCol3], DetailCol1]
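Here is a minimal sketch of that pipeline on made-up data; only the column names DetailCol1 and DataCol1 to DataCol3 come from the question, the values are hypothetical. It keeps the intermediate logical vector so you can see what Reduce produces:

```r
library(data.table)

# Hypothetical data: only the column names come from the question
the_dt <- data.table(
  DetailCol1 = c("a", "b", "c", "d"),
  DataCol1   = c("ARP1", "x",    "y", "z"),
  DataCol2   = c("q",    "ARP2", "r", "s"),
  DataCol3   = c("t",    "u",    "v", "w")
)

# lapply(.SD, `%like%`, "ARP") gives one logical vector per column;
# Reduce(`|`, ...) ORs them, so a row is TRUE if any column matches
any_match <- the_dt[, Reduce(`|`, lapply(.SD, `%like%`, "ARP")),
                    .SDcols = DataCol1:DataCol3]

details <- the_dt[any_match, DetailCol1]
```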
R data.table select rows based on partial string match from character vector
I have a solution in mind using lapply and tstrsplit. There's probably a more elegant way, but it does the job:
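# For each row: split 'sha' on ";", trim whitespace, and flag the row if any piece is in pselection
invisible(lapply(seq_len(nrow(dt)), function(i) {
  dt[i, match := any(trimws(unlist(tstrsplit(as.character(sha), ";"))) %in% pselection)]
}))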
dt[(match)]
title sha match
1: First title 12345 TRUE
2: Second Title 2345; 66543; 33423 TRUE
3: Third Title 22222; 12345678; TRUE
The idea is to split every row of the sha column (trimming whitespace, otherwise row 3 will not match) and check whether any sha appears in pselection.
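If the row-by-row loop becomes slow, one possible alternative (a different technique from the answer above, swapping the loop for strsplit() plus vapply()) splits the whole column at once. The table and pselection values below are hypothetical, shaped like the question's output:

```r
library(data.table)

# Hypothetical data in the shape of the question's table
dt <- data.table(
  title = c("First title", "Second Title", "Third Title", "Fourth Title"),
  sha   = c("12345", "2345; 66543; 33423", "22222; 12345678;", "99999; 88888")
)
pselection <- c("12345", "66543", "12345678")  # hypothetical lookup vector

# Split every sha string in one call, trim each piece, test membership;
# vapply() guarantees exactly one logical per row
dt[, match := vapply(strsplit(sha, ";", fixed = TRUE),
                     function(parts) any(trimws(parts) %in% pselection),
                     logical(1))]

matched <- dt[(match)]
```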
Subset a data.table by a vector of substrings
We can use grep by pasting the vector into a single string, collapsing it with |.
X[grep(paste(Vec, collapse="|"), H)]
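A minimal sketch with hypothetical X and Vec (only the names H and Vec follow the answer): paste(..., collapse = "|") builds a regex alternation, and grep() then returns the row numbers that match any of the substrings:

```r
library(data.table)

# Hypothetical table and substrings
X   <- data.table(id = 1:5,
                  H  = c("apple pie", "banana", "cherry tart", "kiwi", "apple juice"))
Vec <- c("apple", "cherry")

pattern   <- paste(Vec, collapse = "|")  # "apple|cherry": regex alternation
rows_grep <- X[grep(pattern, H)]         # grep() returns matching row numbers
```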
Or we can use the same approach with like(), pasting the pattern vector collapsed by | (as suggested by @Tensibal):
X[like(H, pattern = paste(Vec, collapse="|"))]
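One caveat worth checking: because the collapsed pattern is a regular expression, characters such as "." in Vec are metacharacters. This sketch uses hypothetical data to show the like() version matching more than intended, and one literal alternative (a swapped-in technique, not from the answers above) that ORs fixed-string grepl() calls:

```r
library(data.table)

# Hypothetical data; "." in "v1.2" is a regex metacharacter
X   <- data.table(id = 1:4, H = c("v1.2 release", "v102 build", "notes", "v1.3 beta"))
Vec <- c("v1.2", "beta")

# Regex match: "." matches any character, so "v102 build" is also kept
rows_like <- X[like(H, pattern = paste(Vec, collapse = "|"))]

# Literal match: one fixed-string grepl() per substring, OR-ed together
keep       <- Reduce(`|`, lapply(Vec, function(v) grepl(v, X$H, fixed = TRUE)))
rows_fixed <- X[keep]
```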
Using grep to subset rows from a data.table, comparing row content
If you're happy using the stringi package, this is a way that takes advantage of the fact that the stringi functions vectorise both pattern and string:
library(stringi)
DT[stri_detect_fixed(num, y), x := num]
Depending on the data, it may be faster than the method posted by Veerenda Gadekar.
library(data.table); library(stringi); library(microbenchmark)
DT <- data.table(num = paste0(sample(1000), sample(2001:2010, 1000, TRUE)),
                 y   = as.character(sample(2001:2010, 1000, TRUE)))
microbenchmark(
  vg = DT[, x := grep(y, num, value = TRUE, fixed = TRUE), by = .(num, y)],
  nk = DT[stri_detect_fixed(num, y), x := num]
)
#Unit: microseconds
# expr min lq mean median uq max neval
# vg 6027.674 6176.397 6513.860 6278.689 6370.789 9590.398 100
# nk 975.260 1007.591 1116.594 1047.334 1110.734 3833.051 100
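To see the pairwise vectorisation on a tiny table, here is a sketch with hypothetical values: each row's y is searched for inside that same row's num. The base-R line is only a comparison; grepl() takes a single pattern, so it has to be looped over the pairs:

```r
library(data.table)
library(stringi)

# Hypothetical data: search row i's y inside row i's num
DT <- data.table(num = c("12001", "72005", "33310"), y = c("2001", "2006", "331"))

# stri_detect_fixed() pairs string[i] with pattern[i] in one vectorised call
DT[stri_detect_fixed(num, y), x := num]

# Base-R comparison: one grepl() call per (string, pattern) pair
base_hits <- mapply(function(s, p) grepl(p, s, fixed = TRUE),
                    DT$num, DT$y, USE.NAMES = FALSE)
```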