Extracting common character strings from multiple vectors of different lengths
Yes:
Reduce(intersect,list(a1,a2,a3))
How to find common elements from multiple vectors?
There might be a cleverer way to go about this, but
intersect(intersect(a,b),c)
will do the job.
EDIT: More cleverly, and more conveniently if you have a lot of arguments:
Reduce(intersect, list(a,b,c))
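As a minimal sketch of the Reduce approach, assuming three made-up sample vectors (a1, a2 and a3 below are illustrative data, not taken from the question):

```r
# Hypothetical sample vectors of different lengths
a1 <- c("apple", "banana", "cherry", "date")
a2 <- c("banana", "date", "fig")
a3 <- c("date", "banana", "grape", "kiwi", "lemon")

# Reduce applies intersect pairwise, i.e. intersect(intersect(a1, a2), a3)
common <- Reduce(intersect, list(a1, a2, a3))
common
```

The result keeps the order of the first vector and contains each common element once, since intersect discards duplicates.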
Extracting a number of a string of varying lengths
We can use str_extract with the pattern \\d+, which matches one or more digits. It can equivalently be written as [0-9]+.
library(stringr)
as.numeric(str_extract(testVector, "\\d+"))
#[1] 10 6 4 15
If there are multiple numbers in a string, we use str_extract_all, which will return a list output.
This can also be done with base R (no external packages used):
as.numeric(regmatches(testVector, regexpr("\\d+", testVector)))
#[1] 10 6 4 15
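For strings that hold more than one number, a base R counterpart is gregexpr together with regmatches. The following is only a sketch; the input vector x is made up for illustration:

```r
# Hypothetical input; some elements contain more than one number
x <- c("a10b6", "no digits", "4 and 15")

# gregexpr locates every match; regmatches then returns one
# character vector per input element, which we convert to numeric
all_nums <- lapply(regmatches(x, gregexpr("[0-9]+", x)), as.numeric)
all_nums
```

Elements without any digits yield a zero-length numeric vector rather than NA, so the result stays a list with one entry per input string.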
Or using gsub from base R:
as.numeric(gsub("\\D+", "", testVector))
#[1] 10 6 4 15
BTW, some functions are just thin wrappers around gsub. For example, this is the source of extract_numeric (from tidyr):
function (x)
{
    as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}
So, if we need a function, we can create one (without using any external packages)
ext_num <- function(x) {
  as.numeric(gsub("\\D+", "", x))
}
ext_num(testVector)
#[1] 10 6 4 15
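One thing to keep in mind with the gsub approach is that it deletes every non-digit, so separate numbers in one string are glued together. A small sketch with made-up inputs:

```r
ext_num <- function(x) {
  as.numeric(gsub("\\D+", "", x))
}

# Hypothetical inputs, chosen to show the edge cases
single   <- ext_num("item 10")    # one number in the string
glued    <- ext_num("a1b2")       # two numbers: digits are concatenated
no_point <- ext_num("price 3.5")  # the decimal point is removed as a non-digit

c(single, glued, no_point)
```

If only the first number is wanted, the regmatches/regexpr variant is the safer choice.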
How can I extract matched part of multiple strings?
Use strsplit and intersect the overlapping parts recursively using Reduce. You can then piece the result back together by paste-ing.
paste(Reduce(intersect, strsplit(data.dir, "\\\\")), collapse="\\")
#[1] "C:\\data\\files"
As @g-grothendieck notes, this will fail in certain circumstances like:
data.dir <- c("C:\\a\\b\\c\\", "C:\\a\\X\\c\\")
An ugly hack might be something like:
tail(
  Reduce(
    intersect,
    lapply(strsplit(data.dir, "\\\\"),
      function(x) sapply(seq_along(x), function(y) paste(x[seq_len(y)], collapse = "\\"))
    )
  ),
1)
...which will deal with either case.
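The same idea can be wrapped in a small helper. This is a sketch only: common_dir is a hypothetical name, and the first set of paths is made up because the original data.dir is not shown.

```r
# Hypothetical helper: longest shared leading directory of several paths
common_dir <- function(paths, sep = "\\") {
  parts <- strsplit(paths, sep, fixed = TRUE)
  # For each path, build every cumulative prefix: "C:", "C:\a", "C:\a\b", ...
  prefixes <- lapply(parts, function(p) {
    vapply(seq_along(p),
           function(i) paste(p[seq_len(i)], collapse = sep),
           character(1))
  })
  # Only prefixes shared by all paths survive; the last one is the longest
  tail(Reduce(intersect, prefixes), 1)
}

# Assumed example paths (not from the question)
simple <- common_dir(c("C:\\data\\files\\subset1\\", "C:\\data\\files\\subset2\\"))
# The problematic case mentioned above
tricky <- common_dir(c("C:\\a\\b\\c\\", "C:\\a\\X\\c\\"))
```

Comparing whole prefixes instead of individual components is what stops the trailing "c" in the second case from being treated as shared.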
Alternatively, use dirname if you only ever have one extra directory level:
unique(dirname(data.dir))
#[1] "C:/data/files"
Extract character from string based on character in another vector in R
I think str_sub only works with strings, but for the second string strsplit gives you a vector of 2 strings.
This would do the job in the case the separator only appears once in every string:
library(stringr)
sapply(strsplit(a, split = b, fixed = FALSE), function(l) str_sub(l[[1]], -1, -1))
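A base R sketch of the same idea, in case stringr is not available. The vectors a and b below are invented for illustration, and fixed = TRUE is an assumption that the separators are literal characters rather than regular expressions:

```r
# Hypothetical data: strings in a, one separator per string in b
a <- c("abXcd", "efgYh")
b <- c("X", "Y")

# For each string, keep the piece before its separator and
# take the last character of that piece
before_sep <- mapply(function(s, sep) {
  first <- strsplit(s, sep, fixed = TRUE)[[1]][1]
  substr(first, nchar(first), nchar(first))
}, a, b, USE.NAMES = FALSE)

before_sep
```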
Find common substrings between two character variables
Here's a CRAN package for that:
library(qualV)
sapply(seq_along(a), function(i)
  paste(LCS(strsplit(a[i], '')[[1]], strsplit(b[i], '')[[1]])$LCS,
        collapse = ""))
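As far as I know, LCS in qualV computes a longest common subsequence, which does not have to be contiguous. If a contiguous common substring is what you need, a brute-force base R sketch could look like this (longest_common_substring is a hypothetical helper name, and the approach is only sensible for short strings):

```r
# Hypothetical helper: longest contiguous substring of x that also occurs in y
longest_common_substring <- function(x, y) {
  n <- nchar(x)
  if (n == 0) return("")
  for (len in seq(n, 1)) {                  # try the longest candidates first
    for (start in seq_len(n - len + 1)) {
      cand <- substr(x, start, start + len - 1)
      if (grepl(cand, y, fixed = TRUE)) return(cand)
    }
  }
  ""                                        # nothing in common
}

res <- longest_common_substring("abczzzzz", "rrrrabckkk")
res
```

When several substrings share the maximum length, this returns the one that starts earliest in x.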
R intersecting strings
This works, but I'm not sure how robust it is given your question is a little vague.
Reduce(intersect, strsplit(basketball," "))
#[1] "MISS" "Pullup" "Jump" "Shot"
How do I select the longest string from each vector of strings in a list of vectors?
You can try:
lapply(lst, function(x) x[which.max(nchar(x))])
[[1]]
[1] "The quick brown fox"
[[2]]
[1] "And forever in peace may she wave"
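Note that which.max returns only the first maximum, so ties are silently dropped. If all of the longest strings should be kept, a sketch along these lines may help (lst2 is made-up data):

```r
# Hypothetical list in which the first vector has a tie for the longest string
lst2 <- list(c("a", "bbb", "ccc"), c("xx", "y"))

# Keep every element whose length equals the maximum length in its vector
longest_all <- lapply(lst2, function(x) x[nchar(x) == max(nchar(x))])
longest_all
```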
how can I find same pattern among strings?
We can split the strings at each character and use intersect to get the common ones.
intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]])
#[1] "a" "b" "c"
To get the exact output as requested we can paste them together.
paste(intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]]), collapse = "")
#[1] "abc"
If there are multiple strings we can use Reduce:
a <- "abczzzzz"
b <- "rrrrabckkk"
c <- "dsaqwabc"
paste(Reduce(intersect, strsplit(c(a, b, c), "")), collapse = "")
#[1] "abc"
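Be aware that this works on sets of characters, so it ignores position and repetition. A small sketch with invented strings shows the limitation:

```r
# Hypothetical strings that share the same letters in a different order
s1 <- "abc"
s2 <- "cba"

# intersect keeps the shared characters in the order of the first string
shared <- paste(intersect(strsplit(s1, "")[[1]], strsplit(s2, "")[[1]]),
                collapse = "")
shared

# The pasted result is not necessarily a substring of every input
is_substring <- grepl(shared, s2, fixed = TRUE)
is_substring
```

So the approach is fine when the common part is known to be a contiguous, non-repeating pattern, but it should not be read as a general common-substring search.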