Extracting Common Character Strings from Multiple Vectors of Different Lengths

Yes. Fold intersect over a list of all the vectors with Reduce:

Reduce(intersect,list(a1,a2,a3))
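A small self-contained illustration, using made-up vectors a1, a2 and a3 of different lengths (hypothetical data, not from the original question):

```r
# Hypothetical character vectors of different lengths
a1 <- c("apple", "banana", "cherry", "date")
a2 <- c("banana", "cherry", "fig")
a3 <- c("cherry", "banana", "grape", "kiwi", "lemon")

# Reduce applies intersect pairwise, left to right,
# i.e. intersect(intersect(a1, a2), a3)
common <- Reduce(intersect, list(a1, a2, a3))
print(common)
#[1] "banana" "cherry"
```

The lengths of the vectors do not matter, and the result keeps the order in which the elements appear in the first vector.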

How to find common elements from multiple vectors?

There might be a cleverer way to go about this, but

intersect(intersect(a,b),c)

will do the job.

EDIT: More cleverly, and more conveniently if you have a lot of arguments:

Reduce(intersect, list(a,b,c))
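To see that Reduce really performs the nested calls, a sketch with hypothetical vectors v1, v2, v3 can keep every intermediate result via accumulate = TRUE:

```r
# Hypothetical inputs
v1 <- c("x", "y", "z")
v2 <- c("y", "z", "w")
v3 <- c("z", "q")

# accumulate = TRUE returns every intermediate result of the fold:
# v1, then intersect(v1, v2), then intersect(intersect(v1, v2), v3)
steps <- Reduce(intersect, list(v1, v2, v3), accumulate = TRUE)
str(steps)

# The last step equals the hand-written nested call
final <- steps[[length(steps)]]
identical(final, intersect(intersect(v1, v2), v3))
#[1] TRUE
```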

Extracting a number from strings of varying lengths

We can use str_extract with the pattern \\d+, which matches one or more digits. It can also be written as [0-9]+.

library(stringr)
as.numeric(str_extract(testVector, "\\d+"))
#[1] 10 6 4 15

If a string contains more than one number, use str_extract_all, which returns a list.
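A base R counterpart of str_extract_all (a swapped-in technique, not part of the original answer) pairs gregexpr with regmatches. The input strings here are hypothetical:

```r
# Hypothetical strings: several numbers, one number, no number
x <- c("a1b22", "c333", "none")

# gregexpr finds every match; regmatches returns a list with one
# character vector per input string (character(0) when nothing matches)
nums <- lapply(regmatches(x, gregexpr("[0-9]+", x)), as.numeric)
str(nums)
```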


This can also be done in base R (no external packages):

as.numeric(regmatches(testVector, regexpr("\\d+", testVector)))
#[1] 10 6 4 15

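Or use gsub from base R. Note that this deletes every non-digit character, so it is only correct when each string contains a single number: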

as.numeric(gsub("\\D+", "", testVector))
#[1] 10 6 4 15
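The regexpr and gsub versions agree only when each string holds one number. With a hypothetical string containing two numbers they differ:

```r
x <- c("item10", "a1b2")   # hypothetical inputs

# regexpr keeps only the first run of digits
as.numeric(regmatches(x, regexpr("\\d+", x)))
#[1] 10  1

# gsub deletes every non-digit, so "a1b2" collapses to 12
as.numeric(gsub("\\D+", "", x))
#[1] 10 12
```

Also note that regmatches with regexpr silently drops strings that have no match, so its result can be shorter than the input.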

Incidentally, some package functions are just thin wrappers around gsub. For example, extract_numeric (from tidyr) is defined as:

function (x) 
{
    as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}
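The [^0-9.-]+ pattern keeps decimal points and minus signs, which \\D+ strips. A sketch with a hypothetical helper keep_num and hypothetical inputs:

```r
# Hypothetical helper using the same pattern as the extract_numeric body
keep_num <- function(x) as.numeric(gsub("[^0-9.-]+", "", as.character(x)))

x <- c("-3.5 kg", "price 12.25")   # hypothetical inputs
keep_num(x)
#[1] -3.50 12.25

# \\D+ also removes the sign and the decimal point
as.numeric(gsub("\\D+", "", x))
#[1]   35 1225
```

The trade-off is that any stray hyphen or dot in the text is kept too, so "x-ray 5" becomes -5.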

So if we need a function, we can write our own without any external packages:

ext_num <- function(x) {
  as.numeric(gsub("\\D+", "", x))
}
ext_num(testVector)
#[1] 10 6 4 15

How can I extract matched part of multiple strings?

Split each path with strsplit and intersect the pieces across all strings with Reduce, then piece the common parts back together with paste:

paste(Reduce(intersect, strsplit(data.dir, "\\\\")), collapse="\\")
#[1] "C:\\data\\files"
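A runnable version, assuming a hypothetical data.dir of three sibling folders (the question's actual vector is not shown here):

```r
# Hypothetical input: three sibling folders under one parent
data.dir <- c("C:\\data\\files\\subset1",
              "C:\\data\\files\\subset3",
              "C:\\data\\files\\subset5")

# "\\\\" is the regex for a single literal backslash
parts <- strsplit(data.dir, "\\\\")
paste(Reduce(intersect, parts), collapse = "\\")
#[1] "C:\\data\\files"
```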

As @g-grothendieck notes, this will fail in certain circumstances, because intersect ignores the position of each path component. For example:

data.dir <- c("C:\\a\\b\\c\\", "C:\\a\\X\\c\\") 
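Running the simple approach on that input shows the problem: "c" is kept even though the paths have already diverged at "b" versus "X":

```r
data.dir <- c("C:\\a\\b\\c\\", "C:\\a\\X\\c\\")

# strsplit drops the trailing empty piece; intersect then keeps "C:", "a" and "c"
paste(Reduce(intersect, strsplit(data.dir, "\\\\")), collapse = "\\")
#[1] "C:\\a\\c"
```

The correct common parent is "C:\\a".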

An ugly hack is to build every cumulative prefix of each path, intersect those prefixes, and keep the last (longest) one:

tail(
  Reduce(
    intersect,
    lapply(strsplit(data.dir, "\\\\"),
           # every cumulative prefix of a path: "C:", "C:\\a", "C:\\a\\b", ...
           function(x) sapply(seq_along(x), function(y) paste(x[1:y], collapse = "\\"))
    )
  ),
  1)

...which will deal with either case.
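The same cumulative-prefix idea can be wrapped in a reusable function. This is a sketch; common_dir and its sep argument are hypothetical names, not from the original answer:

```r
# Hypothetical helper: longest common parent directory of several paths
common_dir <- function(paths, sep = "\\") {
  parts <- strsplit(paths, sep, fixed = TRUE)
  # every cumulative prefix of each path, shortest first
  prefixes <- lapply(parts, function(x)
    vapply(seq_along(x), function(y) paste(x[seq_len(y)], collapse = sep),
           character(1)))
  shared <- Reduce(intersect, prefixes)
  # intersect keeps the first list's order, so the last shared prefix is the longest
  if (length(shared) == 0) "" else shared[length(shared)]
}

common_dir(c("C:\\a\\b\\c\\", "C:\\a\\X\\c\\"))
#[1] "C:\\a"
common_dir(c("C:\\data\\files\\subset1", "C:\\data\\files\\subset3"))
#[1] "C:\\data\\files"
```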


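Alternatively, use dirname if you only ever have one extra directory level. This relies on R running on Windows, where dirname treats backslashes as separators and returns forward slashes: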

unique(dirname(data.dir))
#[1] "C:/data/files"

Extract character from string based on character in another vector in R

I think str_sub only works with strings, but for the second string strsplit gives you a vector of two strings.

This will do the job as long as the separator appears only once in each string:

library(stringr)

# take the last character of the piece before the separator
sapply(strsplit(a, split = b, fixed = FALSE), function(l) str_sub(l[[1]], -1, -1))
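A base R sketch of the same idea, using substr instead of str_sub and hypothetical inputs a and b. It swaps in fixed = TRUE on the assumption that the separators are literal text rather than regular expressions:

```r
# Hypothetical inputs: each string contains its separator exactly once
a <- c("abXcd", "pqYrs")
b <- c("X", "Y")

# split is recycled along a, so a[i] is split on b[i]
pieces <- strsplit(a, split = b, fixed = TRUE)

# last character of the piece before the separator
sapply(pieces, function(p) substr(p[1], nchar(p[1]), nchar(p[1])))
#[1] "b" "q"
```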

Find common substrings between two character variables

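The CRAN package qualV provides LCS, which finds the longest common subsequence (the matched characters need not be contiguous, so the result is not necessarily a substring of both strings):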

library(qualV)

# LCS() compares two vectors, so split each string into single characters
sapply(seq_along(a), function(i)
  paste(LCS(strsplit(a[i], '')[[1]], strsplit(b[i], '')[[1]])$LCS,
        collapse = ""))
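If the common part must be contiguous, a longest common substring is needed instead. Below is a base R sketch of the standard dynamic-programming approach (a different technique from qualV's LCS); lcsubstr and the inputs are hypothetical:

```r
# Hypothetical helper: longest common contiguous substring of s and t
lcsubstr <- function(s, t) {
  x <- strsplit(s, "")[[1]]
  y <- strsplit(t, "")[[1]]
  if (length(x) == 0 || length(y) == 0) return("")
  # run[i, j] = length of the common run ending at x[i] and y[j]
  run <- matrix(0L, nrow = length(x), ncol = length(y))
  best <- 0L
  end_i <- 0L
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      if (x[i] == y[j]) {
        run[i, j] <- if (i > 1 && j > 1) run[i - 1, j - 1] + 1L else 1L
        if (run[i, j] > best) {   # first longest run wins ties
          best <- run[i, j]
          end_i <- i
        }
      }
    }
  }
  if (best == 0L) "" else substr(s, end_i - best + 1L, end_i)
}

a <- c("abczzzzz", "axbxc")   # hypothetical inputs
b <- c("rrrrabckkk", "abc")
mapply(lcsubstr, a, b, USE.NAMES = FALSE)
```

For the second pair a subsequence method would report "abc", while the contiguous version returns only "a". The table costs O(n * m) time and memory, which is fine for short strings.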

R intersecting strings

This works, but I'm not sure how robust it is given your question is a little vague.

Reduce(intersect, strsplit(basketball," "))
#[1] "MISS" "Pullup" "Jump" "Shot"
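A self-contained version with hypothetical play-by-play strings, pasting the shared words back into one phrase:

```r
# Hypothetical inputs
basketball <- c("Smith MISS Pullup Jump Shot", "Jones MISS Pullup Jump Shot")

# split on spaces, then keep the words present in every string
common <- Reduce(intersect, strsplit(basketball, " "))
paste(common, collapse = " ")
#[1] "MISS Pullup Jump Shot"
```

Because intersect drops duplicates and follows the word order of the first string, repeated or reordered words are not preserved.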

How do I select the longest string from each vector of strings in a list of vectors?

You can try:

lapply(lst, function(x) x[which.max(nchar(x))])

#[[1]]
#[1] "The quick brown fox"
#
#[[2]]
#[1] "And forever in peace may she wave"
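which.max returns the position of the first maximum, so ties go to the earlier string. A sketch with a hypothetical lst, using sapply to get a plain character vector and a second variant that keeps all tied strings:

```r
# Hypothetical list with a tie in each element
lst <- list(c("ab", "cd", "e"), c("x", "yyy", "zzz"))

# first longest string per element, as a character vector
sapply(lst, function(x) x[which.max(nchar(x))])
#[1] "ab"  "yyy"

# every string that reaches the maximum length
lapply(lst, function(x) x[nchar(x) == max(nchar(x))])
```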

How can I find the same pattern among strings?

We can split the strings into individual characters and use intersect to get the common ones.

intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]])
#[1] "a" "b" "c"

To get the exact output as requested we can paste them together.

paste(intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]]), collapse = "")
#[1] "abc"

If there are more than two strings, we can use Reduce:

a <- "abczzzzz"
b <- "rrrrabckkk"
c <- "dsaqwabc"

paste(Reduce(intersect, strsplit(c(a, b, c), "")), collapse = "")
#[1] "abc"
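Keep in mind that this intersects sets of characters, not substrings. With hypothetical inputs where the shared characters are scattered, the pasted result is not actually a common pattern:

```r
a <- "xaybzc"   # hypothetical: contains a, b and c, but not "abc"
b <- "abc"

paste(intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]]), collapse = "")
#[1] "abc"
grepl("abc", a, fixed = TRUE)
#[1] FALSE
```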

