Extract All Strings Between Two Strings

Find string between two substrings

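import re

s = 'asdf=5;iwantthis123jasd'
# Group 1 captures everything between the two literal delimiters.
result = re.search(r'asdf=5;(.*)123jasd', s)
if result:  # re.search returns None when there is no match
    print(result.group(1))  # iwantthis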
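Note that (.*) is greedy: if the delimiters occur more than once, it runs to the last 123jasd. The sketch below uses a made-up string with two occurrences to compare the greedy pattern, its lazy (.*?) variant, and re.findall, which returns group 1 of every match.

```python
import re

# Hypothetical input in which the delimiters appear twice.
s = 'asdf=5;first123jasd asdf=5;second123jasd'

# Greedy (.*) stretches to the LAST '123jasd' in the string.
greedy = re.search(r'asdf=5;(.*)123jasd', s).group(1)

# Lazy (.*?) stops at the FIRST '123jasd' after 'asdf=5;'.
lazy = re.search(r'asdf=5;(.*?)123jasd', s).group(1)

# re.findall returns group 1 of every non-overlapping match.
all_matches = re.findall(r'asdf=5;(.*?)123jasd', s)

print(greedy)       # first123jasd asdf=5;second
print(lazy)         # first
print(all_matches)  # ['first', 'second']
```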

Extracting a string between two other strings in R

You may use str_match with the pattern STR1 (.*?) STR2. Note that the spaces in it are meaningful: to match anything at all between STR1 and STR2, use STR1(.*?)STR2, or use STR1\\s*(.*?)\\s*STR2 to trim surrounding whitespace from the captured value. If there are multiple occurrences, use str_match_all.

If the text between the delimiters can span line breaks, add (?s) at the start of the pattern: (?s)STR1(.*?)STR2 or (?s)STR1\\s*(.*?)\\s*STR2.

library(stringr)
a <- " anything goes here, STR1 GET_ME STR2, anything goes here"
res <- str_match(a, "STR1\\s*(.*?)\\s*STR2")
res[,2]
[1] "GET_ME"

Another way, using base R regexec (which returns only the first match):

test <- " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"
pattern <- "STR1\\s*(.*?)\\s*STR2"
result <- regmatches(test, regexec(pattern, test))
result[[1]][2]
[1] "GET_ME"
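For comparison, here is a rough Python counterpart of the R patterns above (an illustration, not part of the R answer): re.findall plays the role of str_match_all, and the inline (?s) flag likewise lets . match newlines. The multiline sample string is made up.

```python
import re

text = " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"

# Group 1 of every match; \s* outside the group trims the captured value.
trimmed = re.findall(r'STR1\s*(.*?)\s*STR2', text)

multiline = "STR1 first\nsecond STR2"

# Without (?s), '.' cannot cross the newline, so nothing matches.
no_dotall = re.findall(r'STR1\s*(.*?)\s*STR2', multiline)
# With (?s), the captured value may contain the line break.
with_dotall = re.findall(r'(?s)STR1\s*(.*?)\s*STR2', multiline)

print(trimmed)      # ['GET_ME', 'GET_ME2']
print(no_dotall)    # []
print(with_dotall)  # ['first\nsecond']
```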

Extract all strings between two strings in C#

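// Requires: using System; using System.Collections.Generic;
// Returns every substring of body that lies between start and end.
private static List<string> ExtractFromBody(string body, string start, string end)
{
    var matched = new List<string>();
    int position = 0;

    while (true)
    {
        // Find the next start delimiter at or after the current position.
        int indexStart = body.IndexOf(start, position, StringComparison.Ordinal);
        if (indexStart == -1)
            break;

        // Search for the end delimiter only after the start delimiter.
        int contentStart = indexStart + start.Length;
        int indexEnd = body.IndexOf(end, contentStart, StringComparison.Ordinal);
        if (indexEnd == -1)
            break; // no closing delimiter: stop instead of throwing

        matched.Add(body.Substring(contentStart, indexEnd - contentStart));
        position = indexEnd + end.Length;
    }

    return matched;
}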
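The same index-based scan (no regex at all) translates almost line for line into Python with str.find. This is a sketch: the name extract_all is hypothetical, and it assumes non-empty delimiters. It searches for the end delimiter only after the start delimiter and simply stops when a start delimiter has no matching end.

```python
from typing import List


def extract_all(body: str, start: str, end: str) -> List[str]:
    """Hypothetical helper: every substring of body between start and end.

    Assumes start and end are non-empty strings.
    """
    matches = []
    pos = 0
    while True:
        # Next start delimiter at or after pos.
        i = body.find(start, pos)
        if i == -1:
            break
        content_start = i + len(start)
        # The end delimiter is searched for only after the start delimiter.
        j = body.find(end, content_start)
        if j == -1:
            break  # unterminated occurrence: stop
        matches.append(body[content_start:j])
        pos = j + len(end)
    return matches


print(extract_all("a[1]b[2]c[3", "[", "]"))  # ['1', '2']
```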

Extract text between two strings if a substring exists between the two strings using Regex in Python

You can fix the code by building the pattern with str.format:

pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(re.escape(target1), re.escape(target2), re.escape(target3))

With the sample values substituted, the pattern is

StartString\s*((?:(?!StartString).)*?substring 1.*?)\s*EndString

Details

  • StartString - left-hand delimiter
  • \s* - zero or more whitespace characters
  • ((?:(?!StartString).)*?substring 1.*?) - Group 1:
    • (?:(?!StartString).)*? - any character, zero or more times but as few as possible, at a position where the left-hand delimiter does not start
    • substring 1 - the substring that must occur between the delimiters
    • .*? - any zero or more characters, as few as possible
  • \s*EndString - zero or more whitespace characters, then the right-hand delimiter

See the Python demo:

import re
text_data='ghsauaigyssts twh\n\nghguy hja StartString I want this text (1) if substring 1 lies in between the two strings EndString bhghk [jhbn] xxzh StartString I want this text (2) as a different variable if substring 2 lies in between the two strings EndString ghjyjgu'
target1 = 'StartString'
target2 = 'substring 1'
target3 = 'EndString'
pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(re.escape(target1), re.escape(target2), re.escape(target3))
pattern = re.compile(pat1, flags=re.DOTALL)
print(pattern.findall(text_data))
# => ['I want this text (1) if substring 1 lies in between the two strings']
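To see why the (?:(?!StartString).)*? construct is needed, compare it with a plain lazy .*? on a small made-up string (Start, End and bar are placeholder delimiters and target). The plain version can begin at the first Start and run across the first End to reach bar, while the lookahead-guarded version never consumes a character where Start begins.

```python
import re

text = "Start one foo End Start two bar End"

# Plain lazy dot: starts at the first 'Start' and crosses an 'End' to find 'bar'.
naive = re.findall(r'Start\s*(.*?bar.*?)\s*End', text)

# Guarded token: a character is consumed only if 'Start' does not begin there,
# so a match can never swallow another Start...End block.
tempered = re.findall(r'Start\s*((?:(?!Start).)*?bar.*?)\s*End', text)

print(naive)     # ['one foo End Start two bar']
print(tempered)  # ['two bar']
```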

Regex: extract a string between two strings that contains a third string

Try this pattern:

TG00[^#]*TG40 155963[^#]*#

This pattern finds the string TG40 155963 between TG00 and the next #. The negated class [^#]* cannot match a #, so each match stays within a single TG00 ... # block. For the sample data in the question's demo, there were 3 matches.
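A quick sketch of the same pattern with Python's re.findall. The sample data below is invented for illustration (the question's own data is not reproduced here): of its three #-terminated blocks, only two contain TG40 155963.

```python
import re

# Made-up sample: three '#'-terminated blocks, two of which contain the code.
data = "TG00 a TG40 155963 b # TG00 c TG40 999 d # TG00 TG40 155963#"

# [^#]* cannot cross a '#', so each match is confined to one TG00...# block.
matches = re.findall(r'TG00[^#]*TG40 155963[^#]*#', data)
print(matches)  # ['TG00 a TG40 155963 b #', 'TG00 TG40 155963#']
```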

Find all strings in between two strings in Go

In Go, the RE2-based regexp package does not support lookarounds, so you need to use a capturing group with the regexp.FindAllStringSubmatch function:

left := "LEFT_DELIMITER_TEXT_HERE"
right := "RIGHT_DELIMITER_TEXT_HERE"
rx := regexp.MustCompile(`(?s)` + regexp.QuoteMeta(left) + `(.*?)` + regexp.QuoteMeta(right))
matches := rx.FindAllStringSubmatch(str, -1)

Note the use of regexp.QuoteMeta that automatically escapes all special regex metacharacters in the left- and right-hand delimiters.

The (?s) flag makes . match newlines as well, and (.*?) captures everything between the left- and right-hand delimiters into Group 1.

For example, to get the text between Movies: and Food, you can use:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "Movies: A B C Food: 1 2 3"
	// \s* outside the group trims whitespace around the captured text.
	r := regexp.MustCompile(`Movies:\s*(.*?)\s*Food`)
	matches := r.FindAllStringSubmatch(str, -1)
	for _, v := range matches {
		fmt.Println(v[1]) // v[0] is the whole match, v[1] is Group 1
	}
}

Output: A B C
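The QuoteMeta point carries over to other regex engines. As a sketch in Python (the helper name all_between and the sample delimiters are hypothetical), re.escape does the same job: without it, a delimiter such as [USD] would be read as a character class matching a single U, S or D, and (end) as a capturing group.

```python
import re
from typing import List


def all_between(text: str, left: str, right: str) -> List[str]:
    """Hypothetical helper: every substring between literal left and right."""
    # re.escape plays the role of regexp.QuoteMeta; re.DOTALL matches Go's (?s).
    rx = re.compile(re.escape(left) + r'(.*?)' + re.escape(right), re.DOTALL)
    return rx.findall(text)


print(all_between("[USD] 10 (end) and [USD] 20 (end)", "[USD]", "(end)"))
# [' 10 ', ' 20 ']
```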

Find all strings that are in between two sub strings

Use re.findall() to get every occurrence of your substring. $ is a special character in regular expressions (the end-of-string anchor), so you need to escape it as \$ to match a literal $.

>>> import re
>>> s = '@@ cat $$ @@dog$^'
>>> re.findall(r'@@(.*?)\$', s)
[' cat ', 'dog']

To remove the leading and trailing whitespace, you can simply match it outside of the capture group.

>>> re.findall(r'@@\s*(.*?)\s*\$', s)
['cat', 'dog']

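Also, if the text between the delimiters may span newlines, you can use a negated character class instead, since [^$] also matches newline characters (unlike . without re.DOTALL). Make it lazy so that the trailing \s* can still strip whitespace:

>>> re.findall(r'@@\s*([^$]*?)\s*\$', s)
['cat', 'dog']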
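A quick check of the difference on a made-up string whose first pair of delimiters spans a newline: the dot-based pattern skips that pair, while a lazy negated class captures it.

```python
import re

s = '@@ first\nline $ @@second$'

# '.' does not match '\n' (no re.DOTALL), so the first pair is skipped.
dot = re.findall(r'@@\s*(.*?)\s*\$', s)

# [^$] matches any character except '$', including '\n'.
negated = re.findall(r'@@\s*([^$]*?)\s*\$', s)

print(dot)      # ['second']
print(negated)  # ['first\nline', 'second']
```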

Regular expression to get a string between two strings in Javascript

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).

You want a regular match here, to consume the cow portion. To capture the portion in between, use a capturing group (just put the part of the pattern you want to capture inside parentheses):

cow(.*)milk

No lookaheads are needed at all.
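The same behavior can be seen with Python's re module, which treats lookaheads the same way. On a made-up sentence, a lookahead-only pattern leaves cow inside the match because the assertion is zero-width, whereas consuming cow and milk and capturing the middle returns just the text in between.

```python
import re

s = "My cow always gives milk"

# (?=cow) is zero-width: the match still starts AT 'cow' and includes it.
lookahead_only = re.search(r'(?=cow).*(?=milk)', s).group()

# Consuming 'cow' and 'milk' and capturing the middle gives only the part between.
captured = re.search(r'cow(.*)milk', s).group(1)

print(repr(lookahead_only))  # 'cow always gives '
print(repr(captured))        # ' always gives '
```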