Match Text Between Two Strings with Regular Expression

Regex Match all characters between two strings

For example

(?<=This is)(.*)(?=sentence)

I used a lookbehind (?<=) and a lookahead (?=) so that "This is" and "sentence" are not included in the match. Whether you need that depends on your use case: you can also simply write This is(.*)sentence and read the captured group.

The important thing here is to activate the "dotall" mode of your regex engine, so that the . also matches newline characters. How you do this depends on your regex engine.
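
As one illustration of how an engine exposes this mode, here is a minimal sketch in Python, where the flag is called re.DOTALL. The sample text is made up for the example and is not from the original question.

```python
import re

# Hypothetical sample input with a line break between the two delimiters.
text = "This is a\nmultiline sentence"

pattern = r"(?<=This is)(.*)(?=sentence)"

# Without DOTALL, "." stops at the newline, so the lookahead never succeeds.
no_flag = re.search(pattern, text)

# With DOTALL, "." also matches "\n" and the span between the strings is found.
with_flag = re.search(pattern, text, re.DOTALL)

print(no_flag)                    # None
print(repr(with_flag.group(1)))   # ' a\nmultiline '
```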

The next choice is between .* and .*?. The first is greedy and matches up to the last "sentence" in your string; the second is lazy and matches only up to the next "sentence" in your string.
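
The difference is easiest to see on a string that contains the closing delimiter twice. This is a small Python sketch with an invented sample string:

```python
import re

# Hypothetical input containing "sentence" twice.
text = "This is the first sentence and the second sentence."

# Greedy: runs to the LAST "sentence".
greedy = re.search(r"This is(.*)sentence", text, re.DOTALL).group(1)

# Lazy: stops at the NEXT "sentence".
lazy = re.search(r"This is(.*?)sentence", text, re.DOTALL).group(1)

print(repr(greedy))  # ' the first sentence and the second '
print(repr(lazy))    # ' the first '
```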

Update

This is(?s)(.*)sentence

Here the (?s) turns on the dotall modifier, so that the . also matches newline characters.
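
Engines differ on where an inline modifier may appear. In Python, for instance, a global inline flag has to sit at the very start of the pattern (recent versions reject it elsewhere), and a scoped form (?s:...) is available instead. A minimal sketch, using an invented sample string:

```python
import re

# Hypothetical sample input spanning two lines.
text = "This is a\nmultiline sentence"

# Inline flag placed at the start of the pattern, as Python requires.
m1 = re.search(r"(?s)This is(.*)sentence", text)

# Scoped form: dotall applies only inside the (?s:...) group (Python 3.6+).
m2 = re.search(r"This is(?s:(.*))sentence", text)

print(repr(m1.group(1)))  # ' a\nmultiline '
print(repr(m2.group(1)))  # ' a\nmultiline '
```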

Update 2:

(?<=is \()(.*?)(?=\s*\))

This matches your example "This is (a simple) sentence".
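
A quick way to check this pattern, sketched in Python. The second input, with a space before the closing parenthesis, is an extra case added here to show what the \s* in the lookahead does:

```python
import re

pattern = r"(?<=is \()(.*?)(?=\s*\))"

# The example string from the answer.
m = re.search(pattern, "This is (a simple) sentence")

# Added variation: whitespace before ")" is left out of the capture.
m_spaced = re.search(pattern, "This is (a simple ) sentence")

print(repr(m.group(1)))         # 'a simple'
print(repr(m_spaced.group(1)))  # 'a simple'
```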

Regular expression to get a string between two strings in Javascript

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).

You want a regular match here, to consume the cow portion. To capture the portion in between, use a capturing group (just put the part of the pattern you want to capture inside parentheses):

cow(.*)milk

No lookaheads are needed at all.
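
The question is about JavaScript, but the idea is the same in any engine. Here is a sketch in Python, to stay consistent with the other examples on this page; the input string is an assumed sample:

```python
import re

# Assumed sample input; not taken from the answer above.
text = "My cow always gives milk"

m = re.search(r"cow(.*)milk", text)

whole = m.group(0)    # the full match, delimiters included
between = m.group(1)  # only what the capturing group matched

print(repr(whole))    # 'cow always gives milk'
print(repr(between))  # ' always gives '
```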

Match text between two strings with regular expression

Use re.search

>>> import re
>>> s = 'Part 1. Part 2. Part 3 then more text'
>>> re.search(r'Part 1\.(.*?)Part 3', s).group(1)
' Part 2. '
>>> re.search(r'Part 1(.*?)Part 3', s).group(1)
'. Part 2. '

Or use re.findall if there is more than one occurrence.
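
A short sketch of the re.findall variant, using a made-up string in which the delimiters appear twice:

```python
import re

# Hypothetical input with two "Part 1. ... Part 3" spans.
s = "Part 1. alpha Part 3 ... Part 1. beta Part 3"

# With a single capturing group, findall returns the group for each match.
parts = re.findall(r"Part 1\.(.*?)Part 3", s)

print(parts)  # [' alpha ', ' beta ']
```

The lazy .*? matters here: a greedy .* would return one long span from the first "Part 1." to the last "Part 3".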

Regex to pull out text between two strings

Assuming those tags can't be nested, you can use the following regex with the single-line (dotall) flag to match the tags and their content:

\[QUOTE\b.*?\[/QUOTE]
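
As a rough check of this pattern, here is a Python sketch where re.DOTALL plays the role of the single-line flag. The forum-style input is invented for the example:

```python
import re

# Hypothetical input with two non-nested QUOTE blocks, one spanning two lines.
text = "intro [QUOTE=alice]first\nline[/QUOTE] middle [QUOTE]second[/QUOTE] end"

# DOTALL lets ".*?" cross the line break inside the first block.
blocks = re.findall(r"\[QUOTE\b.*?\[/QUOTE]", text, re.DOTALL)

for block in blocks:
    print(repr(block))
# '[QUOTE=alice]first\nline[/QUOTE]'
# '[QUOTE]second[/QUOTE]'
```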

Selecting text between two strings by matching using regex

Something like this could work:

r"\* \* \*\s*level a20\. heading1 random\s*(.*?)\s*\* \* \*\s*level b22\. random-heading2"

Capture group 1 contains the content with surrounding whitespace trimmed.
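
A sketch of how the trimming behaves, in Python. The input text is a guess at the kind of document the question had in mind, and re.DOTALL is an assumption added so that a multi-line body would also be captured:

```python
import re

# Hypothetical input; the real document layout is not shown in the answer.
text = (
    "* * *\n"
    "level a20. heading1 random\n"
    "\n"
    "  body text here\n"
    "\n"
    "* * *\n"
    "level b22. random-heading2\n"
)

pattern = (
    r"\* \* \*\s*level a20\. heading1 random\s*(.*?)\s*"
    r"\* \* \*\s*level b22\. random-heading2"
)

# The \s* on both sides of the group absorbs the surrounding whitespace.
body = re.search(pattern, text, re.DOTALL).group(1)

print(repr(body))  # 'body text here'
```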

Get string between two strings - first string ends with newline

In general, you may use a pattern like

/--START\s*(.*?)\s*--END/s

\s* matches zero or more whitespace characters, so line breaks after --START and before --END are allowed but not required.

A bit more specific pattern will be

/--START\h*\R\s*(.*?)\h*\R\s*--END/s

Or, if --START and --END should appear at the start of lines, add anchors and the m modifier:

/^--START\h*\R\s*(.*?)\h*\R\s*--END$/sm

Details

  • ^ - start of a line (since m modifier is used)
  • --START - left-hand delimiter
  • \h* - 0+ horizontal whitespaces
  • \R - a line break
  • \s* - 0+ whitespaces
  • (.*?) - Group 1: any 0+ chars, as few as possible
  • \h* - 0+ horizontal whitespaces
  • \R - a linebreak
  • \s* - 0+ whitespaces
  • --END - the right-hand delimiter
  • $ - end of a line.
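
The patterns above use PCRE-style syntax. Python's built-in re module does not recognise \h or \R, so the sketch below substitutes rough equivalents. The names H and R and the sample text are assumptions made for illustration, and [ \t] only approximates \h.

```python
import re

H = r"[ \t]*"           # rough stand-in for \h* (spaces and tabs only)
R = r"(?:\r\n|\n|\r)"   # rough stand-in for \R (one line break)

# Same structure as the anchored pattern above, with the s and m modifiers.
pattern = re.compile(
    rf"^--START{H}{R}\s*(.*?){H}{R}\s*--END$",
    re.DOTALL | re.MULTILINE,
)

# Hypothetical input: trailing spaces after --START and after the payload.
text = "header\n--START  \n  payload line 1\npayload line 2  \n--END\nfooter"

payload = pattern.search(text).group(1)

print(repr(payload))  # 'payload line 1\npayload line 2'
```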

Regular Expression to find a string included between two characters while EXCLUDING the delimiters

Easily done:

(?<=\[)(.*?)(?=\])

Technically that's using lookaheads and lookbehinds. See Lookahead and Lookbehind Zero-Width Assertions. The pattern consists of:

  • a lookbehind requiring a preceding [ that is not captured;
  • a non-greedy capturing group, non-greedy so that it stops at the first ]; and
  • a lookahead requiring a following ] that is not captured.

Alternatively you can just capture what's between the square brackets:

\[(.*?)\]

and return the first captured group instead of the entire match.
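
Both variants can be compared side by side. This is a minimal Python sketch with an invented input string:

```python
import re

# Hypothetical input with two bracketed values.
text = "values: [alpha] and [beta]"

# Lookaround version: the brackets are asserted but not part of the match.
with_lookarounds = re.findall(r"(?<=\[)(.*?)(?=\])", text)

# Plain capturing group: the brackets are matched, only group 1 is returned.
with_group = re.findall(r"\[(.*?)\]", text)

print(with_lookarounds)  # ['alpha', 'beta']
print(with_group)        # ['alpha', 'beta']
```

The capturing-group form is also the safer choice in engines that lack lookbehind support.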


