How to Get Multi-Line String Between Two Braces Containing a Specific Search String

How do I get multi-line string between two braces containing a specific search string?

This GNU awk command should work:

awk -v RS='[^\n]*{|}' 'RT ~ /{/{p=RT} /event/{ print p $0 RT }' file
blabla {
blabla
blablaeventblabla
}

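RS='[^\n]*{|}' sets the input record separator to either the text on a line up to and including a {, or a }. RT is the built-in gawk variable that holds the text matched by the RS regex for the current record, so p stores the most recent opening line and print p $0 RT reassembles the whole block.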
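
For comparison, here is a rough Python sketch of the same idea: find each block that starts on a line ending in { and runs to the next }, and keep only the blocks whose body contains the search string. The function name blocks_containing and the sample text are made up for illustration, and nested braces are not handled.

```python
import re

def blocks_containing(text, needle):
    """Return every '<header> { ... }' block whose body contains needle."""
    # [^\n]*\{   : the rest of the header line up to the opening brace
    # ([^{}]*)   : the body, which may span lines but has no nested braces
    # \}         : the closing brace
    pattern = re.compile(r"[^\n]*\{([^{}]*)\}")
    return [m.group(0) for m in pattern.finditer(text) if needle in m.group(1)]

# Hypothetical input, modelled on the output shown above.
sample = """foo {
nothing here
}
blabla {
blabla
blablaeventblabla
}
"""

for block in blocks_containing(sample, "event"):
    print(block)
```

Like the awk version, this tests only the text between the braces, not the header line.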

How to match all text between two strings multiline

Unfortunately, RegExr depends on the JavaScript RegExp implementation, which at the time of this answer did not support the flag you need (the s flag was only added to JavaScript in ES2018).

You are looking for the s (DotAll) modifier, which forces the dot . to match newline sequences as well.

If you are using JavaScript, you can use this workaround:

/<!-- OPTIONAL -->([\S\s]*?)<!-- OPTIONAL END -->/
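
To see that the two approaches are equivalent, here is a small Python sketch comparing the DotAll flag (re.DOTALL in Python) with the [\S\s] workaround. The sample string is made up for illustration.

```python
import re

# Hypothetical input with the two markers on separate lines.
html = "a\n<!-- OPTIONAL -->\nline one\nline two\n<!-- OPTIONAL END -->\nb"

# With re.DOTALL the dot also matches newlines.
with_flag = re.search(
    r"<!-- OPTIONAL -->(.*?)<!-- OPTIONAL END -->", html, re.DOTALL
)

# Without the flag, [\S\s] matches any character, newlines included.
without_flag = re.search(
    r"<!-- OPTIONAL -->([\S\s]*?)<!-- OPTIONAL END -->", html
)

print(repr(with_flag.group(1)))
print(with_flag.group(1) == without_flag.group(1))
```

Without either the flag or the workaround, the plain dot version finds no match in this input, because the dot stops at the first newline.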

Regex matching text in brackets over multiple lines

The problem is that you are only capturing the last repetition matched by (.*|\n)*? (because the *? quantifier sits outside the capturing group, so each repetition overwrites the previous capture).

You could change the capturing group to a non-capturing group and then wrap that group, together with its *? quantifier, in a capturing group so that the whole match is captured: ((?:.*?|\n)*?).

Pattern p = Pattern.compile("node \\[\\n((?:.*?|\\n)*?)\\]", Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group(1));
}

However, the regular expression above is relatively inefficient. A potentially better approach would be to match non-] characters with a negated character set, ([^\]]*).

Pattern p = Pattern.compile("node \\[\\n([^\\]]*)\\]", Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group(1));
}
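
The capture-group behaviour described above is not specific to Java. A minimal Python sketch, using a made-up text value, shows all three variants side by side:

```python
import re

# Hypothetical input with two "node [ ... ]" blocks.
text = 'node [\nid 1\nlabel "a"\n]\nnode [\nid 2\n]\n'

# A quantified capturing group keeps only its last repetition,
# here the newline just before the closing bracket.
last_only = re.search(r"node \[\n(.|\n)*?\]", text).group(1)

# Moving the quantifier inside a capturing group captures the whole body.
wrapped = re.findall(r"node \[\n((?:.|\n)*?)\]", text)

# The negated character class gives the same result with less backtracking.
negated = re.findall(r"node \[\n([^\]]*)\]", text)

print(repr(last_only))
print(wrapped)
print(negated == wrapped)
```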

How to find multiline text between curly braces?

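Flags such as re.MULTILINE are passed as the second argument to re.compile (or as the flags argument of the other re functions). Note that re.MULTILINE only changes where ^ and $ match; the regex proposed below needs no flag at all, because a negated character class already matches newlines.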

I would propose this regex: _NAME_KEY_[^{]*+\{([^}]+)\}

Explanation:

_NAME_KEY_: match "_NAME_KEY_"

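[^{]*+: match as many non-{-characters as possible (greedy and possessive; possessive quantifiers need Python 3.11 or later, on older versions use [^{]* instead)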

\{: match a { character

([^}]+): match (and capture) non-}-characters (greedy)

\}: match one } character
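
A short sketch of that regex in use. It uses the plain greedy [^{]* so that it also runs on Python versions without possessive quantifiers; the config text is invented for the example.

```python
import re

# Hypothetical input: a key, some text, then a braced multi-line body.
config = """_NAME_KEY_ section
{
    first line
    second line
}
"""

# No flags needed: [^{] and [^}] match newlines on their own.
match = re.search(r"_NAME_KEY_[^{]*\{([^}]+)\}", config)
body_lines = [line.strip() for line in match.group(1).strip().splitlines()]
print(body_lines)
```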

Search for text between two patterns with multiple lines in between

$ awk '/foo/{++c;next} c==1' file
there
is
random
text
here

$ awk '/foo/{++c;next} c==3' file
even
more
random
text
here

Or, with GNU awk, which supports a multi-character RS, you could do:

$ awk -v RS='(^|\n)[^\n]*foo[^\n]*(\n|$)' 'NR==2' file
there
is
random
text
here

$ awk -v RS='(^|\n)[^\n]*foo[^\n]*(\n|$)' 'NR==4' file
even
more
random
text
here

See https://stackoverflow.com/a/17914105/1745001 for other ways of printing after a condition is true.
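
The counter idea from the first awk command carries over directly to other languages. Here is a Python sketch; the function name section_after and the sample lines are assumptions, since the original input file is not shown, and it uses a plain substring test where awk uses a regex.

```python
def section_after(lines, marker, n):
    """Return the lines after the n-th line containing marker,
    up to (not including) the next line containing marker."""
    count = 0
    selected = []
    for line in lines:
        if marker in line:
            count += 1       # same role as awk's ++c
            continue         # same role as awk's next
        if count == n:
            selected.append(line)
    return selected

# Hypothetical input: marker lines separating three sections.
sample = ["foo 1", "there", "is", "foo 2", "skipped", "foo 3", "even", "more"]

print(section_after(sample, "foo", 1))
print(section_after(sample, "foo", 3))
```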

Regex: Lines between two Strings as separate Matches

With .NET you can use this pattern in a global search with the multiline option:

@"(?:\G(?!\A)|START-OF-FIELDS)\r?\n(.*)(?>\r?\nEND-OF-FIELD(?=S\r?$))?"

The result is in capture group 1.

The pattern works with 2 entry points. The first one is "START-OF-FIELDS" that is used for the first result. The second is \G(?!\A) that is used for other results.

\G is an anchor for the position in the string after the last match. At the beginning, \G is initialized to the start of the string; to avoid this special case, I added (?!\A) so that this branch fails at the first position.

With \G, only contiguous matches are allowed after the first result.

To break the contiguity, I added an optional atomic group that matches "END-OF-FIELDS" without consuming its last character. The next attempt then starts at that S, where \r?\n cannot match.

Another way is possible with C#, since it lets you extract everything that was matched by a repeated capturing group:

With this pattern:

string pattern = @"START-OF-FIELDS\r?\n(?>(.*)\r?\n)*?(?>END-OF-FIELD(?=S\r?$))";

Match match = Regex.Match(input, pattern, RegexOptions.Multiline);

if (match.Success) {
    foreach (Capture capture in match.Groups[1].Captures) {
        Console.WriteLine(capture.Value);
    }
}

The advantage of this approach is that the search stops as soon as the fields are found.
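
Python's re module has neither \G nor per-repetition captures, so a rough equivalent takes a different route: match the whole block once, then split it into lines. This is a sketch under that assumption, not a port of the .NET patterns; fields_between and the sample text are invented names.

```python
import re

def fields_between(text):
    """Return the lines between START-OF-FIELDS and END-OF-FIELDS."""
    block = re.search(
        r"^START-OF-FIELDS\r?\n(.*?)^END-OF-FIELDS\r?$",
        text,
        re.MULTILINE | re.DOTALL,  # ^/$ per line, and . crosses newlines
    )
    # splitlines() copes with both \n and \r\n line endings.
    return block.group(1).splitlines() if block else []

# Hypothetical input.
sample = "header\nSTART-OF-FIELDS\nalpha\nbeta\nEND-OF-FIELDS\ntrailer\n"
print(fields_between(sample))
```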

Regex for fetching multiple strings within brackets

In PCRE you can use this recursive regex to capture what you want:

~(?: ^data | (?!^)\G ) \h+ ( \w+ \h* ( \( (?: [^()]*+ | (?-1) )* \) ) )~xi

RegEx Demo

Your match is available in captured group #1

RegEx Details:

  • (?: ^data | (?!^)\G ): Start with data in a line or else match from end of previous match i.e. \G
  • \h+: Match 1+ horizontal whitespace characters
  • (: Start capture group #1
    • \w+: Match 1+ word characters
    • \h*: Match 0+ horizontal whitespace characters
    • (: Start capture group #2
      • \(: Match literal ( (opening)
      • (?:: Start non-capture group
        • [^()]*+: Match 0 or more of any characters that are not ( and )
        • |: OR
        • (?-1): Recurse the match with latest group i.e. #2
      • )*: End non-capture group. Match 0 or more of this group
      • \): Match literal ) (closing)
    • ): End capture group #2
  • ): End capture group #1

Reference: RegEx Expression Recursion
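
Python's built-in re module does not support recursion such as (?-1), so there a small depth counter can stand in for the recursive part. The sketch below is one possible approach, not a translation of the PCRE pattern: balanced_items is a made-up name, and [ \t] only approximates \h.

```python
import re

def balanced_items(line):
    """Collect 'name(...)' items after a leading 'data',
    allowing nested parentheses inside each item."""
    start = re.match(r"data\b", line, re.IGNORECASE)
    if not start:
        return []
    token = re.compile(r"[ \t]+(\w+[ \t]*)\(")
    items, pos = [], start.end()
    while True:
        t = token.match(line, pos)   # must continue right after the last item
        if not t:
            break
        depth, i = 1, t.end()        # we are just past the opening "("
        while i < len(line) and depth:
            if line[i] == "(":
                depth += 1
            elif line[i] == ")":
                depth -= 1
            i += 1
        if depth:                    # ran out of text: unbalanced parentheses
            break
        items.append(line[t.start(1):i])
        pos = i
    return items

print(balanced_items("data foo(a, b(c)) bar (d)"))
```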

Regex python: match multi-line float values between brackets

Here's a regex I created r'group1 = \[\n([ *-?\d\.\d,\n]+)\]':

import re

s = '''group1 = [
1.0,
-2.0,
3.5,
-0.3,
1.7,
4.2,
]

group2 = [
2.0,
1.5,
1.8,
-1.8,
0.7,
-0.3,
]

group1 = [
0.0,
-0.5,
1.3,
0.8,
-0.4,
0.1,
]'''

groups = re.findall(r'group1 = \[\n([ *-?\d\.\d,\n]+)\]', s)
groups = [float(f) for l in map(lambda p: p.split(','), groups) for f in l if f.strip()]
print(groups)

Output:

[1.0, -2.0, 3.5, -0.3, 1.7, 4.2, 0.0, -0.5, 1.3, 0.8, -0.4, 0.1]
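
If you would rather keep each occurrence of the group as its own list instead of one flat list, a variation along these lines may help. It is only a sketch: group_values and the shortened sample are invented here, and it assumes plain decimal numbers without exponents.

```python
import re

def group_values(text, name):
    """Return one list of floats per '<name> = [ ... ]' block."""
    blocks = re.findall(rf"{re.escape(name)} = \[\n([^\]]*)\]", text)
    # Pull the numbers out of each block with an explicit number pattern.
    number = re.compile(r"-?\d+(?:\.\d+)?")
    return [[float(x) for x in number.findall(block)] for block in blocks]

# Hypothetical, shortened version of the input above.
short = "group1 = [\n1.0,\n-2.0,\n]\n\ngroup2 = [\n2.0,\n]\n\ngroup1 = [\n0.0,\n-0.5,\n]"

print(group_values(short, "group1"))
```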

