How to Get Text Between Nested Parentheses

How to extract the content of (possibly nested) parentheses before and after a specific character?

I don't think you can currently solve this in a general way with a regular expression in JavaScript, since you can't match balanced parentheses recursively.

Personally, I'd approach this by splitting the text into its constituent characters, building groups of balanced parentheses, and joining them back together with some logic. For example:

let text = '(10+10)*2*((1+1)*1)√(16)+(12*12)+2';
let changedText = '';
let parts = text.split('');
let parCount = 0;
let group = '';
let groups = [];

// Group the original text into nested parentheses and other characters.
for (let i = 0; i < parts.length; i++) {
  // Keep track of parentheses nesting; if parCount is larger than 0,
  // then there are unclosed parentheses left in the current group.
  if (parts[i] == '(') parCount++;
  if (parts[i] == ')') parCount--;

  group += parts[i];

  // Add every group of balanced parens or single characters.
  if (parCount === 0 && group !== '') {
    groups.push(group);
    group = '';
  }
}

// Join groups, while replacing the root character and surrounding groups
// with the nthroot() syntax.
for (let i = 0; i < groups.length; i++) {
  let isRoot = i < groups.length - 2 && groups[i + 1] == '√';
  // Check isRoot first, so groups[i + 2] is only read when it exists.
  let hasParGroups = isRoot && groups[i][0] == '(' && groups[i + 2][0] == '(';

  // If the next group is a root symbol surrounded by parenthesized groups,
  // join them using the nthroot() syntax.
  if (isRoot && hasParGroups) {
    let stripped = groups[i + 2].replace(/^\(|\)$/g, '');
    changedText += `nthroot(${stripped}, ${groups[i]})`;
    // Skip groups that belong to root.
    i = i + 2;
  } else {
    // Append non-root groups.
    changedText += groups[i];
  }
}

console.log('Before:', text, '\n', 'After:', changedText);
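
For comparison, the same two-pass idea (group by balanced parentheses, then rewrite around the root symbol) can be sketched in Python. The function names split_balanced and rewrite_roots are made up for this illustration, and the sketch assumes the input has balanced parentheses.

```python
def split_balanced(text):
    """Split text into balanced parenthesized groups and single characters.

    Assumes balanced input; an unclosed trailing group is silently dropped.
    """
    groups, group, depth = [], '', 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        group += ch
        # Depth 0 means the current group is complete.
        if depth == 0:
            groups.append(group)
            group = ''
    return groups


def rewrite_roots(text, root='√'):
    """Rewrite '(index)√(radicand)' as 'nthroot(radicand, (index))'."""
    groups = split_balanced(text)
    out = []
    i = 0
    while i < len(groups):
        is_root = (
            i + 2 < len(groups)
            and groups[i + 1] == root
            and groups[i].startswith('(')
            and groups[i + 2].startswith('(')
        )
        if is_root:
            radicand = groups[i + 2][1:-1]  # strip the outer parentheses
            out.append(f'nthroot({radicand}, {groups[i]})')
            i += 3  # skip the index, the root symbol and the radicand
        else:
            out.append(groups[i])
            i += 1
    return ''.join(out)


result = rewrite_roots('(10+10)*2*((1+1)*1)√(16)+(12*12)+2')
print(result)
```

As in the JavaScript version, only a root symbol that sits directly between two parenthesized groups is rewritten; everything else is copied through unchanged.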

Extract string between two brackets, including nested brackets in Python

>>> import re
>>> s = """res = sqr(if((a>b)&(a<c),(a+b)*c,(a-b)*c)+if()+if()...)"""
>>> re.findall(r'if\((?:[^()]*|\([^()]*\))*\)', s)
['if((a>b)&(a<c),(a+b)*c,(a-b)*c)', 'if()', 'if()']

For patterns like this, it is better to use the VERBOSE flag (and a raw string, so the backslashes reach the regex engine unchanged):

>>> lvl2 = re.compile(r'''
... if\(              # literal if(
... (?:               # start of non-capturing group
...     [^()]*        # non-parenthesis characters
...   |               # OR
...     \([^()]*\)    # non-nested pair of parentheses
... )*                # end of non-capturing group, 0 or more times
... \)                # literal )
... ''', flags=re.X)
>>> re.findall(lvl2, s)
['if((a>b)&(a<c),(a+b)*c,(a-b)*c)', 'if()', 'if()']


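The pattern above handles only one level of nesting inside if(...). To match any number of nested pairs, you can use the third-party regex module, which supports recursive patterns; see Recursive Regular Expressions.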
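
If installing a third-party module is not an option, arbitrary nesting depth can also be handled with a small scanner that uses only the standard library. The following is a minimal sketch, not the method from the answer above; the name find_calls is hypothetical, and like the regex it does not check that if( starts a whole word.

```python
def find_calls(s, name='if'):
    """Return every 'name(...)' substring whose parentheses are balanced."""
    prefix = name + '('
    results = []
    start = s.find(prefix)
    while start != -1:
        depth = 0
        end = None
        # Start scanning at the opening parenthesis right after the name.
        for i in range(start + len(name), len(s)):
            if s[i] == '(':
                depth += 1
            elif s[i] == ')':
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            break  # unbalanced input: stop instead of guessing
        results.append(s[start:end])
        # Continue after this call, so matches never overlap.
        start = s.find(prefix, end)
    return results


s = "res = sqr(if((a>b)&(a<c),(a+b)*c,(a-b)*c)+if()+if()...)"
print(find_calls(s))
print(find_calls("y = if(f(g(h(1))), 2) + 1"))
```

Because the search resumes after each match, an if(...) nested inside another if(...) is returned only as part of the outer call, which mirrors how re.findall reports non-overlapping matches.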

Regex to find text between nested parentheses

A workaround is a pattern that matches a line starting with {{info and then matches any zero or more characters, as few as possible, up to a line that contains only }}:

re.findall(r'(?sm)^{{[^\S\r\n]*info\s*(.*?)^}}$', s)


Details

  • (?sm) - re.DOTALL (now, . matches a newline) and re.MULTILINE (^ now matches line start and $ matches line end positions) flags
  • ^ - start of a line
  • {{ - a {{ substring
  • [^\S\r\n]* - 0+ horizontal whitespaces
  • info - a substring
  • \s* - 0+ whitespaces
  • (.*?) - Group 1: any 0+ chars, as few as possible
  • ^}}$ - start of a line, }} and end of the line.
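
To see the pattern in action, here is a small sketch. The sample text in s is invented for this illustration (it is not from the original question) and includes an inline {{nested}} pair to show that only a line consisting of }} ends the match.

```python
import re

# Hypothetical sample input, made up for this example.
s = """{{info
name = Example
note = uses {{nested}} templates
}}
Some trailing text
"""

pattern = re.compile(r'(?sm)^{{[^\S\r\n]*info\s*(.*?)^}}$')
blocks = pattern.findall(s)
print(blocks)
```

The inline }} on the third line is skipped because it is not at the start of a line, so Group 1 runs up to the closing }} line.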

How to get text between nested parentheses?

.NET regular expressions do not support recursion as such, but balancing groups let a pattern keep track of the nesting depth. See Balancing Group Definitions

using System;
using System.Text.RegularExpressions;

var input = @"add(mul(a,add(b,c)),d) + e - sub(f,g)";

var regex = new Regex(@"
    \(                    # Match (
    (
        [^()]+            # all chars except ()
        | (?<Level>\()    # or if ( then Level += 1
        | (?<-Level>\))   # or if ) then Level -= 1
    )+                    # Repeat (to go from inside to outside)
    (?(Level)(?!))        # fail the match if Level is not empty
    \)                    # Match )",
    RegexOptions.IgnorePatternWhitespace);

foreach (Match c in regex.Matches(input))
{
    Console.WriteLine(c.Value.Trim('(', ')'));
}
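
Languages without balancing groups can get the same outermost groups with an explicit depth counter. The sketch below is a Python illustration of that idea, not a translation of the regex; the name top_level_groups is hypothetical.

```python
def top_level_groups(s):
    """Yield the content of each outermost (...) group in s."""
    depth = 0
    start = None
    for i, ch in enumerate(s):
        if ch == '(':
            if depth == 0:
                start = i + 1  # content begins after the opening parenthesis
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                yield s[start:i]


groups = list(top_level_groups('add(mul(a,add(b,c)),d) + e - sub(f,g)'))
print(groups)
```

A stray closing parenthesis at depth 0 is ignored here, and an unclosed group yields nothing; whether that is the right policy depends on the input you expect.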

Extract string inside nested brackets

This code scans the text character by character, pushes an empty list onto the stack for every opening [, and pops the most recently pushed list off the stack for every closing ]. Every other character is appended to the list on top of the stack.

text = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'

def parse(text):
    stack = []
    for char in text:
        if char == '[':
            # stack push
            stack.append([])
        elif char == ']':
            yield ''.join(stack.pop())
        else:
            # stack peek
            stack[-1].append(char)

print(tuple(parse(text)))

Output:

(' who ', 'what ', ' hello   from the other side ', ' this is  slim shady ')

The following tuple is the output for a different input, which is not shown here:

(' who ', 'what ', 'side', ' hello from the other ', ' this is slim shady ', 'd', 'w', 'a', 'g', 'oh my ')
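
The parser above removes the inner bracketed text from each outer result. If you want each group verbatim, inner brackets included, one possible variation is to push start indices instead of lists. This is a sketch under that assumption; bracket_spans is a hypothetical name, and unlike parse() it tolerates text outside the brackets and raises on unbalanced input.

```python
def bracket_spans(text, open_ch='[', close_ch=']'):
    """Yield the raw text inside every bracket pair, innermost first."""
    stack = []  # start indices of the currently open brackets
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i + 1)
        elif ch == close_ch:
            if not stack:
                raise ValueError(f'unmatched {close_ch!r} at index {i}')
            yield text[stack.pop():i]
    if stack:
        raise ValueError(f'{len(stack)} unclosed {open_ch!r}')


spans = list(bracket_spans('a [b [c] d] e'))
print(spans)
```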

How to get all values in parentheses in a string including nested parentheses?

I would use a regular expression that also requires the part in square brackets to precede the link that's within parentheses.

/\[([^\]]+)\]\([^)]+\)/g

Make sure to use the g flag. This also includes a capture group so you can differentiate the "visible" part (between square brackets) from the rest that is "invisible":

var text = "here is a (very) long link to this article on [Math.random()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random)";
var regExp = /\[([^\]]+)\]\([^)]+\)/g;
var match;
while (match = regExp.exec(text)) {
    console.log("full match: " + match[0]);
    console.log("keep: " + match[1]);
}
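
The same pattern works unchanged with Python's re module, where finditer plays the role of the exec loop. A minimal sketch, with link_re as an illustrative name:

```python
import re

text = ("here is a (very) long link to this article on "
        "[Math.random()](https://developer.mozilla.org/en-US/docs/Web/"
        "JavaScript/Reference/Global_Objects/Math/random)")

link_re = re.compile(r'\[([^\]]+)\]\([^)]+\)')

# Each entry is (full match, visible part between the square brackets).
matches = [(m.group(0), m.group(1)) for m in link_re.finditer(text)]
for full, keep in matches:
    print('full match:', full)
    print('keep:', keep)
```

Note that (very) is not matched, because it is not preceded by a bracketed label. A URL that itself contains a closing parenthesis would be cut off at that character.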

Extract text from inner-most nested parentheses of string

As a not-so-efficient coder, I like to chain several regular expressions to get the outcome (each line is commented with what it does):

library(stringr)
library(dplyr)
string %>%
  str_replace_all(".*log\\((.*?)(_.+?)?\\).*", "\\1Ps") %>% # deal with "log" entry
  str_replace_all(".*\\((.*?\\))", "\\1") %>% # delete anything before the last "("
  str_replace_all("(_\\d+)?\\)\\^2", "Sq") %>% # take care of ^2
  str_replace_all("(_.+)?\\)?", "") -> "outcome" # remove extra characters in the end (e.g. "_00" and ")")

Best <- c("Intercept", "AspectCos", "CanCov", "DST50", "Ele", "NDVI", "Slope", "SlopeSq",
          "SlopeVar", "CanCov", "NDVI", "Slope", "SlopeSq", "SlopeVarPs", "CanCov", "Slope", "SlopeSq")
all(outcome == Best)
## TRUE

Extract strings between brackets and nested brackets

You can do this with splits. If you separate the string using '_(' instead of only '_', every part from the second onward will be an enclosed keyword. You can strip the closing parentheses and split those parts on the first '(' to get either one component (if there was no nested parenthesis) or two components. You then form either a one-element list or a dictionary, depending on the number of components.

line = ";star/stellar_(class(ification))_(chart)"

if line.startswith(";"):
    parts = [part.rstrip(")") for part in line.split("_(")[1:]]
    parts = [part.split("(", 1) for part in parts]
    parts = [part if len(part) == 1 else dict([part]) for part in parts]
    print(parts)

[{'class': 'ification'}, ['chart']]

Note that I assumed that the first part of the string is never included in the process and that there can only be one nested group at the end of the parts. If that is not the case, please update your question with relevant examples and expected output.

What is a RegEx that can match text in parentheses with nested parentheses?

To extract the text from your example data, I think you can use this regex:

\(pattern:?\s?(.+?\)?)\)

  • match \(pattern
  • an optional colon: :?
  • an optional whitespace \s?
  • start capturing group (
  • capture one or more characters non greedy .+?
  • an optional \)
  • close capturing group
  • match \)

    var string = "Some text (pattern: SOME TEXT THAT (I WANT TO EXTRACT)) a bit more text (another pattern: ignore that text) and may be a little more text Some text (pattern: SOME TEXT THAT I WANT TO EXTRACT) a bit more text (another pattern: ignore that text) and may be a little more text";
    var myRegexp = /\(pattern:?\s?(.+?\)?)\)/g;
    var matches;
    while ((matches = myRegexp.exec(string)) !== null) {
        console.log(matches[1]);
    }
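
For readers working in Python, the same regex can be tried with re.findall, which returns the capture group directly. This is only a sketch with a shortened version of the sample text; pattern_re is an illustrative name.

```python
import re

text = ("Some text (pattern: SOME TEXT THAT (I WANT TO EXTRACT)) a bit more text "
        "(another pattern: ignore that text) and may be a little more text "
        "Some text (pattern: SOME TEXT THAT I WANT TO EXTRACT) a bit more text")

pattern_re = re.compile(r'\(pattern:?\s?(.+?\)?)\)')

# findall returns the content of the single capture group for each match.
extracted = pattern_re.findall(text)
print(extracted)
```

The optional \) only helps when the nested pair sits at the very end of the captured text. With a nested pair in the middle, as in (pattern: A (B) C), the lazy match stops at the first closing parenthesis, so a depth-counting scanner is the safer choice for that kind of input.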

