Regular Expression to Return Text Between Parentheses

Regular expression to extract text between square brackets

You can use the following regex globally:

\[(.*?)\]

Explanation:

  • \[ : [ is a meta char and needs to be escaped if you want to match it literally.
  • (.*?) : match everything in a non-greedy way and capture it.
  • \] : ] is a meta char and needs to be escaped if you want to match it literally.
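As a minimal sketch of applying the pattern globally, here it is in Python (the sample string is made up for illustration). With exactly one capture group, re.findall returns only the captured text, not the brackets:

```python
import re

# Hypothetical sample input; the last pair of brackets is empty on purpose.
text = "alpha [one] beta [two] gamma []"

# Non-greedy capture of whatever sits between each [ and the next ].
inner = re.findall(r"\[(.*?)\]", text)
print(inner)  # ['one', 'two', '']
```

Note that .*? also matches the empty string, so an empty pair of brackets yields an empty result rather than being skipped.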

Regular expression to return text between parentheses

If your problem is really just this simple, you don't need regex:

s[s.find("(")+1:s.find(")")]
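One caveat: str.find returns -1 when the character is missing, so the one-liner silently returns a wrong slice for input without parentheses ("no parens" gives "no paren"). A slightly more defensive sketch, where between_parens is a hypothetical helper name:

```python
def between_parens(s):
    """Return the text between the first '(' and the next ')', or None."""
    start = s.find("(")
    if start == -1:
        return None
    # Search for ')' only after the opening parenthesis.
    end = s.find(")", start + 1)
    if end == -1:
        return None
    return s[start + 1:end]


print(between_parens("Hello (Java)"))  # Java
print(between_parens("no parens"))     # None
```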

Regular Expression to get a string between parentheses in Javascript

You need to create a set of escaped (with \) parentheses (that match the parentheses) and a group of regular parentheses that create your capturing group:

var regExp = /\(([^)]+)\)/;

var matches = regExp.exec("I expect five hundred dollars ($500).");

//matches[1] contains the value between the parentheses

console.log(matches[1]);

Pattern to extract text between parentheses

Try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String x = "Hello (Java)";
Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
while (m.find()) {
    System.out.println(m.group(1)); // text inside the parentheses
}

or

String str = "Hello (Java)";
String answer = str.substring(str.indexOf("(")+1, str.indexOf(")"));

Regex to extract string between parentheses which also contains other parentheses

In your regex, replace the character class that excludes the @ symbol with a pattern that matches any character, and add a quantifier that matches zero or more times. You can also simplify the regex by dropping the group:

\(.*\)

Here is the explanation for the regular expression:

  • Symbol \( matches the character ( literally.
  • . matches any character (except for line terminators)
  • * quantifier matches between zero and unlimited times, as many times
    as possible, giving back as needed (greedy)
  • \) matches the character ) literally.

You can use regex101 to compose and debug your regular expressions.
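To see what the greedy quantifier does with nested parentheses, here is a small Python comparison on a made-up string. The greedy form runs from the first ( to the last ), while the lazy form stops at the first ) and cuts the nested group short:

```python
import re

# Hypothetical input with one nested pair and one simple pair.
text = "f(a(b)c) and g(d)"

greedy = re.findall(r"\(.*\)", text)
lazy = re.findall(r"\(.*?\)", text)

print(greedy)  # ['(a(b)c) and g(d)']
print(lazy)    # ['(a(b)', '(d)']
```

So the greedy pattern is only appropriate when the string holds a single outermost pair of parentheses. Neither form balances nested parentheses in general.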

R - Regular Expression to Extract Text Between Parentheses That Contain Keyword

You can use

str_extract_all(filtered_df$named_entities, "\\([^()]*'LOC'[^()]*\\)")

See the regex demo. Details:

  • \( - a ( char
  • [^()]* - zero or more chars other than ( and )
  • 'LOC' - a 'LOC' string
  • [^()]* - zero or more chars other than ( and )
  • \) - a ) char.

See the online R demo:

library(stringr)
x <- "[('one', 'CARDINAL'), ('Castro', 'PERSON'), ('Latin America', 'LOC'), ('Somoza', 'PERSON')]"
str_extract_all(x, "\\([^()]*'LOC'[^()]*\\)")
# => [1] "('Latin America', 'LOC')"

As a bonus solution to get Latin America, you can use

str_extract_all(x, "[^']+(?=',\\s*'LOC'\\))")
# => [1] "Latin America"

Here, [^']+(?=',\s*'LOC'\)) matches one or more chars other than ' that are followed by ',, then zero or more whitespace characters, and then the 'LOC') string.
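For comparison, both patterns can be tried in Python on the same sample string. This is only a sketch: re.findall stands in for str_extract_all, and the patterns themselves are unchanged.

```python
import re

x = ("[('one', 'CARDINAL'), ('Castro', 'PERSON'), "
     "('Latin America', 'LOC'), ('Somoza', 'PERSON')]")

# Whole parenthesized tuple that contains 'LOC'.
tuples = re.findall(r"\([^()]*'LOC'[^()]*\)", x)

# Only the entity text; the lookahead checks for the 'LOC' tag without consuming it.
names = re.findall(r"[^']+(?=',\s*'LOC'\))", x)

print(tuples)  # ["('Latin America', 'LOC')"]
print(names)   # ['Latin America']
```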

Regex to select text between parentheses with a specific text as target

You only have to include the target string surrounded by character classes that exclude the closing parenthesis:

\(([^)]*TARGET[^)]*)\)

If you only need to replace the match, you don't need the capture group (you can remove it).
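A short Python sketch of both uses, extraction with the capture group and replacement without it. The sample text and the "<removed>" replacement string are made up for illustration:

```python
import re

# Hypothetical input: two groups contain TARGET, two do not.
text = "(alpha) (has TARGET inside) (beta) (TARGET)"

# With the capture group: get the contents of each matching pair.
found = re.findall(r"\(([^)]*TARGET[^)]*)\)", text)

# Without the capture group: replace each matching pair as a whole.
replaced = re.sub(r"\([^)]*TARGET[^)]*\)", "<removed>", text)

print(found)     # ['has TARGET inside', 'TARGET']
print(replaced)  # (alpha) <removed> (beta) <removed>
```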

Extract text between parentheses with suffix

The key here is to use the non-greedy wildcard .*?, otherwise everything between the first ( and the last ) would be caught:

library(stringr)
t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'
str_extract_all(t, "(\\(.*?\\)\\*?)")[[1]] %>% str_subset("\\*$")
#> [1] "(Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*"

Created on 2021-03-03 by the reprex package (v1.0.0)

You can use the rev() function if you want to reverse the order and get it right to left.

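This is less elegant than I would like. The pattern "(\\(.*?\\)\\*)" does not give the shortest match here: a lazy quantifier still starts at the leftmost (, so the match runs from the first ( to the first )*. That is why I made the star optional and filtered for it at the end of each string. You can add %>% str_remove_all("\\*$") if you want to discard the trailing star.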
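A quick Python check of this behaviour, on a shortened, made-up string: the lazy pattern with a mandatory star still starts at the first (, while a negated character class keeps the match inside one pair of parentheses. The negated-class pattern is an alternative sketch, not the code above, and it assumes the parentheses are not nested.

```python
import re

# Hypothetical shortened input: only the second group carries the * suffix.
t = "A (x); B (y, z)*; C ( x)"

# Lazy quantifier with a mandatory star: the match begins at the first "(".
lazy = re.findall(r"\(.*?\)\*", t)

# Negated class: the match cannot cross another "(" or ")".
strict = re.findall(r"\([^()]*\)\*", t)

print(lazy)    # ['(x); B (y, z)*']
print(strict)  # ['(y, z)*']
```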

Regex for Text Between Brackets and Text Between Semicolons

The Problems with What You've Tried

There are a few problems with what you've tried:

  • It will omit the first and last characters of your match from the group, giving you something like asui Chitets.
  • It will have even more errors on strings that start with P or W. For example, in PW[Paul McCartney], you would match only ul McCartne with the group and ul McCartney with the full match.

The Regex

You want something like this:

(?<=\[)([^]]+)(?=\])

Here's a regex101 demo.

Explanation

(?<=\[) means that the match must be preceded by [

([^]]+) matches 1 or more characters that are not ]

(?=\]) means that the match must be followed by ]

Sample Code

Here's some sample code (from the above regex101 link):

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?<=\[)([^]]+)(?=\])"

test_str = "PW[Yasui Chitetsu]"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Semicolons

In your title, you mentioned finding text between semicolons. The same logic would work for that, giving you this regex:

(?<=;)([^;]+)(?=;)
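A small Python sketch of why the lookarounds matter for semicolons (the input string is made up). Because the lookbehind and lookahead do not consume the delimiters, one semicolon can close one field and open the next. A pattern that consumes the semicolons skips every other field:

```python
import re

# Hypothetical input with three fields that share their delimiters.
text = ";a;b;c;"

with_lookarounds = re.findall(r"(?<=;)([^;]+)(?=;)", text)
consuming = re.findall(r";([^;]+);", text)

print(with_lookarounds)  # ['a', 'b', 'c']
print(consuming)         # ['a', 'c']
```

Text before the first semicolon or after the last one is not matched by either pattern.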

