Regular expression to extract text between square brackets
You can use the following regex with the global flag, so that it returns every match rather than only the first:
\[(.*?)\]
Explanation:
- \[ : [ is a meta char and needs to be escaped if you want to match it literally.
- (.*?) : match everything in a non-greedy way and capture it.
- \] : ] is a meta char and needs to be escaped if you want to match it literally.
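As a rough illustration, the pattern above can be tried in Python with the standard re module. The sample string and variable names are made up for this sketch.

```python
import re

# Hypothetical sample input; any string with bracketed parts works.
sample = "alpha [one] beta [two] gamma []"

# Non-greedy capture: each match stops at the first closing bracket.
# With a single capture group, findall returns the group contents.
bracketed = re.findall(r"\[(.*?)\]", sample)
print(bracketed)  # ['one', 'two', '']
```

Note that empty brackets yield an empty string, because .*? may match zero characters.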
Regular expression to return text between parenthesis
If your problem is really just this simple, you don't need regex:
s[s.find("(")+1:s.find(")")]
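The one-liner assumes both parentheses are present. If ( is missing, s.find returns -1 and the slice silently starts at index 0. A minimal sketch that guards those cases follows; the helper name between_parens is hypothetical, and unlike the one-liner it looks for ) only after the opening parenthesis.

```python
def between_parens(s):
    """Return the text between the first '(' and the next ')', or None."""
    start = s.find("(")
    if start == -1:
        return None  # no opening parenthesis
    end = s.find(")", start + 1)
    if end == -1:
        return None  # no closing parenthesis after the opening one
    return s[start + 1:end]

print(between_parens("Hello (Python)"))  # Python
print(between_parens("no parentheses"))  # None
```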
Regular Expression to get a string between parentheses in Javascript
You need to create a set of escaped (with \) parentheses (that match the literal parentheses) and a group of regular parentheses that create your capturing group:
var regExp = /\(([^)]+)\)/;
var matches = regExp.exec("I expect five hundred dollars ($500).");
//matches[1] contains the value between the parentheses
console.log(matches[1]);
Pattern to extract text between parenthesis
Try this:
String x = "Hello (Java)";
Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
while (m.find()) {
    System.out.println(m.group(1));
}
or
String str = "Hello (Java)";
String answer = str.substring(str.indexOf("(")+1, str.indexOf(")"));
Regex to extract string between parentheses which also contains other parentheses
Replace the part of your regex that excludes the @ symbol with one that matches any character, and add a quantifier that matches zero or more times. You can also simplify the regex by removing the group:
\(.*\)
Here is the explanation for the regular expression:
- \( matches the character ( literally.
- .* matches any character (except for line terminators).
- The * quantifier matches between zero and unlimited times, as many times as possible, giving back as needed (greedy).
- \) matches the character ) literally.
You can use regex101 to compose and debug your regular expressions.
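To see why the greedy form matters for nested parentheses, here is a small Python sketch comparing it with the non-greedy form. The input string is invented for the example.

```python
import re

# Hypothetical input with one level of nesting.
text = "f(a (b) c) tail"

# Greedy: runs from the first "(" to the LAST ")".
greedy = re.search(r"\(.*\)", text).group()

# Non-greedy: stops at the FIRST ")" and cuts the nested part short.
lazy = re.search(r"\(.*?\)", text).group()

print(greedy)  # (a (b) c)
print(lazy)    # (a (b)
```

The trade-off is that the greedy pattern also spans unrelated groups: on a string such as "(a) and (b)" it returns the whole "(a) and (b)".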
R - Regular Expression to Extract Text Between Parentheses That Contain Keyword
You can use
str_extract_all(filtered_df$named_entities, "\\([^()]*'LOC'[^()]*\\)")
See the regex demo. Details:
- \( - a ( char
- [^()]* - zero or more chars other than ( and )
- 'LOC' - a 'LOC' string
- [^()]* - zero or more chars other than ( and )
- \) - a ) char.
See the online R demo:
library(stringr)
x <- "[('one', 'CARDINAL'), ('Castro', 'PERSON'), ('Latin America', 'LOC'), ('Somoza', 'PERSON')]"
str_extract_all(x, "\\([^()]*'LOC'[^()]*\\)")
# => [1] "('Latin America', 'LOC')"
As a bonus solution to get Latin America, you can use
str_extract_all(x, "[^']+(?=',\\s*'LOC'\\))")
# => [1] "Latin America"
Here, [^']+(?=',\s*'LOC'\)) matches one or more chars other than ' that are followed by ', then zero or more whitespace characters, and then the 'LOC') string.
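The same two patterns should carry over to Python's re module essentially unchanged. This is a sketch that assumes the input is the same plain string as in the R demo; the variable names are illustrative only.

```python
import re

x = "[('one', 'CARDINAL'), ('Castro', 'PERSON'), ('Latin America', 'LOC'), ('Somoza', 'PERSON')]"

# Whole parenthesized item that contains 'LOC'.
tuples_with_loc = re.findall(r"\([^()]*'LOC'[^()]*\)", x)
print(tuples_with_loc)  # ["('Latin America', 'LOC')"]

# Only the name: the lookahead checks what follows without consuming it.
names_only = re.findall(r"[^']+(?=',\s*'LOC'\))", x)
print(names_only)  # ['Latin America']
```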
Regex to select text between parentheses with a specific text as target
You only have to include the target string surrounded by character classes that exclude the closing parenthesis:
\(([^)]*TARGET[^)]*)\)
If you only need to replace the match, you don't need the capture group (you can remove it).
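A short Python sketch of both uses, extraction with the capture group and replacement without it. The word "error" stands in for TARGET and the input line is made up; re.escape is used in case the target contains regex metacharacters.

```python
import re

# "error" stands in for TARGET; the line is a made-up example.
target = "error"
line = "ok (no issues) then (disk error 5) and (net error) done"

# With the capture group: get the text inside the matching parentheses.
pattern = re.compile(r"\(([^)]*" + re.escape(target) + r"[^)]*)\)")
hits = pattern.findall(line)
print(hits)  # ['disk error 5', 'net error']

# Without the capture group: replace the whole parenthesized match.
cleaned = re.sub(r"\([^)]*" + re.escape(target) + r"[^)]*\)", "(redacted)", line)
print(cleaned)  # ok (no issues) then (redacted) and (redacted) done
```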
Extract text between parentheses with suffix
The key here is to use the non-greedy wildcard .*?, otherwise everything between the first ( and the last ) would be caught:
library(stringr)
t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'
str_extract_all(t, "(\\(.*?\\)\\*?)")[[1]] %>% str_subset("\\*$")
#> [1] "(Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*"
Created on 2021-03-03 by the reprex package (v1.0.0)
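You can use the rev() function if you want to reverse the order and get the results right to left.
This is less elegant than I would like, but "(\\(.*?\\)\\*)" does not behave as a non-greedy match here: the match still starts at the first ( and the lazy .*? keeps expanding until it reaches a ) followed by *. That is why I made the star optional and filtered on it at the end of each extracted string. You can add %>% str_remove_all("\\*$") if you want to drop the trailing star from the result.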
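For comparison, here is a Python sketch of a different technique from the one in the answer: a negated character class instead of the lazy wildcard. Because [^()]* cannot cross a parenthesis, the starred group can be matched directly without the extra filtering step. This assumes the parentheses are never nested.

```python
import re

t = ("Hui Wan (Shanghai Maritime University); "
     "Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety "
     "Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)")

# Lazy wildcard: the match starts at the FIRST "(" and expands up to ")*",
# so it swallows the first author's affiliation as well.
lazy = re.findall(r"\(.*?\)\*", t)

# Negated class: cannot cross "(" or ")", so only the starred group matches.
starred = re.findall(r"\([^()]*\)\*", t)

print(len(lazy), lazy[0][:32])
print(starred)
```

With this input, starred holds only the "(Shanghai Chart Center, ...)*" group, while the single lazy match begins at "(Shanghai Maritime University)".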
Regex for Text Between Brackets and Text Between Semicolons
The Problems with What You've Tried
There are a few problems with what you've tried:
- It will omit the first and last characters of your match from the group, giving you something like asui Chitets.
- It will have even more errors on strings that start with P or W. For example, in PW[Paul McCartney], you would match only ul McCartne with the group and ul McCartney with the full match.
The Regex
You want something like this:
(?<=\[)([^]]+)(?=\])
Here's a regex101 demo.
Explanation
- (?<=\[) means that the match must be preceded by [
- ([^]]+) matches 1 or more characters that are not ]
- (?=\]) means that the match must be followed by ]
Sample Code
Here's some sample code (from the above regex101 link):
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

regex = r"(?<=\[)([^]]+)(?=\])"
test_str = "PW[Yasui Chitetsu]"
matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Semicolons
In your title, you mentioned finding text between semicolons. The same logic would work for that, giving you this regex:
(?<=;)([^;]+)(?=;)
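A quick Python check of the semicolon pattern on a made-up record. One thing it shows is that only fields with a semicolon on both sides are returned, so the first and last fields are skipped.

```python
import re

# Made-up sample; fields are delimited by semicolons.
record = "a;b;c;d"

# The lookarounds do not consume the semicolons, so adjacent
# fields can share a delimiter and both still match.
inner = re.findall(r"(?<=;)([^;]+)(?=;)", record)
print(inner)  # ['b', 'c']
```

If the outer fields are wanted too, record.split(";") is probably the simpler tool.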