How to Extract Text That Lies Between Parentheses (Round Brackets)

How to extract all strings between brackets using C#?

Here is a LINQ solution. It splits the string on whitespace, so it only finds parenthesized values that contain no spaces:

var result = myString.Split().Where(x => x.StartsWith("(") && x.EndsWith(")")).ToList();

Values stored in result:

result[0] = (1)
result[1] = (0000000000)

If you want only the numbers, without the brackets, use:

var result = myString.Split().Where(x => x.StartsWith("(") && x.EndsWith(")"))
    .Select(x => x.Replace("(", string.Empty).Replace(")", string.Empty))
    .ToList();

Values stored in result:

result[0] = 1
result[1] = 0000000000
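
When the values may contain spaces, a regex is usually the more robust route. The following is a minimal Python sketch of that idea, not part of the original answer; the sample string and the helper name extract_parenthesized are made up for illustration, since the original myString is not shown.

```python
import re

def extract_parenthesized(text, keep_brackets=True):
    """Return every (...) group in text, in order of appearance."""
    # [^()]* matches anything except parentheses, so nested groups are not handled.
    pattern = r"\([^()]*\)" if keep_brackets else r"\(([^()]*)\)"
    return re.findall(pattern, text)

sample = "id (1) phone (0000000000) note (two words)"  # hypothetical input
print(extract_parenthesized(sample))
print(extract_parenthesized(sample, keep_brackets=False))
```

Unlike the whitespace split, this also picks up "(two words)".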

How can we split string and extract the text between round brackets

You can use separate() and set the separator as "\\s(?=\\()".

library(tidyr)

df %>%
  separate(study_name, c("study", "reference"), sep = "\\s(?=\\()")

# # A tibble: 2 x 3
#   study        reference            results
#   <chr>        <chr>                <chr>
# 1 apple bannan (tcga, raw 2018)     Untested
# 2 frame shift  (mskk2 nature, 2000) tested

If you want to extract the text in the parentheses, using extract() is a suitable choice.

df %>%
  extract(study_name, c("study", "reference"), regex = "(.+)\\s\\((.+)\\)")

# # A tibble: 2 x 3
#   study        reference          results
#   <chr>        <chr>              <chr>
# 1 apple bannan tcga, raw 2018     Untested
# 2 frame shift  mskk2 nature, 2000 tested
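
To see what the two capture groups pick up outside of tidyr, here is a rough Python sketch using the same pattern. The input strings are reconstructed from the printed output above, and the helper name split_study is hypothetical.

```python
import re

# Same pattern as the extract() call: greedy text, one whitespace, then "(...)".
PATTERN = re.compile(r"(.+)\s\((.+)\)")

def split_study(study_name):
    """Return (study, reference), or None if the name has no "(...)" part."""
    m = PATTERN.fullmatch(study_name)
    return m.groups() if m else None

rows = ["apple bannan (tcga, raw 2018)", "frame shift (mskk2 nature, 2000)"]
for row in rows:
    print(split_study(row))
```

Returning None for rows without parentheses is a design choice of this sketch; it makes the no-match case explicit.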

Extract text between parentheses with suffix

The key here is to use the non-greedy wildcard .*?, otherwise everything between the first ( and the last ) would be caught:

library(stringr)
t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'
str_extract_all(t, "(\\(.*?\\)\\*?)")[[1]] %>% str_subset("\\*$")
#> [1] "(Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*"

Created on 2021-03-03 by the reprex package (v1.0.0)

You can use the rev() function if you want to reverse the order and get it right to left.

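This is less elegant than I would like. The pattern "(\\(.*?\\)\\*)" does not work on its own: the regex engine starts at the leftmost ( and lazily extends the match until it reaches a ) followed by *, so the match begins at the first parenthesis in the string rather than at the one that opens the starred entry. That is why I extract every parenthesized group first and then keep only those ending in a star. You can add %>% str_remove_all("\\*$") if you want to discard the trailing star.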
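
One possible way to avoid the two-step filter is a negated character class instead of the lazy wildcard. The Python sketch below is an alternative to the answer's approach, not a restatement of it; it compares both patterns on the same string.

```python
import re

t = ("Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, "
     "Donghai Navigation Safety Administration of MOT)*; "
     "Yingjie Xiao ( Shanghai Maritime University)")

# Lazy .*? still starts at the first "(" and runs on to the first ")*".
lazy = re.findall(r"\(.*?\)\*", t)

# [^()]* cannot cross a parenthesis, so only the starred group itself matches.
starred = re.findall(r"\([^()]*\)\*", t)

print(lazy)
print(starred)
```

The negated class assumes the parenthesized text never contains nested parentheses.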

Extracting text inside brackets & text outside with regex

You may use a [^][]+ pattern (negated character class) to match 1 or more chars other than [ and ]:

\[(?<colors>[^][]+)](?<text>[^][]+)

See the regex demo. Note that in the demo I added \n to the negated character classes because the test is run on a single multiline string. If you test against separate, standalone strings, you do not need it.

In some regex engines, you have to escape [ or ] or both inside a character class, as in Java, Ruby or Swift/Objective-C (ICU library), so use

\[(?<colors>[^\]\[]+)](?<text>[^\]\[]+)

If text group may be empty, replace (?<text>[^\]\[]+) with (?<text>[^\]\[]*), * will match 0 or more occurrences.
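
As an illustration, here is a minimal Python sketch that applies the escaped form of the pattern. Python spells named groups (?P<name>...), and the sample string and the helper name tagged_segments are assumptions, since the original input is not shown.

```python
import re

# Escaped form of the pattern; Python spells named groups (?P<name>...).
PAIR = re.compile(r"\[(?P<colors>[^\]\[]+)\](?P<text>[^\]\[]+)")

def tagged_segments(s):
    """Return (colors, text) pairs for each "[colors]text" run in s."""
    return [(m.group("colors"), m.group("text")) for m in PAIR.finditer(s)]

sample = "[red]apple[green,bold]leaf"  # hypothetical input
print(tagged_segments(sample))
```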

Get text between two round brackets

console.log(
  "This is (my) simple text".match(/\(([^)]+)\)/)[1]
);
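
Note that in JavaScript match() returns null when nothing matches, so indexing the result with [1] throws on input without parentheses. A small Python sketch of the same pattern with an explicit guard might look like this; the helper name first_in_parens is hypothetical.

```python
import re

def first_in_parens(text):
    """Return the text inside the first (...) group, or None if there is none."""
    m = re.search(r"\(([^)]+)\)", text)
    return m.group(1) if m else None

print(first_in_parens("This is (my) simple text"))
print(first_in_parens("no brackets here"))
```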

Regex to extract text between brackets

Try this:

Insurance.*\[(\d+)\]

Or, if you want to match it between two occurrences of the word "Insurance":

Insurance.*\[(\d+)\][\s\S]+?Insurance


Where

  • Insurance - Match the starting word "Insurance"
  • .* - Match any characters except newlines, greedily (so the last bracketed number on the line is the one captured)
  • \[ - Match the opening bracket
  • (\d+) - Capture the numerical value inside brackets
  • \] - Match the closing bracket
  • [\s\S]+? - Match any character (including newlines) in a non-greedy way so that it wouldn't span across multiple "Insurance" words
  • Insurance - Match the ending word "Insurance"
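
Because .* is greedy, the first pattern captures the last bracketed number on a line. The Python sketch below shows the difference a lazy .*? makes; the sample line is made up, as the original input is not shown.

```python
import re

line = "Insurance plan A [12] and plan B [34]"  # hypothetical input

# Greedy: .* runs to the end of the line and backtracks to the last "[digits]".
greedy = re.search(r"Insurance.*\[(\d+)\]", line).group(1)

# Lazy: .*? stops at the first "[digits]" after "Insurance".
lazy = re.search(r"Insurance.*?\[(\d+)\]", line).group(1)

print(greedy)
print(lazy)
```

Which of the two is wanted depends on the data; the answer's pattern uses the greedy form.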

How to extract text inside the brackets in R?

You can use str_extract_all from the stringr package with this regex pattern:

stringr::str_extract_all(string, 
"\\(\\w+([[:punct:]]{1}|[[:blank:]]{1})[[:digit:]]+\\)")

# [[1]]
# [1] "(antonio.2018)" "(giovanni,2018)" "(libero 2019)"

A short description of the regex:

\\w matches any word character

+ means the preceding item has to be matched at least once

[[:punct:]] matches any punctuation character

{1} requires exactly one occurrence

(....|....) indicates that one pattern OR the other has to be met

[[:blank:]] matches a space or a tab

[[:digit:]] matches any digit

\\( and \\) match literal parentheses, which have to be escaped.
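
For comparison, here is a rough Python translation of the same idea. Python's re module has no POSIX classes such as [[:punct:]], so this sketch approximates them with string.punctuation plus space and tab; the sample text is hypothetical.

```python
import re
import string

# One punctuation character or one blank (space/tab), approximating
# ([[:punct:]]{1}|[[:blank:]]{1}) from the R pattern.
SEPARATOR = "[" + re.escape(string.punctuation) + r" \t]"
CITATION = re.compile(r"\(\w+" + SEPARATOR + r"\d+\)")

text = "see (antonio.2018), (giovanni,2018) and (libero 2019), not (this one)"
print(CITATION.findall(text))
```

The approximation covers ASCII punctuation only, which may differ from what [[:punct:]] matches in a given R locale.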

Extract text between parentheses from string

Split() cuts a string into an array of parts separated by a string (not a regexp). You are looking for the parts of an input string that are (made of) a sequence of digits. So you need a Regexp that specifies a sequence of digits: \d+.

>> strtoSplit = "Hello everyone! (27082015) What is your name? (123456789)"
>> Set r = New RegExp
>> r.Global = True
>> r.Pattern = "\d+"
>> For Each m In r.Execute(strtoSplit)
>> WScript.Echo m.Value
>> Next
>>
27082015
123456789

On second thought:

I should have understood from the start that you are interested in the parts 'between the parentheses'. So the pattern needs to change and we have to access a group:

>> Set r = New RegExp
>> r.Global = True
>> r.Pattern = "\(([^)]+)\)"
>> For Each m In r.Execute(strtoSplit)
>> WScript.Echo m.SubMatches(0)
>> Next
>>
27082015
123456789

Regular expression to extract text between square brackets

You can use the following regex globally:

\[(.*?)\]

Explanation:

  • \[ : [ is a meta char and needs to be escaped if you want to match it literally.
  • (.*?) : match everything in a non-greedy way and capture it.
  • \] : matches the closing bracket literally. Outside a character class, ] is not special in most engines, so the backslash is optional but harmless.
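
A short Python sketch of applying this pattern globally, with the greedy variant alongside for contrast; the sample text is made up for illustration.

```python
import re

text = "pick [first] and [second]"  # hypothetical input

# Lazy capture: each bracket pair is matched separately.
lazy_items = re.findall(r"\[(.*?)\]", text)

# Greedy capture: one match from the first "[" to the last "]".
greedy_items = re.findall(r"\[(.*)\]", text)

print(lazy_items)
print(greedy_items)
```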

