Remove Text Inside Brackets, Parens, And/Or Braces

Remove text inside brackets, parens, and/or braces

Maybe this function is a little more straight-forward? Or at least more compact.

bracketXtract <-
    function(txt, br = c("(", "[", "{", "all"), with=FALSE)
{
    br <- match.arg(br)
    left <-        # what pattern are we looking for on the left?
        if ("all" == br) "\\(|\\{|\\["
        else sprintf("\\%s", br)
    map <-         # what's the corresponding pattern on the right?
        c(`\\(`="\\)", `\\[`="\\]", `\\{`="\\}",
          `\\(|\\{|\\[`="\\)|\\}|\\]")
    fmt <-         # create the appropriate regular expression
        if (with) "(%s).*?(%s)"
        else "(?<=%s).*?(?=%s)"
    re <- sprintf(fmt, left, map[left])
    regmatches(txt, gregexpr(re, txt, perl=TRUE))    # do it!
}

No need to lapply; the regular expression functions are vectorized in that way. This fails with nested parentheses; likely regular expressions won't be a good solution if that's important. Here we are in action:

> txt <- c("I love chicken [unintelligible]!",
+          "Me too! (laughter) It's so good.[interupting]",
+          "Yep it's awesome {reading}.",
+          "Agreed.")
> bracketXtract(txt, "all")
[[1]]
[1] "unintelligible"

[[2]]
[1] "laughter"    "interupting"

[[3]]
[1] "reading"

[[4]]
character(0)

This fits without trouble into a data.frame.

> examp2 <- data.frame(var1=1:4)
> examp2$text <- c("I love chicken [unintelligible]!",
+                  "Me too! (laughter) It's so good.[interupting]",
+                  "Yep it's awesome {reading}.", "Agreed.")
> examp2$text2<-bracketXtract(examp2$text, 'all')
> examp2
  var1                                          text                 text2
1    1              I love chicken [unintelligible]!        unintelligible
2    2 Me too! (laughter) It's so good.[interupting] laughter, interupting
3    3                   Yep it's awesome {reading}.               reading
4    4                                       Agreed.

The warning you were seeing has to do with trying to stick a matrix into a data frame. I think the answer is "don't do that".

> df = data.frame(x=1:2)
> df$y = matrix(list(), 2, 2)
> df
  x    y
1 1 NULL
2 2 NULL
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

How to remove text inside brackets and parentheses at the same time with any whitespace before if present?

There are four main points here:

String between parentheses can be matched with \([^()]*\)
String between square brackets can be matched with \[[^][]*] (or \[[^\]\[]*\] if you prefer to escape literal [ and ], in PCRE, it is stylistic, but in some other regex flavors, it might be a must)
You need alternation to match either this or that pattern and account for any whitespaces before these patterns
Since after removing these strings you may get leading and trailing spaces, you need to trim the string.

You may use

$string = "Deadpool 2 [Region 4](Blu-ray)";
echo trim(preg_replace("/\s*(?:\[[^][]*]|\([^()]*\))/","", $string));

See the regex demo and a PHP demo.

The \[[^][]*] part matches strings between [ and ] having no other [ and ] inside and \([^()]*\) matches strings between ( and ) having no other parentheses inside. trim removes leading/trailing whitespace.

Regex graph and explanation:

Sample Image

\s* - 0+ whitespaces
(?: - start of a non-capturing group:
- \[[^][]*] - [, zero or more chars other than [ and ] (note you may keep these brackets inside a character class unescaped in a PCRE pattern if ] is right after initial [, in JS, you would have to escape ] by all means, [^\][]*)
- | - or (an alternation operator)
- \([^()]*\) - (, any 0+ chars other than ( and ) and a )
) - end of the non-capturing group.

Remove text between () and []

This should work for parentheses. Regular expressions will "consume" the text it has matched so it won't work for nested parentheses.

import re
regex = re.compile(".*?\((.*?)\)")
result = re.findall(regex, mystring)

or this would find one set of parentheses, simply loop to find more:

start = mystring.find("(")
end = mystring.find(")")
if start != -1 and end != -1:
  result = mystring[start+1:end]

How to remove text between multiple pairs of brackets?

You have two main possibilities:

change .* to .*? i.e. match as few as possible and thus match ) as early as possible:

text = Regex.Replace(text, @"\(.*?\)", "");
text = Regex.Replace(text, @"\s{2,}", " "); // let's exclude trivial replaces

change .* to [^)]* i.e. match any symbols except ):

text = Regex.Replace(text, @"\([^)]*\)", "");
text = Regex.Replace(text, @"\s{2,}", " ");

Remove parentheses and text within from strings in R

A gsub should work here

gsub("\\s*\\([^\\)]+\\)","",as.character(companies$Name))
# or using "raw" strings as of R 4.0
gsub(r"{\s*\([^\)]+\)}","",as.character(companies$Name))

# [1] "Company A Inc"  "Company B"      "Company C Inc."
# [4] "Company D Inc." "Company E"

Here we just replace occurrences of "(...)" with nothing (also removing any leading space). R makes it look worse than it is with all the escaping we have to do for the parenthesis since they are special characters in regular expressions.

Remove text between () and [] based on condition in Python?

You can use a simple regex with re.sub and a function as replacement to check the length of the match:

import re
out = re.sub('\(.*?\)|\[.*?\]',
             lambda m: '' if len(m.group())<=(4+2) else m.group()[1:-1],
             text)

Output:

'This is a sentence.  Once a day twice a day '

This give you the logic for more complex checks, in which case you might want to define a named function rather than a lambda

How can I remove text within parentheses with a regex?

s/\([^)]*\)//

So in Python, you'd do:

re.sub(r'\([^)]*\)', '', filename)

Remove Text Between Parentheses PHP

$string = "ABC (Test1)";
echo preg_replace("/\([^)]+\)/","",$string); // 'ABC '

preg_replace is a perl-based regular expression replace routine. What this script does is matches all occurrences of a opening parenthesis, followed by any number of characters not a closing parenthesis, and again followed by a closing parenthesis, and then deletes them:

Regular expression breakdown:

/  - opening delimiter (necessary for regular expressions, can be any character that doesn't appear in the regular expression
\( - Match an opening parenthesis
[^)]+ - Match 1 or more character that is not a closing parenthesis
\) - Match a closing parenthesis
/  - Closing delimiter

JavaScript/regex: Remove text between parentheses

"Hello, this is Mike (example)".replace(/ *\([^)]*\) */g, "");

Result:

"Hello, this is Mike"

Remove Text Inside Brackets, Parens, And/Or Braces