How to Remove Square Brackets and Anything Between Them with a Regex

[ and ] are special characters in a regex: they delimit a character class, a list of characters that may match at one position. [a-z] matches any lowercase letter from a to z. [03b] matches a "0", "3", or "b". To match the literal characters [ and ], escape them with a preceding \.

Your code currently says "replace any character of [](). with an empty string" (reordered from the order in which you typed them for clarity).


Greedy match:

preg_replace('/\[.*\]/', '', $str); // Replace from one [ to the last ]

A greedy match can span several [s and ]s. Given 'an example [of "sneaky"] text [with more "sneaky"] here', that expression removes everything from the first [ to the last ] and leaves 'an example here' (plus the spaces that surrounded the brackets).

PCRE, the Perl-compatible engine behind preg_replace, has a syntax for a non-greedy match (most likely what you want):

preg_replace('/\[.*?\]/', '', $str);

Non-greedy matches try to catch as few characters as possible. Using the same example: an example [of "sneaky"] text [with more "sneaky"] here becomes an example text here.
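To make the difference concrete, here is a minimal Python sketch of the same two patterns, using re.sub in place of PHP's preg_replace. The variable names are only illustrative. Note the doubled spaces in the results, since neither pattern touches the whitespace around the brackets.

```python
import re

text = 'an example [of "sneaky"] text [with more "sneaky"] here'

# Greedy: .* runs on to the LAST ] in the string.
greedy = re.sub(r'\[.*\]', '', text)

# Non-greedy: .*? stops at the first ] that follows each [.
lazy = re.sub(r'\[.*?\]', '', text)

print(repr(greedy))  # 'an example  here'
print(repr(lazy))    # 'an example  text  here'
```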


Only up to the first following ]:

preg_replace('/\[[^\]]*\]/', '', $str); // Find a [, look for non-] characters, and then a ]

This is more explicit, but harder to read. Using the same example text, you'd get the output of the non-greedy expression.


Note that none of these deal explicitly with white space. The spaces on either side of [ and ] will remain.

Also note that all of these can fail for malformed input. Multiple [s and ]s without matches could cause a surprising result.
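As a rough illustration of both caveats, the following Python sketch adds \s* in front of the negated-class pattern and trims the result, then runs it on unbalanced input. The helper name strip_brackets and the sample strings are assumptions made up for this example.

```python
import re

def strip_brackets(text):
    """Remove each [...] group plus any whitespace before it, then trim."""
    return re.sub(r'\s*\[[^\]]*\]', '', text).strip()

# Balanced input: the leading \s* swallows the space before each group.
clean = strip_brackets('an example [of "sneaky"] text [with more "sneaky"] here')

# Malformed input: an opening [ without a closing ] is left untouched.
unclosed = strip_brackets('a [b c')

# Malformed input: a stray ] before the first [ survives the replacement.
stray = strip_brackets('a ]b[ c] d')

print(clean)     # an example text here
print(unclosed)  # a [b c
print(stray)     # a ]b d
```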

Regular expression to extract text between square brackets

You can use the following regex globally:

\[(.*?)\]

Explanation:

  • \[ : [ is a meta char and needs to be escaped if you want to match it literally.
  • (.*?) : match everything in a non-greedy way and capture it.
  • \] : ] is a meta char and needs to be escaped if you want to match it literally.
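In Python, "using the regex globally" corresponds to re.findall, which returns the contents of the capture group for every match. A small sketch, with a sample string chosen only for illustration:

```python
import re

text = 'an example [of "sneaky"] text [with more "sneaky"] here'

# findall returns what group 1 captured for each non-overlapping match.
inside = re.findall(r'\[(.*?)\]', text)

print(inside)  # ['of "sneaky"', 'with more "sneaky"']
```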

How to remove square parentheses and text within from strings in R

I would use:

input <- c("6.77[9]", "5.92[10]", "2.98[103]")
gsub("\\[.*?\\]", "", input)

[1] "6.77" "5.92" "2.98"

The regex pattern \[.*?\] matches any text enclosed in square brackets, as few characters as possible, and gsub tells R to replace every such match rather than only the first.

Remove Square Brackets in Text and its contents

\[[^]]*\]

Try this, and replace each match with an empty string. See the demo:

http://regex101.com/r/xT7yD8/2

How to remove text inside brackets and parentheses at the same time with any whitespace before if present?

There are four main points here:

  • String between parentheses can be matched with \([^()]*\)
  • String between square brackets can be matched with \[[^][]*] (or \[[^\]\[]*\] if you prefer to escape the literal [ and ]; in PCRE this is a matter of style, but some other regex flavors require it)
  • You need alternation to match either this or that pattern and account for any whitespaces before these patterns
  • Since after removing these strings you may get leading and trailing spaces, you need to trim the string.

You may use

$string = "Deadpool 2 [Region 4](Blu-ray)";
echo trim(preg_replace("/\s*(?:\[[^][]*]|\([^()]*\))/","", $string));

The \[[^][]*] part matches strings between [ and ] having no other [ and ] inside and \([^()]*\) matches strings between ( and ) having no other parentheses inside. trim removes leading/trailing whitespace.

Explanation:

  • \s* - 0+ whitespaces
  • (?: - start of a non-capturing group:

    • \[[^][]*] - a [, zero or more chars other than [ and ], then a ] (in a PCRE pattern, ] may stay unescaped inside a character class when it comes first, right after the opening [ or [^; in JS you must escape it, as in [^\][]*)
    • | - or (an alternation operator)
    • \([^()]*\) - (, any 0+ chars other than ( and ) and a )
  • ) - end of the non-capturing group.
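The same alternation carries over to Python more or less unchanged. This is a sketch rather than a port of the PHP answer: the names BRACKETED and strip_bracketed are invented for the example, and the ] inside the character class is escaped to stay on the safe side across flavors.

```python
import re

# Optional whitespace, then either [...] or (...) with no nesting inside.
BRACKETED = re.compile(r'\s*(?:\[[^\][]*\]|\([^()]*\))')

def strip_bracketed(text):
    """Drop bracketed and parenthesized chunks, then trim the ends."""
    return BRACKETED.sub('', text).strip()

title = strip_bracketed('Deadpool 2 [Region 4](Blu-ray)')
print(title)  # Deadpool 2
```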

Remove text between square brackets at the end of string

Note that \[.*?\]$ won't work as it will match the first [ (because a regex engine processes the string from left to right), and then will match all the rest of the string up to the ] at its end. So, it will match [something][something2] in input[something][something2].

You may specify the end of string anchor and use [^\][]* (matching zero or more chars other than [ and ]) instead of .*?:

\[[^\][]*]$

See the JS demo:

console.log("input[something][something2]".replace(/\[[^\][]*]$/, ''));
// → "input[something]"
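For comparison, here is a short Python sketch that runs both the lazy pattern and the negated-class pattern against the same input, so the difference described above is visible side by side. The variable names are illustrative only.

```python
import re

s = 'input[something][something2]'

# Lazy dot: the match starts at the FIRST [ and must stretch to the final ].
lazy = re.sub(r'\[.*?\]$', '', s)

# Negated class: the match cannot cross a [ or ], so only the last group goes.
strict = re.sub(r'\[[^\][]*\]$', '', s)

print(lazy)    # input
print(strict)  # input[something]
```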

Using Regex to delete contents between repeating brackets

You can use a word boundary in combination with a negated character class such as [^][]:

\[[^][]*\bDontDeleteMe\b[^][]*\]

If the word is DeleteMe, you can match it with the same pattern and replace the match with an empty string:

\[[^][]*\bDeleteMe\b[^][]*\]
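A quick Python check shows why the word boundaries matter. The sample string below is made up for illustration; the point is that \bDeleteMe\b does not match inside DontDeleteMe, because there is no word boundary between the "t" and the "D".

```python
import re

# Hypothetical input with one group to delete and one to keep.
s = 'keep [a DeleteMe b] and [a DontDeleteMe b] here'

pattern = r'\[[^\][]*\bDeleteMe\b[^\][]*\]'
result = re.sub(pattern, '', s)

# Only the group containing the standalone word DeleteMe is removed.
print(repr(result))  # 'keep  and [a DontDeleteMe b] here'
```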

Remove square brackets that don't have spaces between them

The new version of stringr may be of use to you; it has a nice widget for testing out regex matching.

stringr::str_view_all(c("[please]", "[help me]"), "(\\[)\\S*(\\])")

This matches [, then any number of non-space characters, then ], with the [ and ] as capture groups. I'm not sure what you want to do with them.

Update: To remove brackets, you actually want to capture what's inside and then substitute with it.

stringr::str_replace_all(c("[please]", "[help me]"), "\\[(\\S*)\\]", "\\1")
#> [1] "please" "[help me]"

(capture the run of non-space characters between the brackets, and replace the entire match, brackets included, with the captured text)
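The same capture-and-substitute idea can be sketched in Python with re.sub and a \1 backreference. This is an equivalent written for illustration, not part of the original stringr answer.

```python
import re

items = ['[please]', '[help me]']

# Replace each bracketed run of non-space characters with its contents;
# '[help me]' contains a space, so the pattern does not match it.
unwrapped = [re.sub(r'\[(\S*)\]', r'\1', item) for item in items]

print(unwrapped)  # ['please', '[help me]']
```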

How can I remove the closing square bracket using regex in Python?

You can use

cleaned = re.sub(r'^\[+[A-Z\d-]+:\s*|]+$', '', string)

Alternatively, to make sure the string starts with [[word: and ends with ]s, you may use

cleaned = re.sub(r'^\[+[A-Z\d-]+:\s*(.*?)\s*]+$', r'\1', string)

And, in case you simply want to extract that text inside, you may use

import re

# First match only
m = re.search(r'\[+[A-Z\d-]+:\s*(.*?)\s*]', string)
if m:
    print(m.group(1))

# All matches
matches = re.findall(r'\[+[A-Z\d-]+:\s*(.*?)\s*]', string)

Details

  • ^ - start of string
  • \[+ - one or more [ chars
  • [A-Z\d-]+ - one or more uppercase ASCII letters, digits or - chars
  • : - a colon
  • \s* - zero or more whitespaces
  • | - or
  • ]+$ - one or more ] chars at the end of string.

Also, (.*?) is a capturing group with ID 1 that matches any zero or more chars other than line break chars, as few as possible. \1 in the replacement refers to the value stored in this group memory buffer.
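Putting the three variants together, here is a small runnable sketch. The original question's input is not shown here, so the sample strings below (an uppercase tag, a colon, then text inside double brackets) are assumptions chosen to fit the patterns.

```python
import re

# Hypothetical input shaped like "[[TAG: text]]".
string = '[[ABC-123: some text]]'

# Variant 1: strip the leading "[[TAG: " and the trailing "]]" separately.
cleaned_a = re.sub(r'^\[+[A-Z\d-]+:\s*|]+$', '', string)

# Variant 2: require the whole shape and keep only capture group 1.
cleaned_b = re.sub(r'^\[+[A-Z\d-]+:\s*(.*?)\s*]+$', r'\1', string)

# Variant 3: extract every tagged chunk from a longer (also hypothetical) line.
found = re.findall(r'\[+[A-Z\d-]+:\s*(.*?)\s*]', 'x [[A-1: foo]] y [[B-2: bar ]]')

print(cleaned_a)  # some text
print(cleaned_b)  # some text
print(found)      # ['foo', 'bar']
```

Variant 1 also strips a lone trailing ] from a string that has no matching prefix, whereas variant 2 leaves any string that does not fit the full shape unchanged.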


