Remove text inside brackets, parens, and/or braces
Maybe this function is a little more straight-forward? Or at least more compact.
bracketXtract <-
function(txt, br = c("(", "[", "{", "all"), with=FALSE)
{
br <- match.arg(br)
left <- # what pattern are we looking for on the left?
if ("all" == br) "\\(|\\{|\\["
else sprintf("\\%s", br)
map <- # what's the corresponding pattern on the right?
c(`\\(`="\\)", `\\[`="\\]", `\\{`="\\}",
`\\(|\\{|\\[`="\\)|\\}|\\]")
fmt <- # create the appropriate regular expression
if (with) "(%s).*?(%s)"
else "(?<=%s).*?(?=%s)"
re <- sprintf(fmt, left, map[left])
regmatches(txt, gregexpr(re, txt, perl=TRUE)) # do it!
}
No need to lapply
; the regular expression functions are vectorized in that way. This fails with nested parentheses; likely regular expressions won't be a good solution if that's important. Here we are in action:
> txt <- c("I love chicken [unintelligible]!",
+ "Me too! (laughter) It's so good.[interupting]",
+ "Yep it's awesome {reading}.",
+ "Agreed.")
> bracketXtract(txt, "all")
[[1]]
[1] "unintelligible"
[[2]]
[1] "laughter" "interupting"
[[3]]
[1] "reading"
[[4]]
character(0)
This fits without trouble into a data.frame
.
> examp2 <- data.frame(var1=1:4)
> examp2$text <- c("I love chicken [unintelligible]!",
+ "Me too! (laughter) It's so good.[interupting]",
+ "Yep it's awesome {reading}.", "Agreed.")
> examp2$text2<-bracketXtract(examp2$text, 'all')
> examp2
var1 text text2
1 1 I love chicken [unintelligible]! unintelligible
2 2 Me too! (laughter) It's so good.[interupting] laughter, interupting
3 3 Yep it's awesome {reading}. reading
4 4 Agreed.
The warning you were seeing has to do with trying to stick a matrix into a data frame. I think the answer is "don't do that".
> df = data.frame(x=1:2)
> df$y = matrix(list(), 2, 2)
> df
x y
1 1 NULL
2 2 NULL
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
How to remove text inside brackets and parentheses at the same time with any whitespace before if present?
There are four main points here:
- String between parentheses can be matched with
\([^()]*\)
- String between square brackets can be matched with
\[[^][]*]
(or\[[^\]\[]*\]
if you prefer to escape literal[
and]
, in PCRE, it is stylistic, but in some other regex flavors, it might be a must) - You need alternation to match either this or that pattern and account for any whitespaces before these patterns
- Since after removing these strings you may get leading and trailing spaces, you need to
trim
the string.
You may use
$string = "Deadpool 2 [Region 4](Blu-ray)";
echo trim(preg_replace("/\s*(?:\[[^][]*]|\([^()]*\))/","", $string));
See the regex demo and a PHP demo.
The \[[^][]*]
part matches strings between [
and ]
having no other [
and ]
inside and \([^()]*\)
matches strings between (
and )
having no other parentheses inside. trim
removes leading/trailing whitespace.
Regex graph and explanation:
\s*
- 0+ whitespaces(?:
- start of a non-capturing group:\[[^][]*]
-[
, zero or more chars other than[
and]
(note you may keep these brackets inside a character class unescaped in a PCRE pattern if]
is right after initial[
, in JS, you would have to escape]
by all means,[^\][]*
)|
- or (an alternation operator)\([^()]*\)
-(
, any 0+ chars other than(
and)
and a)
)
- end of the non-capturing group.
Remove text between () and []
This should work for parentheses. Regular expressions will "consume" the text it has matched so it won't work for nested parentheses.
import re
regex = re.compile(".*?\((.*?)\)")
result = re.findall(regex, mystring)
or this would find one set of parentheses, simply loop to find more:
start = mystring.find("(")
end = mystring.find(")")
if start != -1 and end != -1:
result = mystring[start+1:end]
How to remove text between multiple pairs of brackets?
You have two main possibilities:
change
.*
to.*?
i.e. match as few as possible and thus match)
as early as possible:text = Regex.Replace(text, @"\(.*?\)", "");
text = Regex.Replace(text, @"\s{2,}", " "); // let's exclude trivial replaceschange
.*
to[^)]*
i.e. match any symbols except)
:text = Regex.Replace(text, @"\([^)]*\)", "");
text = Regex.Replace(text, @"\s{2,}", " ");
Remove parentheses and text within from strings in R
A gsub
should work here
gsub("\\s*\\([^\\)]+\\)","",as.character(companies$Name))
# or using "raw" strings as of R 4.0
gsub(r"{\s*\([^\)]+\)}","",as.character(companies$Name))
# [1] "Company A Inc" "Company B" "Company C Inc."
# [4] "Company D Inc." "Company E"
Here we just replace occurrences of "(...)" with nothing (also removing any leading space). R makes it look worse than it is with all the escaping we have to do for the parenthesis since they are special characters in regular expressions.
Remove text between () and [] based on condition in Python?
You can use a simple regex with re.sub
and a function as replacement to check the length of the match:
import re
out = re.sub('\(.*?\)|\[.*?\]',
lambda m: '' if len(m.group())<=(4+2) else m.group()[1:-1],
text)
Output:
'This is a sentence. Once a day twice a day '
This give you the logic for more complex checks, in which case you might want to define a named function rather than a lambda
How can I remove text within parentheses with a regex?
s/\([^)]*\)//
So in Python, you'd do:
re.sub(r'\([^)]*\)', '', filename)
Remove Text Between Parentheses PHP
$string = "ABC (Test1)";
echo preg_replace("/\([^)]+\)/","",$string); // 'ABC '
preg_replace
is a perl-based regular expression replace routine. What this script does is matches all occurrences of a opening parenthesis, followed by any number of characters not a closing parenthesis, and again followed by a closing parenthesis, and then deletes them:
Regular expression breakdown:
/ - opening delimiter (necessary for regular expressions, can be any character that doesn't appear in the regular expression
\( - Match an opening parenthesis
[^)]+ - Match 1 or more character that is not a closing parenthesis
\) - Match a closing parenthesis
/ - Closing delimiter
JavaScript/regex: Remove text between parentheses
"Hello, this is Mike (example)".replace(/ *\([^)]*\) */g, "");
Result:
"Hello, this is Mike"
Related Topics
R Calculate the Average of One Column Corresponding to Each Bin of Another Column
How to Convert List of List into a Tibble (Dataframe)
Ess to Call Different Installations of R
Efficiently Counting Non-Na Elements in Data.Table
Is There a Fast Parser for Date
Mapping the Shortest Flight Path Across the Date Line in R Leaflet/Shiny, Using Gcintermediate
Adding All Elements of Two Lists
Creating a Grouped Bar Plot in R
How to Know a Function or an Operation in R Is Vectorized
Specifying the Colour Scale for Maps in Ggplot
Scraping Tables on Multiple Web Pages with Rvest in R
Highlight Areas Within Certain X Range in Ggplot2
Percentage of Overlap Between Polygons
Different Y-Limits on Ggplot Facet Grid Bar Graph
Plot Curved Lines Between Two Locations in Ggplot2