Remove Parentheses and Text Within from Strings in R

Remove parentheses and text within from strings in R

A gsub() call should work here:

gsub("\\s*\\([^\\)]+\\)","",as.character(companies$Name))
# or using "raw" strings as of R 4.0
gsub(r"{\s*\([^\)]+\)}","",as.character(companies$Name))

# [1] "Company A Inc" "Company B" "Company C Inc."
# [4] "Company D Inc." "Company E"

Here we simply replace each occurrence of "(...)" with nothing, also removing any whitespace in front of it. R makes the pattern look worse than it is, because parentheses are special characters in regular expressions and every backslash that escapes them must itself be doubled inside an ordinary R string.
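To make the escaping concrete, here is a minimal sketch on made-up data. The companies data frame and its values below are assumptions for illustration, not the data from the question. The sketch also suggests that wrapping each parenthesis in a bracket expression avoids the backslashes while matching the same text.

```r
# Hypothetical stand-in for the data frame in the question
companies <- data.frame(
  Name = c("Company A Inc (AAA)", "Company B (BB)", "Company C Inc. (CCC)"),
  stringsAsFactors = FALSE
)

# Escaped form: optional whitespace, "(", one or more non-")" characters, ")"
escaped <- gsub("\\s*\\([^)]+\\)", "", companies$Name)

# Same pattern with the parentheses written as bracket expressions
bracketed <- gsub("\\s*[(][^)]+[)]", "", companies$Name)

print(escaped)
print(identical(escaped, bracketed))
```

Because the pattern uses +, an empty pair "()" is left in place; switching to * should remove those as well.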

Removing parentheses in R

Parentheses are regex metacharacters, so they either need to be escaped (with \\) or placed inside a bracket expression ([...]), where they are read as literal characters.

gsub("[()]", "", x)
#[1] "40.703707008, -73.943257966"
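The answer does not show how x was defined, so the sketch below assumes a coordinate string wrapped in one pair of parentheses (a hypothetical input chosen to reproduce the output above). It compares the bracket-expression form with the escaped form and with literal matching via fixed = TRUE.

```r
# Assumed input; the original question's value of x is not shown
x <- "(40.703707008, -73.943257966)"

# Bracket expression: "(" and ")" are literal inside [...]
in_class <- gsub("[()]", "", x)

# Escaped alternation: "\\(" or "\\)"
escaped <- gsub("\\(|\\)", "", x)

# fixed = TRUE switches off regex syntax, so each character is removed literally
literal <- gsub(")", "", gsub("(", "", x, fixed = TRUE), fixed = TRUE)

print(in_class)
print(identical(in_class, escaped) && identical(in_class, literal))
```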

How can I remove a single pair of parentheses with text in a character?

Use sub, which replaces only the first match:

sub('\\(.*?\\)\\s', '', value)
#[1] "This is a (keep) test sentence."
  • ( and ) are metacharacters and need to be escaped with \\.

  • .*? matches as few characters as possible, stopping at the first closing parenthesis, ).

  • \\s also removes the single whitespace character that follows the closing parenthesis.
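A short sketch of the difference between sub and gsub with this pattern. The value string here is an assumption, picked so that the sub result matches the output shown above; the question's actual input is not reproduced in the answer.

```r
# Hypothetical input with two parenthesized groups
value <- "This is a (remove) (keep) test sentence."

# sub() stops after the first match, so "(keep)" survives
first_only <- sub("\\(.*?\\)\\s", "", value)

# gsub() keeps going and removes every match
all_pairs <- gsub("\\(.*?\\)\\s", "", value)

print(first_only)
print(all_pairs)
```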

How to remove square brackets and text within from strings in R

I would use:

input <- c("6.77[9]", "5.92[10]", "2.98[103]")
gsub("\\[.*?\\]", "", input)

[1] "6.77" "5.92" "2.98"

The regex pattern \[.*?\] (written "\\[.*?\\]" in an R string) matches any text enclosed in square brackets, lazily stopping at the nearest closing bracket, and gsub tells R to replace every such match.
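As a cross-check, the same result can probably be reached without a lazy quantifier by using a negated bracket expression. The sketch below reuses the input vector from the answer; converting the cleaned strings with as.numeric is an added illustration, not part of the original answer.

```r
input <- c("6.77[9]", "5.92[10]", "2.98[103]")

# Lazy form from the answer: as few characters as possible up to "]"
lazy <- gsub("\\[.*?\\]", "", input)

# Negated class: "[", then any run of non-"]" characters, then "]"
# (a "]" placed first inside the class, after "^", is taken literally)
negated <- gsub("\\[[^]]*\\]", "", input)

print(identical(lazy, negated))

# Once the footnote markers are gone, the strings parse as numbers
values <- as.numeric(lazy)
print(values)
```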

How to remove parentheses and the text inside them in R?

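The following two regular expressions solve the two problems in the question: the first keeps only the text before the first opening parenthesis, and the second removes just the parenthesized part.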

s <- "species name(2) V1"

sub("(^[^(]*)\\(.*$", "\\1", s)
#[1] "species name"

sub("\\([^)]*\\)", "", s)
#[1] "species name V1"

Now apply them to the column of interest.
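Applying the patterns to a column could look like the sketch below. The data frame df, its column species, and the second row are hypothetical names and values introduced only for illustration; sub is vectorized, so it works on the whole column at once.

```r
# Hypothetical data frame; only the first value comes from the answer
df <- data.frame(
  species = c("species name(2) V1", "other name(10) V7"),
  stringsAsFactors = FALSE
)

# Keep everything before the first "(" and drop the rest of the string
df$before_paren <- sub("(^[^(]*)\\(.*$", "\\1", df$species)

# Remove only the first "(...)" group and keep what follows
df$without_paren <- sub("\\([^)]*\\)", "", df$species)

print(df)
```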

Mutate to remove all parentheses (and contents) from string in R

Another trick is:

library(dplyr)    # mutate() and %>%
library(purrr)    # map_chr()
library(stringr)  # str_c()

my_order <- c("CD68", "PD-1", "FoxP3", "CD8", "PD-L1", "PanCK")
test %>%
  mutate(prototype = gsub('\\s*[(][^)]+[)]', '', Class),  # drop " (Opal nnn)"
         ordered = map_chr(strsplit(prototype, '\\s*:\\s*'),
                           ~ str_c(sort(ordered(.x, my_order), decreasing = TRUE), collapse = ":")))
  Class                                                                 prototype                 ordered
1 FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)  FoxP3: PanCK: PD-1: CD68  PanCK:FoxP3:PD-1:CD68
2 CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)   CD8: PanCK: CD68: PD-L1   PanCK:PD-L1:CD8:CD68
3 PanCK (Opal 690): CD68 (Opal 780)                                     PanCK: CD68               PanCK:CD68
4 FoxP3 (Opal 570): PanCK (Opal 690)                                    FoxP3: PanCK              PanCK:FoxP3

Removing text contained in brackets/parentheses from corpus (R)

You can remove all text in parentheses with gsub(). Since you plan to strip punctuation in a later step, you can replace each match with ".", which marks where something was removed (useful if you need to debug the pipeline), or with the empty string "".

Your regex would not work. You need to escape the parentheses with double backslashes, and you want to match any number of characters but as few as possible. That calls for the lazy quantifier *? (written .*?) for the contents of the parentheses:

corp = c("This is an example (or demonstration) of replacing things in brackets",
"Just use gsub (a function in base) to remove (or better replace) these elements")

corp = gsub("\\(.*?\\)",".",corp)

The example above would result in the vector:

> corp
[1] "This is an example . of replacing things in brackets"
[2] "Just use gsub . to remove . these elements"

Depending on the package you use for your corpus, you can do this with the character vector before converting it to a corpus or you can use specific mapping functions (e.g. tm_map() in tm) to apply it to all texts.

stringr: Removing Parentheses and Brackets from string

We can use | (alternation) to match either closing character:

gsub("\\)|\\]", "", Test)
#[1] "-0.158" "0.426" "1.01" "1.6" "2.18" "2.77"

Or, instead of escaping, place the brackets inside a bracket expression [] (a ] placed first in the expression is treated as a literal character):

gsub("[][()]", "", Test)
#[1] "-0.158" "0.426" "1.01" "1.6" "2.18" "2.77"

If we want to extract the numbers instead of removing the brackets, we can use either gregexpr/regmatches from base R or str_extract from stringr, with a pattern for a number that may start with - and may contain a .:

library(stringr)
str_extract(Test, "-?[0-9.]+")
#[1] "-0.158" "0.426" "1.01" "1.6" "2.18" "2.77"
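The base R route mentioned above is not shown in the answer, so here is a minimal sketch of it. The Test vector is an assumption (the original input is not shown); regexpr finds the first match in each element, and gregexpr finds all of them.

```r
# Hypothetical input: numbers followed by a closing bracket or parenthesis
Test <- c("-0.158]", "0.426)", "1.01]")

# First match per element: optional "-", then digits and dots
first_num <- regmatches(Test, regexpr("-?[0-9.]+", Test))
print(first_num)

# gregexpr() returns every match, one list entry per input string
interval <- "(-0.158, 0.426]"
all_nums <- regmatches(interval, gregexpr("-?[0-9.]+", interval))[[1]]
print(all_nums)
```

Note that regmatches with regexpr silently drops elements that have no match, so the result can be shorter than the input.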

Difficulty to remove several parentheses in a string, using stringr, in R

Use str_remove_all instead of str_remove, because str_remove removes only the first match:

library(stringr)
str_remove_all(x, "[()]")
#[1] "example"

replace text within parenthesis in R

Yes, use gsub() to replace all the text you don't want with an empty string.

x <- "Keep me (Remove Me 1). Again keep me (Remove Me 2). Again again keep me (Remove Me 3)."

Here is the regex you want:

gsub( " *\\(.*?\\) *", "", x)
[1] "Keep me. Again keep me. Again again keep me."

It works like this:

  • " *" (a space followed by *) matches zero or more spaces before (and after) the parentheses.
  • Since ( and ) are special symbols in a regex, you need to escape them, i.e. \\( and \\).
  • The .*? matches any characters, where the ? makes the match non-greedy (lazy). This is necessary because regex quantifiers are greedy by default: without the ?, the match would start at the first opening parenthesis and end at the last closing parenthesis.
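The effect of the ? can be seen by running the greedy and lazy patterns side by side on the same x. The negated-class variant in the sketch is an alternative added here for comparison, not part of the original answer.

```r
x <- "Keep me (Remove Me 1). Again keep me (Remove Me 2). Again again keep me (Remove Me 3)."

# Greedy: .* runs from the first "(" all the way to the last ")"
greedy <- gsub(" *\\(.*\\) *", "", x)

# Lazy: .*? stops at the nearest ")", so each pair is removed separately
lazy <- gsub(" *\\(.*?\\) *", "", x)

# Alternative without a lazy quantifier: any run of non-")" characters
negated <- gsub(" *\\([^)]*\\) *", "", x)

print(greedy)
print(lazy)
print(identical(lazy, negated))
```

The greedy version swallows the text between the groups as well, which is why only the lazy or negated-class form gives the intended result.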

