Gsub Every Element After a Keyword in R

The following simple sub call may help here.

sub("\\.string.*","",variable)

Explanation: the general form of a sub call is

sub(regex_to_replace_text_in_variable,new_value,variable)
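
As a minimal sketch of the call above, assuming a hypothetical character vector named variable whose elements contain the keyword ".string" (the original data is not shown), the pattern can either drop the keyword along with everything after it, or keep the keyword by wrapping it in a capture group:

```r
# Hypothetical input; the question's real data is not shown.
variable <- c("sample1.string.extra", "sample2.string_more.txt", "no_keyword_here")

# Drop the keyword ".string" and everything after it.
# Elements without the keyword are returned unchanged.
cleaned <- sub("\\.string.*", "", variable)
print(cleaned)

# Keep the keyword itself and drop only what follows it,
# by capturing the keyword and putting it back with \\1.
kept <- sub("(\\.string).*", "\\1", variable)
print(kept)
```

The backslashes are doubled because R string literals consume one level of escaping before the regex engine sees the pattern.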

Difference between sub and gsub:

sub: replaces only the first match of the pattern in each element of the input.

gsub: performs the same substitution, but on ALL matches found in each element rather than only the first one.

From the R help page:

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
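
To make the difference concrete, here is a small sketch on a made-up vector x (not from the question). It also shows the fixed argument from the signatures above, which treats the pattern as a literal string instead of a regular expression:

```r
# Made-up input for illustration.
x <- c("a.b.c", "one.two")

# sub replaces only the first "." in each element.
first_only <- sub("\\.", "-", x)
print(first_only)   # "a-b.c" "one-two"

# gsub replaces every "." in each element.
all_matches <- gsub("\\.", "-", x)
print(all_matches)  # "a-b-c" "one-two"

# With fixed = TRUE the pattern is literal, so "." needs no escaping.
literal <- gsub(".", "-", x, fixed = TRUE)
print(literal)      # "a-b-c" "one-two"
```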

Looping gsub() replacing elements from a list of multiple words into a corpus

stringr::str_replace_all() can do this directly. That's what the help file is trying to ever-so-briefly communicate with "Vectorised over string, pattern and replacement."

Here I assume that your corpus is stored in a character vector, but it could be a list of characters, as well. If it's more complicated (e.g. it's in JSON...) then you might need to do some preprocessing before you feed it to str_replace_all().

Note that the result drops the names of the input elements, but it'd be easy to restore them.

library(tidyverse)

(ecb_corpus <- c(
  doc_1 = c("lorem ipsum interest rate gobbledygook"),
  doc_2 = c("lorem dolor central bank foobar")
))
#>                                    doc_1 
#> "lorem ipsum interest rate gobbledygook" 
#>                                    doc_2 
#>        "lorem dolor central bank foobar"

replacements <- c("euro_area",
                  "monetary_policy",
                  "price_stability",
                  "interest_rates",
                  "second_question",
                  "medium_term",
                  "first_question",
                  "central_banks",
                  "inflation_expectations",
                  "structural_reforms")

targets <- replacements %>% str_replace_all("_", " ") %>% str_remove("s$")

(replacement_pairs <- replacements %>% set_names(targets))
#>                euro area          monetary policy          price stability 
#>              "euro_area"        "monetary_policy"        "price_stability" 
#>            interest rate          second question              medium term 
#>         "interest_rates"        "second_question"            "medium_term" 
#>           first question             central bank    inflation expectation 
#>         "first_question"          "central_banks" "inflation_expectations" 
#>        structural reform 
#>     "structural_reforms"

(ecb_ready <- ecb_corpus %>% str_replace_all(replacement_pairs))
#> [1] "lorem ipsum interest_rates gobbledygook"
#> [2] "lorem dolor central_banks foobar"

Created on 2019-09-28 by the reprex package (v0.3.0)
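
If stringr is not available, roughly the same substitution can be sketched in base R by looping gsub() over the pairs. This is a swapped-in alternative rather than the approach shown above; it reuses only two of the replacement pairs for brevity, and it assigns the names back at the end so the document labels are kept. Because the replacements run one after another, an earlier replacement could in principle create text that a later target matches, so the order of the pairs may matter for other data.

```r
# Same two example documents as above.
ecb_corpus <- c(
  doc_1 = "lorem ipsum interest rate gobbledygook",
  doc_2 = "lorem dolor central bank foobar"
)

# Names are the targets to find, values are the replacements.
# Only two of the pairs from above, to keep the sketch short.
replacement_pairs <- c(
  "interest rate" = "interest_rates",
  "central bank"  = "central_banks"
)

ecb_ready <- ecb_corpus
for (target in names(replacement_pairs)) {
  # fixed = TRUE: treat each target as a literal string, not a regex.
  ecb_ready <- gsub(target, replacement_pairs[[target]], ecb_ready, fixed = TRUE)
}

# Restore the document names explicitly.
names(ecb_ready) <- names(ecb_corpus)
print(ecb_ready)
```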

gsub and remove all characters between < and > in R

You can use

 gsub("<[^>]+>", "",a)
[1] "7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33"

"<" and ">" are literals, "[^>]" matches any character that is not ">" and "+" allows for one or more matches. Using gsub repeats this match as many times as this pattern is found. The pattern is replaced by the empty string "".
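
Since the input a is not shown, here is a minimal sketch on a made-up string (tagged is a hypothetical name) that shows the pattern at work, and what happens if sub is used instead of gsub:

```r
# Made-up input; the original vector `a` is not shown in the answer.
tagged <- "<b>7</b> <i>-5.067</i>"

# gsub removes every "<...>" run.
stripped <- gsub("<[^>]+>", "", tagged)
print(stripped)    # "7 -5.067"

# sub removes only the first one, leaving the remaining tags in place.
first_only <- sub("<[^>]+>", "", tagged)
print(first_only)  # "7</b> <i>-5.067</i>"
```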

How to gsub for matching strings and simultaneously remove non-matching strings?

Consider pasting several ifelse calls checking for specific strings:

dataframe$New <- paste(ifelse(grepl("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", dataframe$Words), "City", "N/A"), 
                       ifelse(grepl("Panama|USA|Japan|Mexico|Israel|Brazil", dataframe$Words), "Country", "N/A"),
                       ifelse(grepl("Asia|Antarctica|Africa|North America|South America", dataframe$Words), "Continent", "N/A"),
                       sep=",")

dataframe$New <- gsub("N/A,|,N/A", "", dataframe$New)

dataframe

#   Color Letter                                                             Words                    New
# 1   red      A             Paris,Asia,parrot,Antarctica,North America,cat,lizard         City,Continent
# 2  blue      A               Panama,New York,Africa,dog,Tokyo,Washington DC,fish City,Country,Continent
# 3   red      B                   Copenhagen,bird,USA,Japan,Chicago,Mexico,insect           City,Country
# 4  blue      B Israel,Antarctica,horse,South America,North America,turtle,Brazil      Country,Continent

Or a DRYer version with do.call + lapply:

strs <- list(c("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", "City"),
             c("Panama|USA|Japan|Mexico|Israel|Brazil", "Country"),
             c("Asia|Antarctica|Africa|North America|South America", "Continent"))

dataframe$New2 <- do.call(paste,
                          c(lapply(strs, function(s) ifelse(grepl(s[1], dataframe$Words), s[2], "N/A")), 
                            list(sep=",")))
dataframe$New2 <- gsub("N/A,|,N/A", "", dataframe$New2)
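
For anyone who wants to run this end to end, below is a small self-contained sketch of the same idea. The three-row data frame and the shortened keyword lists are made up for illustration. The last row shows an edge case worth knowing about: when no category matches, a single "N/A" is left behind.

```r
# Made-up miniature of the data; only the Words column is needed.
dataframe <- data.frame(
  Words = c("Paris,Asia,parrot", "Panama,dog", "cat,lizard"),
  stringsAsFactors = FALSE
)

# Each entry: a regex of keywords, then the label to assign.
strs <- list(c("Paris|Tokyo", "City"),
             c("Panama|Japan", "Country"),
             c("Asia|Africa", "Continent"))

# One label vector per category, "N/A" where nothing matched.
labels <- lapply(strs, function(s) ifelse(grepl(s[1], dataframe$Words), s[2], "N/A"))

# Paste the label vectors together row by row, then strip the placeholders.
dataframe$New <- do.call(paste, c(labels, list(sep = ",")))
dataframe$New <- gsub("N/A,|,N/A", "", dataframe$New)
print(dataframe)
```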

R: pass a vector of strings to replace all instances within a string

We can use gsubfn if we need to replace with numbers.

 library(gsubfn)
 gsubfn("\\w+", as.list(setNames(1:3, numlist)), mystring)
 #[1] "I have 1 cat, 2 dogs and 3 rabbits"

EDIT: I thought that we needed to replace with numbers that correspond to the words in 'numlist'. But if we need to replace with the ##NUMBER## flag instead, one option is mgsub

 library(qdap)
 mgsub(numlist, "##NUMBER##", mystring)
 #[1] "I have ##NUMBER## cat, ##NUMBER## dogs and ##NUMBER## rabbits"
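
If installing gsubfn or qdap is not an option, a base R sketch of the flag replacement is to collapse the vector into one alternation pattern. The values of numlist and mystring below are assumptions, since the question's data is not shown, and the sketch assumes the words contain no regex metacharacters:

```r
# Assumed inputs; the original numlist and mystring are not shown.
numlist <- c("one", "two", "three")
mystring <- "I have one cat, two dogs and three rabbits"

# Build "\\b(one|two|three)\\b" so only whole words are matched.
pattern <- paste0("\\b(", paste(numlist, collapse = "|"), ")\\b")

flagged <- gsub(pattern, "##NUMBER##", mystring, perl = TRUE)
print(flagged)
```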

Remove part of string after .

You just need to escape the period:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

gsub("\\..*","",a)
[1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
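
A quick sketch of why the escape matters, using two of the IDs from above: an unescaped "." matches any character, so the whole string is consumed. A bracket expression is an equivalent way to match a literal period.

```r
a <- c("NM_020506.1", "NM_001030297.2")

# Escaped period: strip the version suffix.
escaped <- sub("\\..*", "", a)
print(escaped)    # "NM_020506" "NM_001030297"

# Same result with a bracket expression instead of a backslash escape.
bracketed <- sub("[.].*", "", a)
print(bracketed)  # "NM_020506" "NM_001030297"

# Unescaped: "." matches the first character and ".*" the rest,
# so everything is removed.
unescaped <- sub("..*", "", a)
print(unescaped)  # "" ""
```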

r gsub extract n words before and after a term

Instead of matching a literal space, try \\W{1,} (one or more non-word characters), which also matches punctuation:

gsub(".*(((\\W{1,})\\w{1,}){3} game((\\W{1,})\\w{1,}){3}).*", "\\1", a, perl = TRUE)

[1] " came for our game we were ready"       
[2] " came for our game, but we were"        
[3] " not here. Our game  was not completed"
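
To vary n or the term, the same pattern can be built with sprintf. The helper words_around below is a hypothetical name, not part of any package, and the sketch assumes the term is a plain word with no regex metacharacters and is preceded by a single space, as in the pattern above. The input sentence is made up.

```r
# Hypothetical helper: keep n words on each side of `term`.
words_around <- function(x, term, n) {
  pattern <- sprintf(".*((\\W+\\w+){%d} %s(\\W+\\w+){%d}).*", n, term, n)
  # If the pattern does not match, gsub returns the element unchanged.
  gsub(pattern, "\\1", x, perl = TRUE)
}

# Made-up input sentence.
a <- "We came for our game we were ready to play"

three <- words_around(a, "game", 3)
print(three)  # " came for our game we were ready"

one <- words_around(a, "game", 1)
print(one)    # " our game we"
```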

Remove all text before colon

Here are two ways of doing it in R:

foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"

# Remove everything up to and including ":":
gsub(".*:","",foo)

# Extract everything after ":" (returns a list with one element per input string):
regmatches(foo,gregexpr("(?<=:).*",foo,perl=TRUE))
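
One thing to watch for when a string contains more than one colon: ".*:" is greedy and removes up to the last colon. The sketch below contrasts that with a pattern anchored at the first colon, using a made-up string multi, and also shows regexpr as a way to get a plain character vector instead of a list:

```r
foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
multi <- "a:b:c"  # made-up string with two colons

# Greedy: removes everything up to the LAST colon.
after_last <- sub(".*:", "", multi)
print(after_last)   # "c"

# Anchored on non-colon characters: removes up to the FIRST colon only.
after_first <- sub("^[^:]*:", "", multi)
print(after_first)  # "b:c"

# regexpr (first match only) gives a character vector rather than a list.
extracted <- regmatches(foo, regexpr("(?<=:).*", foo, perl = TRUE))
print(extracted)    # "PKMYT1"
```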

Use gsub remove all string before first white space in R

Try this:

sub(".*? ", "", D$name)

Edit:

The pattern matches any character zero or more times (.*) up to and including the first space, and sub replaces that match with the empty string, so only the text after the first space remains. The ? after .* makes it "lazy" rather than "greedy" and is what makes it stop at the first space found. So, the .*? matches everything before the first space, and the literal space in the pattern matches that first space itself.
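
A small sketch of the lazy versus greedy difference, on a made-up data frame D (the original data is not shown):

```r
# Made-up data; the original D is not shown.
D <- data.frame(name = c("Mr John Smith", "Dr Jane Doe"),
                stringsAsFactors = FALSE)

# Lazy: stop at the first space, so only the first word is removed.
lazy <- sub(".*? ", "", D$name)
print(lazy)    # "John Smith" "Jane Doe"

# Greedy: run to the last space, so only the last word survives.
greedy <- sub(".* ", "", D$name)
print(greedy)  # "Smith" "Doe"
```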
