Gsub every element after a keyword in R
The following simple sub call may help you here:
sub("\\.string.*","",variable)
Explanation: sub takes its arguments in this order:
sub(regex_to_replace_text_in_variable,new_value,variable)
Difference between sub and gsub:
sub: replaces only the first match of the pattern in each element of the variable.
gsub: performs the same substitution, but on ALL matches found in each element.
From the R help page:
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)
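A minimal sketch of the difference, using a made-up input vector x (an assumption, not data from the original question):

```r
# Hypothetical input: the keyword ".string" occurs twice in the first element.
x <- c("a.string.b.string.c", "no match here")

# sub() replaces only the first match in each element.
first_only <- sub("\\.string", "", x)

# gsub() replaces every match in each element.
all_matches <- gsub("\\.string", "", x)

# Appending .* makes the single match run to the end of the string,
# so sub() alone removes the keyword and everything after it.
after_keyword <- sub("\\.string.*", "", x)

print(first_only)
print(all_matches)
print(after_keyword)
```

Elements without a match are returned unchanged by both functions.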
Looping gsub() replacing elements from a list of multiple words into a corpus
stringr::str_replace_all() can do this directly. That's what the help file is trying to ever-so-briefly communicate with "Vectorised over string, pattern and replacement."
Here I assume that your corpus is stored in a character vector, but it could be a list of characters as well. If it's more complicated (e.g. it's in JSON...) then you might need to do some preprocessing before you feed it to str_replace_all().
Note that the result drops the names of the input elements, but it'd be easy to restore them.
library(tidyverse)
(ecb_corpus <- c(
doc_1 = c("lorem ipsum interest rate gobbledygook"),
doc_2 = c("lorem dolor central bank foobar")
))
#> doc_1
#> "lorem ipsum interest rate gobbledygook"
#> doc_2
#> "lorem dolor central bank foobar"
replacements <- c("euro_area",
"monetary_policy",
"price_stability",
"interest_rates",
"second_question",
"medium_term",
"first_question",
"central_banks",
"inflation_expectations",
"structural_reforms")
targets <- replacements %>% str_replace_all("_", " ") %>% str_remove("s$")
(replacement_pairs <- replacements %>% set_names(targets))
#> euro area monetary policy price stability
#> "euro_area" "monetary_policy" "price_stability"
#> interest rate second question medium term
#> "interest_rates" "second_question" "medium_term"
#> first question central bank inflation expectation
#> "first_question" "central_banks" "inflation_expectations"
#> structural reform
#> "structural_reforms"
(ecb_ready <- ecb_corpus %>% str_replace_all(replacement_pairs))
#> [1] "lorem ipsum interest_rates gobbledygook"
#> [2] "lorem dolor central_banks foobar"
Created on 2019-09-28 by the reprex package (v0.3.0)
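For comparison, here is a rough base R sketch of the "looping gsub()" idea from the question title. It is a different technique from str_replace_all(), uses only two of the replacement pairs, and the names corpus, pairs and result are placeholders chosen for illustration:

```r
# Assumed inputs: a named character vector and a named vector of pairs
# (names are the search targets, values are the replacements).
corpus <- c(
  doc_1 = "lorem ipsum interest rate gobbledygook",
  doc_2 = "lorem dolor central bank foobar"
)
pairs <- c(
  "interest rate" = "interest_rates",
  "central bank"  = "central_banks"
)

result <- corpus
for (target in names(pairs)) {
  # fixed = TRUE treats the target as a literal string, not a regex.
  # Assigning into result[] keeps the element names of the corpus.
  result[] <- gsub(target, pairs[[target]], result, fixed = TRUE)
}

print(result)
```

Because the replacements are applied one after another, the output of an earlier pair can be matched again by a later target, so the order of the pairs can matter.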
gsub and remove all characters between < and > in R
You can use
gsub("<[^>]+>", "",a)
[1] "7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33"
"<" and ">" are literals, "[^>]" matches any character that is not ">" and "+" allows for one or more matches. Using gsub repeats this match as many times as the pattern is found. Each match is replaced by the empty string "".
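The original input a is not shown, so the following sketch uses an invented string with a few tags to show the pattern at work:

```r
# Hypothetical input; the real vector from the question is not reproduced here.
a <- "<b>7</b> <i>-5.067</i> <u>-3</u>"

# Remove every "<...>" run: a literal "<", one or more non-">" characters,
# then a literal ">".
cleaned <- gsub("<[^>]+>", "", a)

# With sub() only the first tag would be removed.
first_tag_only <- sub("<[^>]+>", "", a)

print(cleaned)
print(first_tag_only)
```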
How to gsub for matching strings and simultaneously remove non-matching strings?
Consider pasting several ifelse calls checking for specific strings:
dataframe$New <- paste(ifelse(grepl("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", dataframe$Words), "City", "N/A"),
ifelse(grepl("Panama|USA|Japan|Mexico|Israel|Brazil", dataframe$Words), "Country", "N/A"),
ifelse(grepl("Asia|Antarctica|Africa|North America|South America", dataframe$Words), "Continent", "N/A"),
sep=",")
dataframe$New <- gsub("N/A,|,N/A", "", dataframe$New)
dataframe
# Color Letter Words New
# 1 red A Paris,Asia,parrot,Antarctica,North America,cat,lizard City,Continent
# 2 blue A Panama,New York,Africa,dog,Tokyo,Washington DC,fish City,Country,Continent
# 3 red B Copenhagen,bird,USA,Japan,Chicago,Mexico,insect City,Country
# 4 blue B Israel,Antarctica,horse,South America,North America,turtle,Brazil Country,Continent
Or a DRYer version with do.call + lapply:
strs <- list(c("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", "City"),
c("Panama|USA|Japan|Mexico|Israel|Brazil", "Country"),
c("Asia|Antarctica|Africa|North America|South America", "Continent"))
dataframe$New2 <- do.call(paste,
                          c(lapply(strs, function(s) ifelse(grepl(s[1], dataframe$Words), s[2], "N/A")),
                            list(sep=",")))
dataframe$New2 <- gsub("N/A,|,N/A", "", dataframe$New2)
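To see what the cleanup regex does in each case, here is a small self-contained sketch on an invented words vector (shorter category lists than in the answer, chosen only for illustration). It includes a row that matches no category at all:

```r
# Hypothetical input: one string per row, comma-separated words.
words <- c("Paris,Asia,cat", "Panama,dog", "bird")

# One label per category, "N/A" where the category is absent.
labels <- paste(
  ifelse(grepl("Paris|Tokyo", words), "City", "N/A"),
  ifelse(grepl("Panama|USA", words), "Country", "N/A"),
  ifelse(grepl("Asia|Africa", words), "Continent", "N/A"),
  sep = ","
)

# Drop each "N/A" together with one adjacent comma.
cleaned <- gsub("N/A,|,N/A", "", labels)

print(labels)
print(cleaned)
```

A row with no match in any category ends up as a single "N/A", because the last placeholder has no neighbouring comma left to pair with.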
R: pass a vector of strings to replace all instances within a string
We can use gsubfn if we need to replace with numbers.
library(gsubfn)
gsubfn("\\w+", as.list(setNames(1:3, numlist)), mystring)
#[1] "I have 1 cat, 2 dogs and 3 rabbits"
EDIT: I thought that we needed to replace the words with the numbers that correspond to them in 'numlist'. But if we need to replace them with the ##NUMBER## flag, one option is mgsub
library(qdap)
mgsub(numlist, "##NUMBER##", mystring)
#[1] "I have ##NUMBER## cat, ##NUMBER## dogs and ##NUMBER## rabbits"
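If installing a package is not an option, a base R sketch along the same lines is to collapse the vector into one alternation pattern. The values of numlist and mystring below are assumptions inferred from the printed output, since the question's definitions are not shown here:

```r
# Assumed inputs, reconstructed from the output above.
numlist  <- c("one", "two", "three")
mystring <- "I have one cat, two dogs and three rabbits"

# Build "\\b(one|two|three)\\b": word boundaries avoid matching inside
# longer words. This assumes the entries contain no regex metacharacters.
pattern <- paste0("\\b(", paste(numlist, collapse = "|"), ")\\b")

flagged <- gsub(pattern, "##NUMBER##", mystring, perl = TRUE)
print(flagged)
```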
Remove part of string after .
You just need to escape the period:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
gsub("\\..*","",a)
[1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
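A short sketch of why the escape matters, using one of the IDs above. An unescaped "." matches any character, so the pattern swallows the whole string:

```r
a <- "NM_020506.1"

# Unescaped: "." is any character, so "..*" matches the entire string.
unescaped <- gsub("..*", "", a)

# Escaped: "\\." is a literal period, so only ".1" is removed.
escaped <- gsub("\\..*", "", a)

# A character class is an equivalent way to write a literal period.
bracketed <- sub("[.].*", "", a)

print(unescaped)
print(escaped)
print(bracketed)
```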
r gsub extract n words before and after a term
Instead of using space, try \\W{1,}:
gsub(".*(((\\W{1,})\\w{1,}){3} game((\\W{1,})\\w{1,}){3}).*", "\\1", a, perl = TRUE)
[1] " came for our game we were ready"
" came for our game, but we were"
" not here. Our game was not completed"
Remove all text before colon
Here are two ways of doing it in R:
foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
# Remove all before and up to ":":
gsub(".*:","",foo)
# Extract everything behind ":":
regmatches(foo,gregexpr("(?<=:).*",foo,perl=TRUE))
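The two approaches return different shapes, and they differ when the string contains more than one colon. A small sketch (the second input, multi, is made up for illustration):

```r
foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"

# gsub() returns a character vector.
removed <- gsub(".*:", "", foo)

# regmatches() with gregexpr() returns a list, one element per input string.
extracted_list <- regmatches(foo, gregexpr("(?<=:).*", foo, perl = TRUE))

# With regexpr() (first match only) the result is a plain character vector.
extracted <- regmatches(foo, regexpr("(?<=:).*", foo, perl = TRUE))

# Hypothetical input with two colons: the greedy ".*:" removes up to the
# LAST colon, while the lookbehind starts right after the FIRST colon.
multi <- "a:b:c"
after_last  <- gsub(".*:", "", multi)
after_first <- regmatches(multi, regexpr("(?<=:).*", multi, perl = TRUE))

print(removed)
print(extracted)
print(after_last)
print(after_first)
```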
Use gsub remove all string before first white space in R
Try this:
sub(".*? ", "", D$name)
Edit:
The pattern matches any character zero or more times (.*) up to and including the first space. The ? after .* makes it "lazy" rather than "greedy", and that is what makes it stop at the first space found. So .*? matches everything before the first space and the space matches the first space itself; sub then replaces that whole prefix with the empty string, leaving only the text after the first space. (In a variant that keeps the remainder with a capture group, (.+) would capture the one or more characters after that first space.)
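A small sketch contrasting the lazy and greedy forms on an invented name (the data frame D from the question is not reproduced here):

```r
# Hypothetical value standing in for an element of D$name.
x <- "John Smith Jr"

# Lazy: stops at the first space.
lazy <- sub(".*? ", "", x)

# Greedy: runs to the last space.
greedy <- sub(".* ", "", x)

# An alternative without a lazy quantifier: anchor at the start and
# consume only non-space characters before the first space.
negated <- sub("^[^ ]* ", "", x)

print(lazy)
print(greedy)
print(negated)
```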