Applying a function to a backreference within gsub in R
R does not let you apply a function directly to a match via gsub. You have to extract the match, transform the value, then put the transformed value back. This is relatively easy with the regmatches function. For example:
x <- "(990283)M (31)O (29)M (6360)M"
f <- function(x) {
  # strip the surrounding parentheses, then add 5 and append ".1"
  v <- as.numeric(substr(x, 2, nchar(x) - 1))
  paste0(v + 5, ".1")
}
m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
# [1] "990288.1M 36.1O 34.1M 6365.1M"
Of course, you can make f do whatever you like; just make sure it's vector-friendly, since it receives all the matches from one string as a character vector. You could also wrap this in your own function:
gsubf <- function(pattern, x, f) {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), f)
  x
}
gsubf("\\(\\d+\\)", x, f)
Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.
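For the capture-group case, one possible sketch (not necessarily what the answer had in mind) relies on the capture.start and capture.length attributes that gregexpr attaches when perl = TRUE. The pattern with a group around the digits and the variable names are assumptions made for illustration.

```r
x <- "(990283)M (31)O (29)M (6360)M"

# perl = TRUE makes gregexpr record where each capture group matched
m <- gregexpr("\\((\\d+)\\)", x, perl = TRUE)

# one row per match, one column per capture group (only one group here)
starts <- attr(m[[1]], "capture.start")[, 1]
lens   <- attr(m[[1]], "capture.length")[, 1]
digits <- substring(x, starts, starts + lens - 1)

# transform the captured digits, then replace the whole matches,
# parentheses included
regmatches(x, m) <- list(paste0(as.numeric(digits) + 5, ".1"))
x
```

Because only the digits are captured, the transforming step no longer needs to strip the parentheses itself.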
Backreferences evaluation time in gsub
1) gsub replaces a pattern with a constant, but what you are looking to do is replace it with the result of applying a function to the matched string. gsubfn in the gsubfn package does that. Below, the formula in the second argument is just gsubfn's short form for a function whose argument is the left-hand side and whose body is the right-hand side. Alternatively, the second argument could be expressed in the usual function notation (function(x) nls[x, ]), at the expense of a bit of verbosity:
> library(gsubfn)
> gsubfn("a|b|c", x ~ nls[x, ], "a + b*x + c*x^2")
[1] "1 + 2*x + 3*x^2"
Note that "a|b|c" could be derived from nls using paste(rownames(nls), collapse = "|") in order to avoid redundant specification.
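The question's nls object is not shown here, so the sketch below rebuilds it as an assumption: a one-column data frame of integers with row names a, b, c (the column name value is made up). It derives the pattern from the row names and then does the lookup with base R's regmatches instead of gsubfn, so it runs without extra packages.

```r
# Assumed stand-in for the question's nls: one integer column, row names a/b/c
nls <- data.frame(value = 1:3, row.names = c("a", "b", "c"))

# Build the alternation from the row names instead of typing it by hand
pattern <- paste(rownames(nls), collapse = "|")   # "a|b|c"

s <- "a + b*x + c*x^2"
m <- gregexpr(pattern, s)

# Look each matched name up in nls and substitute its value
regmatches(s, m) <- lapply(regmatches(s, m),
                           function(k) as.character(nls[k, 1]))
s
```

This only behaves well when the names cannot occur inside other tokens of the string; with longer or overlapping names the pattern would need word boundaries.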
2) Although gsubfn simplifies this significantly, to do it without gsubfn use substitute:
> L <- as.list(setNames(nls[[1]], rownames(nls))) # L <- list(a = 1L, b = 2L, c = 3L)
> e <- parse(text = "a + b * x + c * x ^ 2")[[1]] # e is the text as a "call" object
> s <- do.call(substitute, list(e, L)) # perform the substitution
> format(s) # convert to character
[1] "1L + 2L * x + 3L * x^2"
The L suffixes are due to the fact that nls, as defined in the question, contains integers. Convert them to numeric before running the above if you don't like that:
nls[[1]] <- as.numeric(nls[[1]])
3) Another possibility is to loop over the strings to be substituted.
> s <- "a + b*x + c*x^2"
> for(nm in rownames(nls)) s <- gsub(nm, nls[nm, ], s)
> s
[1] "1 + 2*x + 3*x^2"
If we knew there was no more than one occurrence of each name to be replaced, we could use sub in place of gsub here.
UPDATE: Corrected second solution.
UPDATE 2: Added third solution.
More than 9 backreferences in gsub()
See Regular Expressions with The R Language:
You can use the backreferences \1 through \9 in the replacement text to reinsert text matched by a capturing group. There is no replacement text token for the overall match. Place the entire regex in a capturing group and then use \1.
But with PCRE (perl = TRUE) you should be able to use named groups. So try (?P<name>regex) for group naming and (?P=name) as a backreference within the pattern.
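A small sketch of both ideas follows; the input strings are made up for illustration. The first part uses a named group and a named backreference inside the pattern. The second part is a swapped-in workaround rather than anything the answer proposes: when more than nine groups are needed, regexec plus regmatches returns every group as a vector element, so the replacement can be assembled by hand.

```r
# Named group plus a named backreference inside the pattern (needs perl = TRUE).
# The named group is still group 1, so "\\1" works in the replacement.
x <- c("hello hello world", "no repeat here")
dedup <- gsub("(?P<word>\\b\\w+) (?P=word)\\b", "\\1", x, perl = TRUE)
dedup

# More than nine groups: extract all of them with regexec + regmatches
y <- "abcdefghijkl"
groups <- regmatches(y, regexec("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)", y))[[1]]

# groups[1] is the whole match, so group n is groups[n + 1]
picked <- paste0(groups[11], "-", groups[13])   # groups 10 and 12
picked
```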
How to do a replace with backreferences, when the number of occurrences is unknown?
Cool problem - I got to learn a new trick with str_replace_all. You can pass a function as the replacement, and it is applied to each string the pattern picks out.
library(stringr)  # also provides the %>% pipe used below
# s is the input string from the question
replace_brakets <- function(str) {
  str_replace_all(str, "\\}\\{", ",")
}
s %>% str_replace_all("(?<=\\\\autocites\\{)([:alnum:]+\\}\\{)+", replace_brakets)
# [1] "Text.\\autocites{REF1,REF2,REF3}. More text \\autocites{REF4,REF5} and \\begin{tabular}{ll}"
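For comparison, here is a rough base R sketch of the same idea without stringr. The input s is an assumption reconstructed from the output shown above, and the character class is written as [[:alnum:]] because the pattern is handed to PCRE rather than to stringr's ICU engine.

```r
# Assumed input, reconstructed from the expected output
s <- "Text.\\autocites{REF1}{REF2}{REF3}. More text \\autocites{REF4}{REF5} and \\begin{tabular}{ll}"

# Match the run of "KEY}{" pieces that directly follows \autocites{
m <- gregexpr("(?<=\\\\autocites\\{)([[:alnum:]]+\\}\\{)+", s, perl = TRUE)

# Inside each matched run, turn every "}{" into a comma
regmatches(s, m) <- lapply(regmatches(s, m),
                           function(hit) gsub("}{", ",", hit, fixed = TRUE))
s
```

The lookbehind keeps \begin{tabular}{ll} untouched, since its braces do not follow \autocites{.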
gsub return an empty string when no match is found
I'd probably go a different route, since the sapply doesn't seem necessary to me; these functions are already vectorized:
fun <- function(x) {
  pattern <- ".*(Ref. (\\d+)).*"
  # logical index: unlike x[-ind], this still works when nothing matches
  hit <- grepl(pattern, x)
  x <- gsub(pattern, "\\1", x)
  x[!hit] <- ""
  x
}
fun(data)
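A quick check of this approach on made-up input (the strings and the name extract_ref are hypothetical). It uses a logical index from grepl, which also covers the case where no element matches at all; a negative index built from an empty grep result would select nothing and leave the vector unchanged.

```r
extract_ref <- function(x) {
  pattern <- ".*(Ref. (\\d+)).*"
  hit <- grepl(pattern, x)        # TRUE where the pattern is found
  x <- gsub(pattern, "\\1", x)    # keep only the "Ref. N" part
  x[!hit] <- ""                   # blank out everything that did not match
  x
}

some_match <- extract_ref(c("see Ref. 12 for details", "nothing here"))
no_match   <- extract_ref(c("nothing here", "nor here"))
some_match
no_match
```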
R: Gsub replacing pattern with skipping a character in replacement
In a regex you can group with parentheses and back-reference the group with \\1:
data <- gsub('Huiswaard\\s(\\d)\\s>*', "Huiswaard-\\1-", df)
data
[1] "Huiswaard-2-Oost" "Huiswaard-1-Zuid" "Huiswaard-2-West"
If you want to change the suffix, you could also match the second word with \\w+, which matches one or more word characters after the space:
data <- gsub('Huiswaard\\s(\\d)\\s\\w+', "Huiswaard-\\1-Oost", df)
data
[1] "Huiswaard-2-Oost" "Huiswaard-1-Oost" "Huiswaard-2-Oost"
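If the second word should be reused rather than overwritten, it can be put in its own group and referenced as \\2. The input vector below is an assumption inferred from the outputs above, and the reordered output format is only for illustration.

```r
# Assumed input, inferred from the outputs shown above
df <- c("Huiswaard 2 Oost", "Huiswaard 1 Zuid", "Huiswaard 2 West")

# Group 1 is the digit, group 2 is the trailing word; swap them in the result
swapped <- gsub("Huiswaard\\s(\\d)\\s(\\w+)", "\\2-\\1", df)
swapped
```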
I use this cheat sheet to help me understand regular expressions: https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf