Applying a Function to a Backreference Within Gsub in R

R does not have the option of applying a function directly to a match via gsub. You have to extract the match, transform the value, then put the transformed value back. This is relatively easy with the regmatches function. For example:

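x <- "(990283)M (31)O (29)M (6360)M"

# strip the surrounding parentheses, add 5, then append ".1"
f <- function(x) {
  v <- as.numeric(substr(x, 2, nchar(x) - 1))
  paste0(v + 5, ".1")
}

# locate every "(digits)" match, then assign the transformed values back
m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
# [1] "990288.1M 36.1O 34.1M 6365.1M"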

You can make f do whatever you like; just make sure it is vectorized, because it receives all the matches from one string at once. You could also wrap this in your own function:

gsubf <- function(pattern, x, f) {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), f)
  x
}
gsubf("\\(\\d+\\)", x, f)

Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.
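For what it's worth, here is a minimal sketch of one of those messier routes, reusing the same example string. It assumes perl = TRUE, which makes gregexpr attach "capture.start" and "capture.length" attributes to each match; the variable names (starts, lens, digits) are just illustrative.

```r
x <- "(990283)M (31)O (29)M (6360)M"

# perl = TRUE makes gregexpr record where each capture group matched
m <- gregexpr("\\((\\d+)\\)", x, perl = TRUE)

# capture.start / capture.length are matrices:
# one row per match, one column per capture group
starts <- attr(m[[1]], "capture.start")[, 1]
lens   <- attr(m[[1]], "capture.length")[, 1]

# cut the first capture group (the digits, without parentheses) out of x
digits <- substring(x, starts, starts + lens - 1)
# digits is c("990283", "31", "29", "6360")

# replace each whole match with a value computed from its capture group
regmatches(x, m) <- list(paste0(as.numeric(digits) + 5, ".1"))
x
# "990288.1M 36.1O 34.1M 6365.1M"
```

With the group extracted this way, the transforming function no longer needs substr to peel off the parentheses.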

Backreferences evaluation time in gsub

1) gsub replaces a pattern with a constant, but what you want is to replace it with the result of applying a function to the matched string. gsubfn in the gsubfn package does that. Below, the formula in the second argument is gsubfn's short form for a function: the left-hand side names the argument and the right-hand side is the body. Alternatively, the second argument could be written in the usual function notation ( function(x) nls[x,] ), at the expense of a bit of verbosity:

> library(gsubfn)
> gsubfn("a|b|c", x ~ nls[x, ], "a + b*x + c*x^2")
[1] "1 + 2*x + 3*x^2"

Note that "a|b|c" could be derived from nls using paste(rownames(nls), collapse = "|") in order to avoid redundant specification.
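As a rough base R counterpart to the gsubfn call, the derived pattern can be combined with gregexpr and regmatches. This is only a sketch: the definition of nls below is an assumption (a one-column data frame with row names a, b, c, consistent with how it is used in this answer), since the original definition lives in the question.

```r
# Assumed stand-in for the question's nls: one integer column, row names a, b, c
nls <- data.frame(value = 1:3, row.names = c("a", "b", "c"))

s <- "a + b*x + c*x^2"

# build "a|b|c" from the row names instead of typing it out
pattern <- paste(rownames(nls), collapse = "|")

# look up each matched name in nls and write the value back into the string
m <- gregexpr(pattern, s)
regmatches(s, m) <- lapply(regmatches(s, m),
                           function(key) as.character(nls[key, 1]))
s
# "1 + 2*x + 3*x^2"
```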

2) gsubfn simplifies this significantly, but you can do it without gsubfn by using substitute:

> L <- as.list(setNames(nls[[1]], rownames(nls)))  # L <- list(a = 1L, b = 2L, c = 3L)
> e <- parse(text = "a + b * x + c * x ^ 2")[[1]] # e is the text as a "call" object
> s <- do.call(substitute, list(e, L)) # perform the substitution
> format(s) # convert to character
[1] "1L + 2L * x + 3L * x^2"

The Ls are due to the fact that nls as defined in the question contains integers. Convert them to numeric before running the above if you don't like that:

nls[[1]] <- as.numeric(nls[[1]])
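To see the substitute route end to end without the console prompts, here is a self-contained sketch. The nls data frame is again an assumed stand-in for the one in the question, defined with a numeric column so that no L suffixes appear.

```r
# Assumed stand-in for the question's nls, with a numeric (not integer) column
nls <- data.frame(value = c(1, 2, 3), row.names = c("a", "b", "c"))

# named list of replacement values: list(a = 1, b = 2, c = 3)
L <- as.list(setNames(nls[[1]], rownames(nls)))

# parse the text into a call, then substitute the names a, b, c
e <- parse(text = "a + b * x + c * x^2")[[1]]
s <- do.call(substitute, list(e, L))

format(s)
# "1 + 2 * x + 3 * x^2"

# the result is still a call, so it can be evaluated for a given x
eval(s, list(x = 2))
# 17
```

One side effect of working on the parsed call rather than the string is that only whole symbols are replaced, so a coefficient name can never be confused with part of a longer name.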

3) Another possibility is to loop over the strings to be substituted.

> s <- "a + b*x + c*x^2"
> for(nm in rownames(nls)) s <- gsub(nm, nls[nm, ], s)
> s
[1] "1 + 2*x + 3*x^2"

If we knew there was no more than one occurrence of each to be replaced we could use sub in place of gsub here.

More than 9 backreferences in gsub()

See Regular Expressions with The R Language:

You can use the backreferences \1 through \9 in the replacement text to reinsert text matched by a capturing group. There is no replacement text token for the overall match. Place the entire regex in a capturing group and then use \1.

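But with PCRE (perl = TRUE) you should be able to use named groups. So try (?P<name>regex) for group naming and (?P=name) as the backreference. Note that (?P=name) refers back to the group inside the pattern; the replacement string still only understands \1 through \9.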
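A small sketch of both ideas follows. The first part uses a named group with a named backreference inside the pattern. The second part is a workaround that is not from the answer above: it skips the replacement string entirely, pulls all groups out with regexec and regmatches, and assembles the result by hand, which is one way to reach a group beyond the ninth. The example strings and variable names are made up for illustration.

```r
# Named group and named backreference inside the pattern (needs perl = TRUE):
# collapse a word that is immediately repeated
x <- "the the cat sat sat down"
deduped <- gsub("\\b(?P<word>\\w+) (?P=word)\\b", "\\1", x, perl = TRUE)
# deduped is "the cat sat down"

# Reaching groups beyond the ninth: extract every group with regexec,
# then build the replacement manually instead of using \\1 ... \\9
y <- "a1b2c3d4e5f6"
pattern <- paste(rep("(.)", 12), collapse = "")   # twelve capture groups
groups <- regmatches(y, regexec(pattern, y))[[1]][-1]  # drop the full match
rebuilt <- paste0(groups[12], groups[11], "-", groups[1])
# rebuilt is "6f-a"
```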

How to do a replace with backreferences, when the number of occurrences is unknown?

Cool problem - I got to learn a new trick with str_replace. You can pass a function as the replacement argument, and it is applied to each piece of the string that the pattern matched.

library(stringr)

# turn every "}{" inside a match into ","
replace_brakets <- function(str) {
  str_replace_all(str, "\\}\\{", ",")
}

# s is the input string from the question
s %>% str_replace_all("(?<=\\\\autocites\\{)([:alnum:]+\\}\\{)+", replace_brakets)
# [1] "Text.\\autocites{REF1,REF2,REF3}. More text \\autocites{REF4,REF5} and \\begin{tabular}{ll}"
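If stringr is not available, the same idea can be sketched in base R with gregexpr and regmatches. The input string s below is an assumption reconstructed from the output shown above (the original is in the question), and perl = TRUE is needed for the lookbehind.

```r
# Assumed input, reconstructed from the expected output
s <- "Text.\\autocites{REF1}{REF2}{REF3}. More text \\autocites{REF4}{REF5} and \\begin{tabular}{ll}"

# match every run of "KEY}{" that directly follows "\autocites{"
m <- gregexpr("(?<=\\\\autocites\\{)([[:alnum:]]+\\}\\{)+", s, perl = TRUE)

# inside each match, turn "}{" into "," and write the result back
regmatches(s, m) <- lapply(regmatches(s, m),
                           function(hit) gsub("}{", ",", hit, fixed = TRUE))
s
```

The \begin{tabular}{ll} part is left alone because the lookbehind only accepts matches that start right after \autocites{.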

gsub return an empty string when no match is found

I'd probably go a different route, since the sapply doesn't seem necessary to me as these functions are vectorized already:

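fun <- function(x){
  pattern <- ".*(Ref. (\\d+)).*"
  hit <- grepl(pattern, x)
  x <- gsub(pattern, "\\1", x)
  # logical indexing stays correct when nothing matches;
  # x[-ind] with an empty ind would not blank anything
  x[!hit] <- ""
  x
}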

fun(data)
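Here is a self-contained sketch of the same approach that can be run as is. The function name extract_ref and the sample vectors are made up for illustration; the real data is in the question. It also escapes the dot after "Ref" so that it matches a literal period.

```r
extract_ref <- function(x) {
  pattern <- ".*(Ref\\. \\d+).*"
  hit <- grepl(pattern, x)

  # start from all-empty strings and fill in only the elements that matched
  out <- character(length(x))
  out[hit] <- sub(pattern, "\\1", x[hit])
  out
}

refs <- extract_ref(c("see Ref. 12 for details", "nothing here", "Ref. 7"))
# refs is c("Ref. 12", "", "Ref. 7")

none <- extract_ref(c("no reference", "still none"))
# none is c("", "")
```

Building the result from an empty vector avoids negative indexing altogether, so the case where no element matches needs no special handling.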

R: Gsub replacing pattern with skipping a character in replacement

In regex you can group with parentheses and refer back to the group in the replacement with \\1:

data <- gsub('Huiswaard\\s(\\d)\\s>*', "Huiswaard-\\1-", df)
data
[1] "Huiswaard-2-Oost" "Huiswaard-1-Zuid" "Huiswaard-2-West"

If you want to change the suffix, you could also match the second word with \\w+, which matches one or more word characters after the space:

data <- gsub('Huiswaard\\s(\\d)\\s\\w+', "Huiswaard-\\1-Oost", df)
data
[1] "Huiswaard-2-Oost" "Huiswaard-1-Oost" "Huiswaard-2-Oost"
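To try this without the original data, here is a minimal sketch with an assumed input vector (called places here; the question's df is not shown). It also wraps the second word in its own group so that both pieces can be reused, or reordered, in the replacement.

```r
# Assumed input: the question's data is not shown here
places <- c("Huiswaard 2 Oost", "Huiswaard 1 Zuid", "Huiswaard 2 West")

# one group: keep the digit, replace the surrounding spaces with hyphens
hyphenated <- gsub("Huiswaard\\s(\\d)\\s", "Huiswaard-\\1-", places)
# hyphenated is c("Huiswaard-2-Oost", "Huiswaard-1-Zuid", "Huiswaard-2-West")

# two groups: \\1 is the digit, \\2 is the word, and they can be swapped
swapped <- gsub("Huiswaard\\s(\\d)\\s(\\w+)", "\\2-\\1", places)
# swapped is c("Oost-2", "Zuid-1", "West-2")
```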

I use this cheat sheet to help me understand regular expressions: https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
