Conditional Gsub Replacement

Maybe this, inspired by Josh O'Brien's answer, does it:

x <- "I like 346 ice cream cones.  They're 99 percent good!  I ate 46."
numDF <- structure(c("346", "99", "46", "three hundred forty six", "ninety nine",
    "forty six"), .Dim = c(3L, 2L), .Dimnames = list(c("1", "2",
    "3"), c("symbol", "text")))

# Build one alternation pattern from all symbols: "346|99|46"
pat <- paste(numDF[, "symbol"], collapse = "|")
repeat {
    m <- regexpr(pat, x)      # first remaining match
    if (m == -1) break        # nothing left to replace
    sym <- regmatches(x, m)
    # Look up the matched symbol and substitute its text
    regmatches(x, m) <- numDF[match(sym, numDF[, "symbol"]), "text"]
}
x

Ruby: Conditional replace in String using gsub

String#gsub can optionally take a block, in which case each match is replaced with the result of that block:

subject.gsub(/\d+/) { |m| m == '1' ? m : '5' }
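As a minimal sketch of the block form (the sample string and the `words` lookup table are made up for illustration), the block receives each match and its return value becomes the replacement. `String#gsub` also accepts a Hash as its second argument, which may be enough when the condition is a plain lookup:

```ruby
# Hypothetical input, for illustration only.
subject = "1 apple, 22 pears, 1 plum, 7 figs"

# Block form: keep "1" as it is, replace any other number with "5".
block_result = subject.gsub(/\d+/) { |m| m == '1' ? m : '5' }

# Hash form: each match is looked up in the hash. A match with no
# key is replaced by an empty string, so list every expected match.
words = { "1" => "one", "22" => "twenty-two", "7" => "seven" }
hash_result = subject.gsub(/\d+/, words)

puts block_result
puts hash_result
```

The block form is the more general of the two, since the block can contain any condition; the hash form only covers exact-match lookups.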

Conditional multiple pattern replacement with gsub in R

Why not just use paste here:

tbl$col1 <- paste0(tbl$col1, "Y")

Note that the above would convert col1 to character, which may or may not be acceptable to you. Also, I might even recommend not doing this transformation of col1. Rather, consider just keeping your original numeric data as is, and then use paste if you want to view that data a certain way.

We could also use sub here:

tbl$col1 <- sub("$", "Y", tbl$col1)

Conditional gsub action

That is not quite correct: gsub() does not return the matched text on its own. It only returns the number of substitutions made. Your problem is how to store the matched text for the subsequent string replacement.

The problem with your attempt is that the text matched by the regexp within /../ is not stored anywhere. You need to capture it yourself, for example with match(), which sets RSTART and RLENGTH, and then use that in the replacement part:

awk '
match($0, /(^|[^[:alpha:]])[[:digit:]]{2}[[:space:]]{1,}[[:alpha:]]{3,8}[[:space:]]{1,}[[:digit:]]{4}([^[:alpha:]]|$)/) {
    str = substr($0, RSTART, RLENGTH)
    sub(str, " ", $0)
}1' file

The example above replaces the matched substring, i.e. each of your date strings below, with a single space.

 16 Ottobre 2018
17 ottabre 2017
18 ott 2020

One could use sub() or gsub() depending on the number of occurrences of the regex in the line. Applying the command above removes those date strings from the file and produces the result below.

ci sono 4444444444444Quattro mele
sentiamoci il
deciIIIIIIdiamo il
Manipolo di eroi 55555555555
17 mele
llllllLLLLLLLLLLLL
una mela e mezza
2 mAAAeleA
0000 asd a0 0 ad000

Notice the 1 after the {..} block that does the string replacement. It is an always-true pattern whose default action is to print the current line, so every line is printed, whether or not a replacement was made on it.

Put into an awk script, it would look like this:

#!/usr/bin/awk -f

match($0, /(^|[^[:alpha:]])[[:digit:]]{2}[[:space:]]{1,}[[:alpha:]]{3,8}[[:space:]]{1,}[[:digit:]]{4}([^[:alpha:]]|$)/) {
    str = substr($0, RSTART, RLENGTH)
    sub(str, " ", $0)
}1

Conditional replacement of characters in a string pursuant to the use of certain tags

Using lookarounds works too:

sub("(?<!<)Brasil(?!>)", "Brazil", text, perl = TRUE)

How this works:

  • (?<!<) - negative lookbehind asserting that the character immediately to the left is not a literal <
  • Brasil - the literal string Brasil
  • (?!>) - negative lookahead asserting that the character immediately to the right is not a literal >

Note that if you have a single replacement per string, then sub suffices. If more than one replacement has to be made per string, use gsub.
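The same lookaround pattern also works with Ruby's sub and gsub, the other language covered on this page. A minimal sketch, assuming a made-up sample string in which a tagged occurrence looks like <Brasil>:

```ruby
# Hypothetical sample text: one tagged and two untagged occurrences.
text = "<Brasil> is tagged, Brasil is not, and neither is Brasil."

# sub replaces only the first untagged occurrence ...
first_only = text.sub(/(?<!<)Brasil(?!>)/, "Brazil")

# ... while gsub replaces every untagged occurrence.
all_untagged = text.gsub(/(?<!<)Brasil(?!>)/, "Brazil")

puts first_only
puts all_untagged
```

In both calls the occurrence inside the angle brackets is left alone, because the lookbehind rejects a preceding < and the lookahead rejects a following >.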

Conditional replacement of a string in data frame

You need to use the ifelse() function.

DF$ID <- ifelse(DF$INT == 1,  gsub("^9", "8", DF$ID), DF$ID)

Using dplyr:

library(dplyr)

DF %>%
    mutate(ID = ifelse(INT == 1, gsub("^9", "8", ID), ID))

This uses the gsub() result for the rows where DF$INT == 1. Where it is not 1, ID remains the same.

The if() function that you used:

if(DF$INT == "1") { }

is not vectorized, so it does not work row by row on a data.frame column. if() expects a single condition that is either TRUE or FALSE. For example:

if (use_new_function == "on") {
    run_new_function()
}

R conditional logic to replace a character in a string, based on the preceding and following characters in the string

I worked out the conditional logic needed to address the substitution of underscores within the loops that I proposed above:

convert_compound_names <- function(x) {
    for (i in seq_along(x)) {
        split_name <- unlist(strsplit(x[i], ""))
        for (j in seq_along(split_name)) {
            # conditional logic to replace underscores
            if (split_name[j] == "_") {
                # neighbouring characters; "" at either end of the string,
                # so a leading or trailing underscore does not raise an error
                prev_chr <- if (j > 1) split_name[j - 1] else ""
                next_chr <- if (j < length(split_name)) split_name[j + 1] else ""
                if (grepl("\\d", prev_chr) || grepl("\\d", next_chr)) {
                    split_name[j] <- "-"    # next to a digit: hyphen
                } else if (prev_chr == "-" || next_chr == "-") {
                    split_name[j] <- ""     # next to a hyphen: drop it
                } else if (grepl("[a-zA-Z]", prev_chr) && grepl("[a-zA-Z]", next_chr)) {
                    split_name[j] <- " "    # between two letters: space
                }
            }
        }
        x[i] <- paste0(split_name, collapse = "")
    }
    return(x)
}

However, I'm sure there is a more straightforward way of doing this.


