Conditional gsub replacement
Maybe this, inspired by Josh O'Brien's answer, does it:
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
numDF <- structure(c("346", "99", "46", "three hundred forty six", "ninety nine",
"forty six"), .Dim = c(3L, 2L), .Dimnames = list(c("1", "2",
"3"), c("symbol", "text")))
pat <- paste(numDF[,"symbol"], collapse="|")
repeat {
  m <- regexpr(pat, x)
  if (m == -1) break
  sym <- regmatches(x, m)
  regmatches(x, m) <- numDF[match(sym, numDF[, "symbol"]), "text"]
}
x
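As a rough sketch of the same idea without the loop, gregexpr() can locate every match at once and the replacement form of regmatches() can fill them in together. The named vector lookup is an assumption made for this illustration; it holds the same symbol/text pairs as numDF.

```r
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."

# Hypothetical lookup vector: names are the symbols, values are the replacement text
lookup <- c("346" = "three hundred forty six",
            "99"  = "ninety nine",
            "46"  = "forty six")

pat <- paste(names(lookup), collapse = "|")

# Find all matches in one pass, then replace each with its looked-up text
m <- gregexpr(pat, x)
regmatches(x, m) <- lapply(regmatches(x, m), function(sym) unname(lookup[sym]))
x
```

Because every match is replaced in a single pass, text that has already been substituted is never scanned again.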
Ruby: Conditional replace in String using gsub
#gsub can optionally take a block and will replace each match with the result of that block:
subject.gsub(/\d+/) { |m| m == '1' ? m : '5' }
Conditional multiple pattern replacement with gsub in R
Why not just use paste0 here:
tbl$col1 <- paste0(tbl$col1, "Y")
Note that the above would convert col1 to character, which may or may not be acceptable to you. I might even recommend not transforming col1 at all. Rather, consider keeping your original numeric data as is, and use paste only when you want to view that data a certain way.
We could also use sub here:
tbl$col1 <- sub("$", "Y", tbl$col1)
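A small sketch of the "keep the numeric data" suggestion, assuming a hypothetical three-row tbl and a separate display column named col1_label (both invented for this example):

```r
# Hypothetical data; the real tbl is not shown in the question
tbl <- data.frame(col1 = c(1, 2, 3))

# Leave col1 numeric and build a separate character column for display
tbl$col1_label <- paste0(tbl$col1, "Y")

is.numeric(tbl$col1)   # col1 is still usable in arithmetic
tbl$col1_label
```

This way the suffix only exists in the display column, and col1 can still be summed or averaged.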
Conditional gsub action
That is not quite correct: gsub() does not return the matched text. It returns only the number of substitutions made. Your problem is how to store the matched text for the subsequent string replacement.
The problem with your attempt is that the text matched by the regexp within /../ is not stored anywhere. You need to store it yourself by using match() or index() and then use that in the replacement part:
awk '
match($0, /(^|[^[:alpha:]])[[:digit:]]{2}[[:space:]]{1,}[[:alpha:]]{3,8}[[:space:]]{1,}[[:digit:]]{4}([^[:alpha:]]|$)/) {
  str = substr($0, RSTART, RLENGTH); sub(str, " ", $0)
}1' file
The example above takes the matched text, i.e. date strings like the ones below, and replaces it with a single space.
16 Ottobre 2018
17 ottabre 2017
18 ott 2020
One could use sub() or gsub() depending on the number of occurrences of the regex in the line. Applying the command above removes those date strings from the file and produces the result below.
ci sono 4444444444444Quattro mele
sentiamoci il
deciIIIIIIdiamo il
Manipolo di eroi 55555555555
17 mele
llllllLLLLLLLLLLLL
una mela e mezza
2 mAAAeleA
0000 asd a0 0 ad000
Notice the 1 after the {..} block that does the string replacement. It is an always-true pattern with no action, so awk applies its default action and prints every line, including the ones just modified.
Put into an awk script, it would look like this:
#!/usr/bin/awk -f
match($0, /(^|[^[:alpha:]])[[:digit:]]{2}[[:space:]]{1,}[[:alpha:]]{3,8}[[:space:]]{1,}[[:digit:]]{4}([^[:alpha:]]|$)/) {
  str = substr($0, RSTART, RLENGTH)
  sub(str, " ", $0)
}1
Conditional replacement of characters in a string pursuant to the use of certain tags
Using lookarounds works too:
sub("(?<!<)Brasil(?!>)", "Brazil", text, perl = TRUE)
How this works:
(?<!<) - negative lookbehind asserting that the character immediately to the left is not a literal <
Brasil - the literal string Brasil
(?!>) - negative lookahead asserting that the character immediately to the right is not a literal >
Note that if you have a single replacement per string, then sub suffices. If more than one replacement is needed per string, use gsub.
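To see the difference between the two, here is a short sketch on a made-up text vector (the sample strings are assumptions, not data from the question):

```r
# Hypothetical input: a plain mention, a tagged mention, and two plain mentions
text <- c("Brasil is big", "<Brasil> stays", "Brasil and Brasil")

pattern <- "(?<!<)Brasil(?!>)"

# sub() replaces only the first untagged occurrence in each string
first_only <- sub(pattern, "Brazil", text, perl = TRUE)

# gsub() replaces every untagged occurrence
all_matches <- gsub(pattern, "Brazil", text, perl = TRUE)

first_only
all_matches
```

The tagged <Brasil> is left alone in both cases, and only the third string differs between sub and gsub.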
Conditional replacement of a string in data frame
You need to use the ifelse() function.
DF$ID <- ifelse(DF$INT == 1, gsub("^9", "8", DF$ID), DF$ID)
Using dplyr:
DF %>%
  mutate(ID = ifelse(INT == 1, gsub("^9", "8", ID), ID))
This applies the gsub result to the rows where DF$INT == 1; rows where it is not 1 remain the same.
The if() statement that you used:
if(DF$INT == "1") { }
is not vectorised, so it is not intended to work row by row on a data.frame column. if() is used only to check whether a single condition (like a statement) is TRUE or FALSE. For example:
if(use_new_function == "on"){
  run_new_function()
}
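A minimal sketch of the vectorised version on made-up data (the three-row DF is an assumption for illustration), with logical indexing shown as one possible alternative to ifelse():

```r
# Hypothetical data frame; the original DF is not shown
DF <- data.frame(ID  = c("912", "934", "956"),
                 INT = c(1, 0, 1),
                 stringsAsFactors = FALSE)

# ifelse() picks, element by element, between the substituted and original ID
with_ifelse <- ifelse(DF$INT == 1, gsub("^9", "8", DF$ID), DF$ID)

# Alternative: substitute only in the rows selected by a logical index
with_index <- DF$ID
idx <- DF$INT == 1
with_index[idx] <- gsub("^9", "8", with_index[idx])

with_ifelse
with_index
```

Both approaches give the same result here; the indexing form only touches the selected rows, which some people find easier to read.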
R conditional logic to replace a character in a string, based on the preceding and following characters in the string
I worked out the conditional logic needed to address the substitution of underscores within the loops that I proposed above:
convert_compound_names <- function(x) {
  for (i in seq_along(x)) {
    split_name <- unlist(strsplit(x[i], ""))
    for (j in seq_along(split_name)) {
      # Conditional logic to replace underscores
      if (split_name[j] == "_") {
        # Neighbouring characters; "" when the underscore is at either end
        prev <- if (j > 1) split_name[j - 1] else ""
        nxt <- if (j < length(split_name)) split_name[j + 1] else ""
        if (grepl("\\d", prev) || grepl("\\d", nxt)) {
          split_name[j] <- "-"  # next to a digit: hyphen
        } else if (prev == "-" || nxt == "-") {
          split_name[j] <- ""   # next to a hyphen: drop the underscore
        } else if (grepl("[a-zA-Z]", prev) && grepl("[a-zA-Z]", nxt)) {
          split_name[j] <- " "  # between two letters: space
        }
      }
    }
    x[i] <- paste0(split_name, collapse = "")
  }
  return(x)
}
However, I'm sure there's a more straightforward way of doing this.
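One possibly more direct route, offered only as a sketch, is three gsub() passes with lookarounds, one per rule. The function name convert_compound_names_regex and the sample names are assumptions for this example, and the result may differ from the loop for inputs with consecutive underscores.

```r
# Hypothetical regex-based variant of the loop above
convert_compound_names_regex <- function(x) {
  # Rule 1: underscore next to a digit becomes a hyphen
  x <- gsub("(?<=\\d)_|_(?=\\d)", "-", x, perl = TRUE)
  # Rule 2: underscore next to a hyphen is dropped
  x <- gsub("(?<=-)_|_(?=-)", "", x, perl = TRUE)
  # Rule 3: underscore between two letters becomes a space
  gsub("(?<=[a-zA-Z])_(?=[a-zA-Z])", " ", x, perl = TRUE)
}

# Made-up compound names for illustration
result <- convert_compound_names_regex(
  c("2_methyl_butane", "alpha_-pinene", "acetic_acid")
)
result
```

Since gsub() is vectorised, this handles the whole character vector without an explicit loop over elements or characters.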