gsub() in R Is Not Replacing '.' (Dot)

gsub() in R is not replacing '.' (dot)

You need to escape the ., which is a regex special character that means "any character" (from @Mr Flick's comment).

x <- "2014.06.09"  # assumed input, inferred from the output below
gsub('\\.', '-', x)
#[1] "2014-06-09"

Or

gsub('[.]', '-', x)
#[1] "2014-06-09"

Or, as @Moix mentioned in the comments, we can use fixed = TRUE instead of escaping the character. The pattern is then treated as a literal string rather than a regex.

gsub(".", "-", x, fixed = TRUE)
#[1] "2014-06-09"
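
To see why the unescaped pattern looks like it is "not working", it can help to compare the variants side by side. This is a minimal sketch that assumes the input is the string "2014.06.09" (inferred from the outputs above, not stated in the original question).

```r
# Assumed input, inferred from the outputs shown in the answer
x <- "2014.06.09"

# Unescaped: "." is a regex wildcard, so every character is replaced
unescaped <- gsub(".", "-", x)

# Escaped, bracketed, and fixed patterns all match only literal dots
escaped   <- gsub("\\.", "-", x)
bracketed <- gsub("[.]", "-", x)
literal   <- gsub(".", "-", x, fixed = TRUE)

print(unescaped)  # "----------"
print(escaped)    # "2014-06-09"
```

The last three calls return the same string, so the choice between them is mostly a matter of readability.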

Replace dots using `gsub`

My recommendation would be to escape the "." character:

spy$Identifier <- gsub("\\.", "/", spy$Identifier)

In a regular expression, a period is a special character that matches any character. "Escaping" it tells the search to look for an actual period. In R's gsub this takes two backslashes ("\\."), because the R string literal consumes one backslash before the regex engine sees the pattern. In other languages, it's often just one backslash.
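
The two levels of escaping can be inspected directly. The sketch below is illustrative only: the identifier "BRK.B" is a hypothetical value, and the raw-string syntax assumes R 4.0.0 or later.

```r
pattern <- "\\."

# The R string holds two characters: a backslash and a period
n_chars <- nchar(pattern)
writeLines(pattern)  # prints \.

# Raw strings (assumes R >= 4.0.0) avoid the doubled backslash
raw_pattern <- r"(\.)"
same <- identical(pattern, raw_pattern)

# Hypothetical identifier, used only for illustration
id <- "BRK.B"
result <- gsub(raw_pattern, "/", id)
print(result)  # "BRK/B"
```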

Unexpected outcome, not replacing, in R out of a gsub function

The sub function doesn't work this way. One viable approach would be to capture the quantity you want, then use this capture group as the replacement:

x <- "r_con[C3-C3,Intercept]"
term <- sub("^r_con\\[([^,]+),Intercept\\]", "\\1", x)
term

[1] "C3-C3"

Replace the dot at the end of a string in R

Try:

x <- "DEL.Xp22.11..ZFX."
x <- gsub("..", ' (', x, fixed = TRUE)
x <- gsub("\\.$", ')', x)
x
#[1] "DEL.Xp22.11 (ZFX)"

Here the regex anchor '$' marks the end of the string, and '\\' escapes the '.', which is a regex special character. The first call uses fixed = TRUE, so ".." is matched literally and needs no escaping.
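
As a small illustration of how the anchors restrict where a literal dot may match, the following sketch uses hypothetical input strings (they are not from the original question).

```r
# Hypothetical inputs for illustration
x <- c("a.b.", "a.b", ".a")

# "\\.$" matches a literal dot only when it is the last character
trimmed <- sub("\\.$", "", x)
print(trimmed)  # "a.b" "a.b" ".a"

# "^\\." is the mirror image: a literal dot only at the start
leading <- sub("^\\.", "", x)
print(leading)  # "a.b." "a.b" "a"
```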

Why does gsub/sub not work to replace .. ?

We can use fixed = TRUE. In the default regex mode, . matches any character unless it is escaped (\\.) or placed inside square brackets ([.]). With fixed = TRUE the pattern is treated as a literal string, which is also the faster option.

gsub("..", " ", rownames(df), fixed = TRUE)
#[1] "Saint.Petersburg Russia" "Istanbul Turkey"
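
A short sketch of the difference, using a hypothetical row name modelled on the output above: in regex mode ".." consumes the string two characters at a time, whereas the fixed and escaped forms touch only the double dot.

```r
# Hypothetical row name, modelled on the output above
nm <- "Istanbul..Turkey"

# Regex mode: ".." matches any two characters, so all 8 pairs are replaced
regex_result <- gsub("..", " ", nm)

# Literal mode: only the actual double dot is replaced
fixed_result <- gsub("..", " ", nm, fixed = TRUE)

# Equivalent regex with both dots escaped
escaped_result <- gsub("\\.\\.", " ", nm)

print(nchar(regex_result))  # 8 (all spaces)
print(fixed_result)         # "Istanbul Turkey"
```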

Replacing a special character does not work with gsub

You have to escape the + symbol, because it is a regex metacharacter (the "one or more" quantifier).

> gsub("Ã<U\\+009F>", "REPLACED", "Testing string Ã<U+009F> ")
[1] "Testing string REPLACED "

> gsub("â<U\\+0080><U\\+0093>", "REPLACED", "Testing string â<U+0080><U+0093> ")
[1] "Testing string REPLACED "
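
For comparison, here is a sketch of what happens without the escape, and of fixed = TRUE as an alternative. The test string is a simplified, ASCII-only stand-in for the ones above, chosen so the example does not depend on file encoding.

```r
# Simplified ASCII-only stand-in for the strings above
s <- "Testing string <U+009F> end"

# Unescaped: "U+" means "one or more U", so the literal "+" never matches
unescaped <- gsub("<U+009F>", "REPLACED", s)

# Escaping the "+" or switching to a literal match both work
escaped <- gsub("<U\\+009F>", "REPLACED", s)
literal <- gsub("<U+009F>", "REPLACED", s, fixed = TRUE)

print(unescaped)  # unchanged input
print(escaped)    # "Testing string REPLACED end"
```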

gsub() not working if I reference a column using a character vector?

gsub is being given a character vector of column names, and it does what it knows: it works on those strings. It doesn't know that they are meant as an indirect reference to columns. (Nothing will know that it should be indirect.)

You have two options:

  1. The canonical way in data.table for this is likely to use .SDcols.

    preferences[, (cols) := lapply(.SD, gsub, pattern = "UN1", replacement = "A"), .SDcols = cols]
    preferences
    # Pref_1
    # <char>
    # 1: A
    # 2: Food and Agriculture Organization (F...
    # 3: United Nations Educational, Scientif...
    # 4: United Nations Development Programme...
    # 5: Commission on Narcotic Drugs (CND)
    # 6: Commission on Narcotic Drugs (CND)
    # 7: Human Rights Council (HRC)
    # 8: A
    # 9: Human Rights Council (HRC)
    # 10: A

    This does two things: (i) the use of .SDcols for iterating over a dynamic set of columns is preferred and faster, and allows programmatic determination of those columns (what you need); (ii) using lapply allows you to do this to one or more columns. If you know you'll always do just one column, this still works well with very little overhead.

  2. You can get/mget the data. This is the way to tell something to grab the contents of a variable identified in a string vector.

    If you know that you will always have exactly one column, then you can use get:

    preferences[, (cols) := gsub(get(cols), pattern = "UN1", replacement = "A")]

    If there is even a chance that you'll have more than one, I strongly recommend mget. (Even if you think you'll always have one, this is still safe.)

    preferences[, (cols) := lapply(mget(cols), gsub, pattern = "UN1", replacement = "A")]
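
The .SDcols approach from option 1 extends to several columns without changes. The following is a minimal sketch on a hypothetical two-column table (the Pref_2 column and its values are invented for illustration); it assumes the data.table package is installed, and adds fixed = TRUE since "UN1" is a literal string.

```r
library(data.table)

# Hypothetical two-column table for illustration
dt <- data.table(
  Pref_1 = c("UN1", "Human Rights Council (HRC)"),
  Pref_2 = c("Commission on Narcotic Drugs (CND)", "UN1")
)
cols <- c("Pref_1", "Pref_2")

# .SD holds only the columns named in .SDcols; lapply() applies gsub() to each
dt[, (cols) := lapply(.SD, gsub, pattern = "UN1", replacement = "A", fixed = TRUE),
   .SDcols = cols]

print(dt)
```

Because pattern and replacement are passed by name, each column supplied by lapply lands in gsub's x argument.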

Data

library(data.table)  # provides setDT() and the := syntax used above
preferences <- setDT(structure(list(Pref_1 = c("UN1", "Food and Agriculture Organization (FAO)", "United Nations Educational, Scientific and Cultural Organization (UNESCO)", "United Nations Development Programme (UNDP)", "Commission on Narcotic Drugs (CND)", "Commission on Narcotic Drugs (CND)", "Human Rights Council (HRC)", "UN1", "Human Rights Council (HRC)", "UN1")), class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
cols <- "Pref_1"

gsub() not recognizing and replacing certain accented characters

Use stringi::stri_trans_general:

library(stringi)
df<-data.frame(Name=c("Stipe Miočić","Duško Todorović","Michał Oleksiejczuk","Jiři Prochazka","Bartosz Fabiński","Damir Hadžović","Ľudovit Klein","Diana Belbiţă","Joanna Jędrzejczyk" ))
stri_trans_general(df$Name, "Latin-ASCII")

Results:

[1] "Stipe Miocic"        "Dusko Todorovic"     "Michal Oleksiejczuk"
[4] "Jiri Prochazka"      "Bartosz Fabinski"    "Damir Hadzovic"
[7] "Ludovit Klein"       "Diana Belbita"       "Joanna Jedrzejczyk"


Replacing dots with underscores, when using make.names or renaming objects in the working environment

The issue is probably that you use ".", which in a regex matches any character. To match a literal . in a string, you have to escape it as "\\.".

Personally, I don't like cramming all the code into one line when a simple function makes it cleaner and easier to understand.

# Example data
write.csv(mtcars, "mt cars.csv")
write.csv(mtcars, "mt car s.csv")

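# pattern is a regex, not a glob, so escape the dot and anchor the extension
temp <- list.files(pattern = "\\.csv$")

# Strip the extension, make syntactic names, then turn dots into underscores
make_names <- function(x) {
  gsub("\\.", "_", make.names(gsub("\\.csv$", "", x)))
}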
names(temp) <- make_names(temp)

list2env(lapply(temp, read.csv), envir = .GlobalEnv)
#> <environment: R_GlobalEnv>

ls()
#> [1] "make_names" "mt_car_s" "mt_cars" "temp"
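
The name-cleaning step can also be checked on its own, without touching the file system. This sketch reuses the two file names from the example and, as an assumption, swaps in tools::file_path_sans_ext() for the extension-stripping regex.

```r
# File names from the example above; nothing is read from disk here
files <- c("mt cars.csv", "mt car s.csv")

# Drop the extension
stems <- tools::file_path_sans_ext(files)

# make.names() turns invalid characters (here, spaces) into dots
syntactic <- make.names(stems)

# Replace the literal dots with underscores
clean <- gsub(".", "_", syntactic, fixed = TRUE)
print(clean)  # "mt_cars" "mt_car_s"
```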

