R - How to Replace Parts of Variable Strings Within Data Frame

R - how to replace parts of variable strings within data frame

Use gsub

dat <- c("test", "testing", "esten", "etsen", "blest", "estten")

gsub("t", "", dat)
[1] "es" "esing" "esen" "esen" "bles" "esen"

Replace specific characters in a variable in data frame in R

You can use the special groups [:punct:] and [:space:] inside of a pattern group ([...]) like this:

df <- data.frame(
DMA.NAME = c(
"Columbus, OH",
"Orlando-Daytona Bch-Melbrn",
"Boston (Manchester)",
"Columbus, OH",
"Orlando-Daytona Bch-Melbrn",
"Minneapolis-St. Paul"),
stringsAsFactors=F)
##
> gsub("[[:punct:][:space:]]+","\\.",df$DMA.NAME)
[1] "Columbus.OH" "Orlando.Daytona.Bch.Melbrn" "Boston.Manchester." "Columbus.OH"
[5] "Orlando.Daytona.Bch.Melbrn" "Minneapolis.St.Paul"

Replace all occurrences of a string in a data frame

If you are only looking to replace all occurrences of "< " (with space) with "<" (no space), then you can do an lapply over the data frame, with a gsub for replacement:

> data <- data.frame(lapply(data, function(x) {
+ gsub("< ", "<", x)
+ }))
> data
name var1 var2
1 a <2 <3
2 a <2 <3
3 a <2 <3
4 b <2 <3
5 b <2 <3
6 b <2 <3
7 c <2 <3
8 c <2 <3
9 c <2 <3

Replace multiple strings in a column of a data frame

You can do the following to add as many pattern-replacement pairs as you want in one line.

library(stringr)

vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")

str_replace_all(vec, c("Absent" = "A", "Present" = "P"))
# [1] "A" "A" "P" "P" "XX" "YY" "ZZ"

Replace specific characters within strings

With a regular expression and the function gsub():

group <- c("12357e", "12575e", "197e18", "e18947")
group
[1] "12357e" "12575e" "197e18" "e18947"

gsub("e", "", group)
[1] "12357" "12575" "19718" "18947"

What gsub does here is to replace each occurrence of "e" with an empty string "".


See ?regexp or gsub for more help.

How to replace part of a string by position

Another option is sub, if you're not certain that the first XXX will always start at position 10:

sub("XXX", "000", "Jun:2020,XXX/XXX|May:2020,035/XXX|Apr:2020,040/XXX|")
# [1] "Jun:2020,000/XXX|May:2020,035/XXX|Apr:2020,040/XXX|"

How to remove part of characters in data frame column

There are multiple ways of doing this:

  1. Using as.numeric on a column of your choice.
raw$Zipcode <- as.numeric(raw$Zipcode)

  1. If you want it to be a character then you can use stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+" ,"")

  1. There is another function called str_remove in stringr package.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")

  1. You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)

But if you want to remove n number of leading zeroes, replace + with {n} to remove them.

For instance to remove two 0's use sub("^0{2}", "", raw$Zipcode).



Related Topics



Leave a reply



Submit