R - how to replace parts of variable strings within data frame
Use gsub():
dat <- c("test", "testing", "esten", "etsen", "blest", "estten")
gsub("t", "", dat)
[1] "es" "esing" "esen" "esen" "bles" "esen"
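Since the question is about a data frame, here is a minimal sketch applying the same gsub() call to a single column. The data frame df and its column word are hypothetical.

```r
# Hypothetical data frame with a character column to clean
df <- data.frame(word = c("test", "testing", "blest"), stringsAsFactors = FALSE)

# gsub() is vectorised, so it rewrites every element of the column at once
df$word <- gsub("t", "", df$word)
print(df$word)
```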
Replace specific characters in a variable in data frame in R
You can use the POSIX character classes [:punct:] and [:space:] inside a bracket expression ([...]) like this:
df <- data.frame(
  DMA.NAME = c(
    "Columbus, OH",
    "Orlando-Daytona Bch-Melbrn",
    "Boston (Manchester)",
    "Columbus, OH",
    "Orlando-Daytona Bch-Melbrn",
    "Minneapolis-St. Paul"),
  stringsAsFactors = FALSE)

# Replace each run of punctuation and/or whitespace with a single dot
gsub("[[:punct:][:space:]]+", ".", df$DMA.NAME)
[1] "Columbus.OH" "Orlando.Daytona.Bch.Melbrn" "Boston.Manchester." "Columbus.OH"
[5] "Orlando.Daytona.Bch.Melbrn" "Minneapolis.St.Paul"
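The trailing + in the pattern matters: it treats a run of punctuation and spaces as a single match, so ", " becomes one dot rather than two. A small comparison using one value from the example above:

```r
x <- "Columbus, OH"

# Without +, the comma and the space are each replaced separately
without_plus <- gsub("[[:punct:][:space:]]", ".", x)

# With +, the run ", " is matched once and replaced by a single dot
with_plus <- gsub("[[:punct:][:space:]]+", ".", x)

print(without_plus)  # "Columbus..OH"
print(with_plus)     # "Columbus.OH"
```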
Replace all occurrences of a string in a data frame
If you are only looking to replace all occurrences of "< " (with a space) with "<" (no space), you can lapply over the data frame, using gsub for the replacement:
data <- data.frame(lapply(data, function(x) {
  gsub("< ", "<", x)
}))
data
  name var1 var2
1    a   <2   <3
2    a   <2   <3
3    a   <2   <3
4    b   <2   <3
5    b   <2   <3
6    b   <2   <3
7    c   <2   <3
8    c   <2   <3
9    c   <2   <3
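The question's input data is not shown, so the sketch below builds a hypothetical data frame with the same column names. Instead of rebuilding the result with data.frame() (which, in R versions before 4.0, turns strings into factors by default), it assigns into data[], a common idiom that keeps the object a data frame with its original column names.

```r
# Hypothetical input resembling the question's data
data <- data.frame(
  name = rep(c("a", "b", "c"), each = 3),
  var1 = "< 2",
  var2 = "< 3",
  stringsAsFactors = FALSE
)

# Assigning into data[] replaces the columns but keeps the data frame structure
data[] <- lapply(data, function(x) gsub("< ", "<", x))
print(data)
```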
Replace multiple strings in a column of a data frame
With stringr's str_replace_all() you can pass a named vector of pattern = replacement pairs, as many as you want, in a single call:
library(stringr)
vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")
str_replace_all(vec, c("Absent" = "A", "Present" = "P"))
# [1] "A" "A" "P" "P" "XX" "YY" "ZZ"
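If stringr is not available, a base R sketch can get a similar result by applying gsub() once per pair with Reduce(). The helper name replace_pairs is hypothetical. The pairs are applied one after another, so a later pattern could match text produced by an earlier replacement.

```r
# Hypothetical helper: apply each pattern = replacement pair in turn
replace_pairs <- function(x, pairs) {
  Reduce(function(acc, pattern) gsub(pattern, pairs[[pattern]], acc),
         names(pairs), x)
}

vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")
result <- replace_pairs(vec, c("Absent" = "A", "Present" = "P"))
print(result)
```

For a data frame column, pass the column in and assign the result back, e.g. df$col <- replace_pairs(df$col, pairs).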
Replace specific characters within strings
With a regular expression and the function gsub():
group <- c("12357e", "12575e", "197e18", "e18947")
group
[1] "12357e" "12575e" "197e18" "e18947"
gsub("e", "", group)
[1] "12357" "12575" "19718" "18947"
What gsub() does here is replace each occurrence of "e" with an empty string "". See ?regexp or ?gsub for more help.
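Because the pattern is a regular expression, some characters (such as .) have a special meaning. To remove a literal character, fixed = TRUE switches off regex interpretation, or you can escape the character. A small comparison on a made-up string:

```r
x <- "1.2.3"

# As a regex, "." matches any character, so everything is removed
as_regex <- gsub(".", "", x)

# With fixed = TRUE, only literal dots are removed
as_fixed <- gsub(".", "", x, fixed = TRUE)

# Escaping the dot gives the same result while staying a regex
escaped <- gsub("\\.", "", x)

print(c(as_regex, as_fixed, escaped))  # "" "123" "123"
```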
How to replace part of a string by position
Another option is sub(), if you're not certain that the first XXX will always start at position 10. Unlike gsub(), sub() replaces only the first match:
sub("XXX", "000", "Jun:2020,XXX/XXX|May:2020,035/XXX|Apr:2020,040/XXX|")
# [1] "Jun:2020,000/XXX|May:2020,035/XXX|Apr:2020,040/XXX|"
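When the position is known, base R's substr<- replacement function can overwrite characters in place. A minimal sketch, assuming the first XXX occupies characters 10 to 12 as in the string above:

```r
x <- "Jun:2020,XXX/XXX|May:2020,035/XXX|Apr:2020,040/XXX|"

# Overwrite characters 10 to 12; substr<- never changes the string's length,
# so the replacement should be exactly as wide as the span it replaces
substr(x, 10, 12) <- "000"
print(x)
```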
How to remove part of characters in data frame column
There are multiple ways of removing the leading zeros:
- Use as.numeric on the column of your choice (the column becomes numeric rather than character):
raw$Zipcode <- as.numeric(raw$Zipcode)
- If you want to keep it as character, you can use the stringr package:
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+", "")
- There is also a function called str_remove in the stringr package:
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")
- You can also use sub from base R:
raw$Zipcode <- sub("^0+", "", raw$Zipcode)
If you want to remove exactly n leading zeros, replace + with {n}. For instance, to remove two zeros use sub("^0{2}", "", raw$Zipcode).
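The raw data from the question is not shown, so the sketch below uses a hypothetical raw data frame with a character Zipcode column to compare the approaches above:

```r
# Hypothetical data frame standing in for the question's raw data
raw <- data.frame(Zipcode = c("02134", "00501", "10001"), stringsAsFactors = FALSE)

# Strip all leading zeros, keeping the column as character
all_zeros <- sub("^0+", "", raw$Zipcode)

# Strip exactly two leading zeros; values not starting with "00" are unchanged
two_zeros <- sub("^0{2}", "", raw$Zipcode)

# Converting to numeric also drops the zeros, but the result is no longer text
as_num <- as.numeric(raw$Zipcode)

print(all_zeros)  # "2134" "501" "10001"
print(two_zeros)  # "02134" "501" "10001"
print(as_num)     # 2134 501 10001
```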