R remove multiple text strings in data frame
wordstoremove <- c("ai", "computing", "ulitzer", "ibm", "privacy", "cognitive")
(dat <- read.table(header = TRUE, text = 'id text time username
1 "ai and x" 10 "me"
2 "and computing" 5 "you"
3 "nothing" 15 "everyone"
4 "ibm privacy" 0 "know"'))
# id text time username
# 1 1 ai and x 10 me
# 2 2 and computing 5 you
# 3 3 nothing 15 everyone
# 4 4 ibm privacy 0 know
# Note: sapply() returns a character matrix, so every column of dat1 ends up as text.
(dat1 <- as.data.frame(sapply(dat, function(x)
  gsub(paste(wordstoremove, collapse = '|'), '', x))))
# id text time username
# 1 1 and x 10 me
# 2 2 and 5 you
# 3 3 nothing 15 everyone
# 4 4 0 know
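The pattern above matches substrings anywhere, so "ai" would also be stripped out of a word such as "rain", and the numeric columns are converted to text along the way. A minimal sketch, assuming whole-word removal is what is wanted and that only character columns should be touched; `remove_words` is a hypothetical helper name and the "rain" row is added purely for illustration.

```r
wordstoremove <- c("ai", "computing", "ulitzer", "ibm", "privacy", "cognitive")

# Illustrative data (not from the original question).
dat <- data.frame(
  id = 1:4,
  text = c("ai and x", "and computing", "rain", "ibm privacy"),
  time = c(10, 5, 15, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove whole words only, then tidy leftover spaces.
remove_words <- function(x, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  cleaned <- gsub(pattern, "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", cleaned))
}

# Apply only to character columns so numeric columns keep their type.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], remove_words, words = wordstoremove)
dat
```

The word boundaries (`\\b`) are the design choice here: they keep "rain" intact while still removing a standalone "ai".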
R remove multiple text strings in data.table
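Try this, with DT and wordstoremove already defined:
library(data.table)
# Append "s?" to every word so the optional plural "s" applies to each of them;
# collapse = "s?|" alone would leave the last word without it.
foo <- function(x) gsub(paste0(wordstoremove, "s?", collapse = "|"), "", x)
DT[, names(DT)[-1] := lapply(.SD, foo), .SDcols = names(DT)[-1]]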
DT
# vid wr1 wr2 wr3
# 1: Simpsons Homer Bart Marge
# 2: Flanders Ned Rod Todd
# 3: Nahasapeemapetilons Apu Manjula Sanjay
# 4: Spucklers Cletus Brandine NA
# 5: Wiggums Chief Ralph Sarah
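Since the input table is not shown here, the following is a self-contained sketch of the same idea on made-up data. The contents of `DT` and `wordstoremove` are assumptions for illustration only, and `trimws()` is added to drop the space left behind after a removed word.

```r
library(data.table)

# Hypothetical data: the first column is kept as-is, the rest are cleaned.
DT <- data.table(
  vid = c("Simpsons", "Flanders"),
  wr1 = c("Homer Simpson", "Ned Flanders"),
  wr2 = c("Bart Simpson", "Rod Flanders")
)
wordstoremove <- c("Simpson", "Flander")

# Optional plural "s" after every word; trimws() drops the leftover space.
pattern <- paste0(wordstoremove, "s?", collapse = "|")
foo <- function(x) trimws(gsub(pattern, "", x))

# Update all columns except the first by reference.
cols <- names(DT)[-1]
DT[, (cols) := lapply(.SD, foo), .SDcols = cols]
DT
```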
Remove multiple rows with specific string values
You can use filter_at with selected columns or a range of columns:
library(dplyr)
dat %>%
filter_at(vars(animal,Insurance), all_vars(!. %in% c("Item skipped", "")))
# animal Insurance condition age
#1 dog Y 6
#2 cat N Asthma 6
Or with base R you could use rowSums
cols <- c('animal', 'Insurance')
dat[rowSums(dat[cols] == "Item skipped" | dat[cols] == "") == 0, ]
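In dplyr 1.0.0 and later the scoped verbs such as filter_at are superseded. A minimal sketch of the same filter using `if_all()` (available from dplyr 1.0.4), on made-up data; the rows and the `skip` vector name are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data with the same column names as above.
dat <- data.frame(
  animal = c("dog", "cat", "Item skipped", "bird"),
  Insurance = c("Y", "N", "Y", ""),
  age = c(6, 6, 3, 2),
  stringsAsFactors = FALSE
)

skip <- c("Item skipped", "")

# Keep a row only if none of the selected columns holds a skipped value.
kept <- dat %>%
  filter(if_all(c(animal, Insurance), ~ !.x %in% skip))
kept
```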
Replace multiple strings in a column of a data frame
You can do the following to add as many pattern-replacement pairs as you want in one line.
library(stringr)
vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")
str_replace_all(vec, c("Absent" = "A", "Present" = "P"))
# [1] "A" "A" "P" "P" "XX" "YY" "ZZ"
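The patterns passed to str_replace_all are regular expressions and match substrings, so a value such as "Absentee" would also be altered. If only exact whole-value matches should be recoded, one base R possibility is a named lookup vector; a minimal sketch, where `lookup`, `hit`, and `out` are illustrative names:

```r
vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")
lookup <- c(Absent = "A", Present = "P")

# Replace only values that match a name in `lookup` exactly; keep the rest.
hit <- vec %in% names(lookup)
out <- vec
out[hit] <- unname(lookup[vec[hit]])
out
```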
Removing some text string and characters from a column in dataframe in R
We can match the . (written as \\. because it is a metacharacter that matches any character) followed by one or more digits (\\d+) up to the end ($) of the string, replace the match with a blank (""), and wrap the result in gsub to match the backquote ("`") and remove it:
df$Regression <- gsub("`", "", sub("\\.\\d+$", '', df$Regression))
df$Regression
[1] "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A"
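To see the two steps separately, here is a small self-contained sketch. The input values are an assumption, chosen only so that they produce the shape of the output above; the original data is not shown.

```r
# Hypothetical input: backquoted names followed by a ".<digits>" suffix.
df <- data.frame(
  Regression = c("`TLC~7_A`.1", "`TLC~7_A`.2", "`TLC~7_A`.10"),
  stringsAsFactors = FALSE
)

# Step 1: drop the trailing ".<digits>" suffix (only at the end of the string).
step1 <- sub("\\.\\d+$", "", df$Regression)

# Step 2: drop every backquote; fixed = TRUE treats it as a literal character.
step2 <- gsub("`", "", step1, fixed = TRUE)
step2
```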
Removing all text before a certain character for all variables in R
We can use trimws with whitespace as a regex to match characters (.*) up to the /:
names(df1) <- trimws(names(df1), whitespace = ".*/")
Or another option is basename
names(df1) <- basename(names(df1))
Output:
> df1
name color year
1 letters blue 1995
2 letters red 1997
Data:
df1 <- structure(list(`/98519/name` = c("letters", "letters"),
`aa77nf4/color` = c("blue",
"red"), `4//342/year` = c(1995L, 1997L)), class = "data.frame", row.names = c(NA,
-2L))
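The `whitespace` argument of trimws requires R 3.6.0 or later. On older versions, a plain `sub()` with the same greedy pattern is one alternative; a minimal sketch on the column names from the data above (`old_names` and `new_names` are illustrative variable names):

```r
old_names <- c("/98519/name", "aa77nf4/color", "4//342/year")

# ".*/" is greedy, so it consumes everything up to and including the last "/".
new_names <- sub(".*/", "", old_names)
new_names
```

Applied to the data frame itself, this would be `names(df1) <- sub(".*/", "", names(df1))`.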
Remove string from multiple columns only if it is at the start of a string in R
One dplyr possibility could be:
problem %>%
  mutate_at(vars(starts_with("fact")), list(~ sub("^old_", "", .)))
name height weight fact1 fact2 fact3
<chr> <dbl> <dbl> <chr> <chr> <chr>
1 Random 48 95 song_yes bold_yes cold_yes
2 Silly 50 102 dance_no shy_no young_yes
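Or, checking for the prefix explicitly before dropping its four characters:
problem %>%
  mutate_at(vars(starts_with("fact")), list(~ ifelse(startsWith(., "old_"), substr(., 5, nchar(.)), .)))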
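With dplyr 1.0.0 or later, the same anchored replacement can be written with `across()`, which supersedes mutate_at. A minimal sketch on made-up data; the contents of `problem` are an assumption, chosen so that "bold_yes" shows a value that must stay untouched.

```r
library(dplyr)

# Hypothetical input: "old_" should be removed only when it starts the string.
problem <- data.frame(
  name = c("Random", "Silly"),
  fact1 = c("old_song_yes", "old_dance_no"),
  fact2 = c("bold_yes", "old_shy_no"),
  stringsAsFactors = FALSE
)

# "^" anchors the match to the start, so "bold_yes" is left alone.
result <- problem %>%
  mutate(across(starts_with("fact"), ~ sub("^old_", "", .x)))
result
```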