R Remove Multiple Text Strings in Data Frame

R remove multiple text strings in data frame

wordstoremove <- c("ai", "computing", "ulitzer", "ibm", "privacy", "cognitive")

(dat <- read.table(header = TRUE, text = 'id text time username
1 "ai and x" 10 "me"
2 "and computing" 5 "you"
3 "nothing" 15 "everyone"
4 "ibm privacy" 0 "know"'))

#   id          text time username
# 1  1      ai and x   10       me
# 2  2 and computing    5      you
# 3  3       nothing   15 everyone
# 4  4   ibm privacy    0     know

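# Collapse the words into one alternation pattern and strip every match.
# Note: sapply() returns a character matrix, so every column of dat1 ends up character (factor before R 4.0).
(dat1 <- as.data.frame(sapply(dat, function(x)
  gsub(paste(wordstoremove, collapse = '|'), '', x))))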

#   id    text time username
# 1  1   and x   10       me
# 2  2    and     5      you
# 3  3 nothing   15 everyone
# 4  4            0     know
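
The call above rewrites every column and matches the words anywhere, including inside longer words. As a hedged variant (a sketch, not the original answer's method), the version below assumes you only want whole-word matches and only want to touch character columns; `pattern`, `is_chr` and `dat2` are names introduced here for illustration, and the `trimws()` step is an extra assumption that leftover spaces should go too.

```r
wordstoremove <- c("ai", "computing", "ulitzer", "ibm", "privacy", "cognitive")

dat <- data.frame(
  id = 1:4,
  text = c("ai and x", "and computing", "nothing", "ibm privacy"),
  time = c(10, 5, 15, 0),
  username = c("me", "you", "everyone", "know"),
  stringsAsFactors = FALSE
)

# \\b anchors the alternation at word boundaries, so "ai" inside "rain" is left alone
pattern <- paste0("\\b(", paste(wordstoremove, collapse = "|"), ")\\b")

# Only character columns are rewritten; numeric columns keep their type
is_chr <- vapply(dat, is.character, logical(1))
dat2 <- dat
dat2[is_chr] <- lapply(dat2[is_chr], function(x) {
  trimws(gsub(pattern, "", x, perl = TRUE))  # trimws() drops the leftover spaces
})
dat2
```

Here `id` and `time` stay numeric, which the `sapply()` version does not preserve.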

R remove multiple text strings in data.table

Try this:

library(data.table)
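# DT and wordstoremove come from the question; "s?" lets each word match with or without a trailing "s"
foo <- function(x) gsub(paste0(wordstoremove, "s?", collapse = "|"), "", x)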
DT[, names(DT)[-1] := lapply(.SD, foo), .SDcols = names(DT)[-1]]
DT
#                    vid     wr1       wr2     wr3
# 1:            Simpsons  Homer      Bart   Marge 
# 2:            Flanders    Ned       Rod    Todd 
# 3: Nahasapeemapetilons    Apu   Manjula  Sanjay 
# 4:           Spucklers Cletus  Brandine       NA
# 5:             Wiggums  Chief     Ralph   Sarah
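
To see what pattern a helper like `foo` builds, here is a small base R sketch with hypothetical words (`words` is made up for illustration and is not the question's `wordstoremove`). It shows that putting `s?` in `collapse` leaves the last word without the optional `s`, while pasting `s?` onto each word first covers all of them.

```r
# Hypothetical words, for illustration only
words <- c("apple", "pear", "plum")

# "s?" placed in collapse is only inserted BETWEEN words, so the last word has no "s?"
p_collapse <- paste0(words, collapse = "s?|")

# Pasting "s?" onto each word first gives every alternative an optional "s"
p_each <- paste0(words, "s?", collapse = "|")

x <- c("two apples", "plums and pears")
res_collapse <- gsub(p_collapse, "", x)
res_each <- gsub(p_each, "", x)
```

With `p_collapse` the trailing `s` of `plums` survives, whereas `p_each` removes the whole word.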

Remove multiple rows with specific string values

You can use filter_at with selected columns or a range of columns (in dplyr 1.0.4 and later, filter() with if_all() is the preferred equivalent):

library(dplyr)

dat %>%
  filter_at(vars(animal,Insurance), all_vars(!. %in% c("Item skipped", "")))

#  animal Insurance condition age
#1    dog         Y             6
#2    cat         N    Asthma   6

Or with base R you could use rowSums

cols <- c('animal', 'Insurance')
dat[rowSums(dat[cols] == "Item skipped" | dat[cols] == "") == 0, ]
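
The question's data is not shown, so the following sketch uses a hypothetical `dat` shaped like the output above to walk through the rowSums approach step by step; `bad` and `kept` are names introduced here.

```r
# Hypothetical data, for illustration only
dat <- data.frame(
  animal = c("dog", "cat", "Item skipped", "bird"),
  Insurance = c("Y", "N", "Y", ""),
  condition = c("", "Asthma", "None", "None"),
  age = c(6, 6, 3, 2),
  stringsAsFactors = FALSE
)

cols <- c("animal", "Insurance")

# Logical matrix: TRUE where a cell in the selected columns holds an unwanted value
bad <- dat[cols] == "Item skipped" | dat[cols] == ""

# Keep only the rows that have no unwanted cells
kept <- dat[rowSums(bad) == 0, ]
kept
```

If the selected columns can contain NA, `rowSums(bad, na.rm = TRUE)` avoids NA row indices; whether NA should count as a value to drop is a decision for your data.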

Replace multiple strings in a column of a data frame

You can do the following to add as many pattern-replacement pairs as you want in one line.

library(stringr)

vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")

str_replace_all(vec, c("Absent" = "A", "Present" = "P"))
# [1] "A"  "A"  "P"  "P"  "XX" "YY" "ZZ"
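
If you would rather not depend on stringr, a rough base R equivalent is to loop over the named vector and apply gsub once per pair. This is a swapped-in technique, not what str_replace_all does internally; `repl` and `out` are names introduced here.

```r
vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")

# Names are the patterns (treated as regular expressions), values are the replacements
repl <- c("Absent" = "A", "Present" = "P")

out <- vec
for (pat in names(repl)) {
  # Each pair is applied in turn to the result of the previous one
  out <- gsub(pat, repl[[pat]], out)
}
out
```

Because the pairs are applied one after another, a later pattern can match text produced by an earlier replacement, so order the pairs with that in mind.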

Removing some text string and characters from a column in dataframe in R

We can match a literal dot (\\., escaped because . is a metacharacter that matches any character) followed by one or more digits (\\d+) up to the end ($) of the string, and replace the match with an empty string (""). Wrapping that call in gsub then matches the backquotes ("`") and removes them.

df$Regression <- gsub("`", "", sub("\\.\\d+$", '', df$Regression))
df$Regression
# [1] "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A"
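
The original values of df$Regression are not shown, so here is a sketch on a hypothetical input vector `x` that runs the two steps separately; `step1` and `step2` are names introduced for illustration.

```r
# Hypothetical input: backquoted names, some with a ".<digits>" suffix
x <- c("`TLC~7_A`.1", "`TLC~7_A`.12", "`TLC~7_A`")

# Step 1: drop a trailing dot followed by digits, if present
step1 <- sub("\\.\\d+$", "", x)

# Step 2: remove every backquote
step2 <- gsub("`", "", step1)
step2
```

sub is enough for the suffix because the `$` anchor allows at most one match per string, while gsub is needed for the backquotes because each string contains two.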

Removing all text before a certain character for all variables in R

We can use trimws (R 3.6.0 or later) with its whitespace argument set to a regex that matches any characters (.*) up to and including the last /

names(df1) <- trimws(names(df1), whitespace = ".*/")

Or another option is basename

names(df1) <- basename(names(df1))

Output:

> df1
     name color year
1 letters  blue 1995
2 letters   red 1997

Data:

df1 <- structure(list(`/98519/name` = c("letters", "letters"), 
`aa77nf4/color` = c("blue", 
"red"), `4//342/year` = c(1995L, 1997L)), class = "data.frame", row.names = c(NA, 
-2L))

Remove string from multiple columns only if it is at the start of a string in R

One dplyr possibility could be:

problem %>%
 mutate_at(vars(starts_with("fact")), list(~ sub("^old_", "", .)))

  name   height weight fact1    fact2    fact3    
  <chr>   <dbl>  <dbl> <chr>    <chr>    <chr>    
1 Random     48     95 song_yes bold_yes cold_yes 
2 Silly      50    102 dance_no shy_no   young_yes

Or, if every value is known to start with "old_", drop the first four characters by position:

problem %>%
 mutate_at(vars(starts_with("fact")), list(~ substr(., 5, nchar(.))))
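
The two options differ when a value does not start with "old_". The base R sketch below uses a hypothetical `problem` data frame (the question's data is not shown) to compare the anchored pattern with positional trimming; `fact_cols`, `anchored` and `positional` are names introduced here.

```r
# Hypothetical data: one value ("bold_yes") does not start with "old_"
problem <- data.frame(
  name = c("Random", "Silly"),
  fact1 = c("old_song_yes", "old_dance_no"),
  fact2 = c("bold_yes", "old_shy_no"),
  stringsAsFactors = FALSE
)

fact_cols <- startsWith(names(problem), "fact")

# Anchored pattern: removes "old_" only when it is at the very start
anchored <- problem
anchored[fact_cols] <- lapply(problem[fact_cols], function(x) sub("^old_", "", x))

# Positional trimming: always drops the first four characters
positional <- problem
positional[fact_cols] <- lapply(problem[fact_cols], function(x) substr(x, 5, nchar(x)))
```

Here `anchored$fact2` keeps "bold_yes" intact, while `positional$fact2` turns it into "_yes", so the substr variant is only safe when the prefix is guaranteed.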
