R Remove Multiple Text Strings in Data Frame

R remove multiple text strings in data frame

wordstoremove <- c("ai", "computing", "ulitzer", "ibm", "privacy", "cognitive")

(dat <- read.table(header = TRUE, text = 'id text time username
1 "ai and x" 10 "me"
2 "and computing" 5 "you"
3 "nothing" 15 "everyone"
4 "ibm privacy" 0 "know"'))

# id text time username
# 1 1 ai and x 10 me
# 2 2 and computing 5 you
# 3 3 nothing 15 everyone
# 4 4 ibm privacy 0 know

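# Collapse the words into one alternation regex and strip every match.
# sapply() returns a character matrix, so id and time lose their numeric type.
(dat1 <- as.data.frame(sapply(dat, function(x)
  gsub(paste(wordstoremove, collapse = '|'), '', x))))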

# id text time username
# 1 1 and x 10 me
# 2 2 and 5 you
# 3 3 nothing 15 everyone
# 4 4 0 know
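
The approach above matches the words anywhere, even inside longer words, and converts every column to character. A minimal sketch of one way to avoid both, assuming word-boundary matching is what you want; the names pattern, is_chr and dat2 are illustrative and not part of the original answer:

```r
# Same sample data and word list as in the example above.
wordstoremove <- c("ai", "computing", "ulitzer", "ibm", "privacy", "cognitive")
dat <- data.frame(
  id = 1:4,
  text = c("ai and x", "and computing", "nothing", "ibm privacy"),
  time = c(10, 5, 15, 0),
  username = c("me", "you", "everyone", "know"),
  stringsAsFactors = FALSE
)

# \\b anchors each alternative at word boundaries, so "ai" is not removed from "aim".
pattern <- paste0("\\b(", paste(wordstoremove, collapse = "|"), ")\\b")

# Touch only the character columns so id and time keep their numeric type.
is_chr <- vapply(dat, is.character, logical(1))
dat2 <- dat
dat2[is_chr] <- lapply(dat2[is_chr], function(x) {
  trimws(gsub(pattern, "", x, perl = TRUE))  # trimws() drops leftover edge spaces
})
dat2
```

Whether leftover spaces should be trimmed depends on the data; drop the trimws() call to keep them.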

R remove multiple text strings in data.table

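Assuming DT is a data.table whose first column (vid) should be left alone and wordstoremove is a character vector of words to strip, update every other column by reference:

library(data.table)
# collapse = "s?|" adds an optional plural "s" after every word except the last
foo <- function(x) gsub(paste0(wordstoremove, collapse="s?|"), "", x)
# := modifies DT in place; .SDcols limits .SD to all columns but the first
DT[, names(DT)[-1] := lapply(.SD, foo), .SDcols = names(DT)[-1]]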
DT
# vid wr1 wr2 wr3
# 1: Simpsons Homer Bart Marge
# 2: Flanders Ned Rod Todd
# 3: Nahasapeemapetilons Apu Manjula Sanjay
# 4: Spucklers Cletus Brandine NA
# 5: Wiggums Chief Ralph Sarah
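
Since DT and wordstoremove are not defined in this snippet, here is a self-contained sketch of the same technique. The table contents and word list are hypothetical, chosen only so the effect is visible, and the pattern appends "s?" to every word rather than every word but the last:

```r
library(data.table)

# Hypothetical data: family surnames repeated inside the other columns.
wordstoremove <- c("Simpson", "Flander")
DT <- data.table(
  vid = c("Simpsons", "Flanders"),
  wr1 = c("Homer Simpson", "Ned Flanders"),
  wr2 = c("Bart Simpsons", "Rod")
)

# Each word followed by an optional "s", joined into one alternation.
pattern <- paste0(wordstoremove, "s?", collapse = "|")

# Update every column except the first, in place.
cols <- names(DT)[-1]
DT[, (cols) := lapply(.SD, function(x) trimws(gsub(pattern, "", x))),
   .SDcols = cols]
DT
```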

Remove multiple rows with specific string values

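You can use filter_at with selected columns or a range of columns. Here a row is kept only when neither animal nor Insurance is "Item skipped" or an empty string. (filter_at is superseded; in dplyr 1.0.4 or later, filter(if_all(c(animal, Insurance), ~ !.x %in% c("Item skipped", ""))) is the equivalent.)

library(dplyr)

dat %>%
  filter_at(vars(animal, Insurance), all_vars(!. %in% c("Item skipped", "")))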

# animal Insurance condition age
#1 dog Y 6
#2 cat N Asthma 6

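Or with base R you could use rowSums to count, per row, how many of the selected columns hold an unwanted value, and keep the rows where that count is zero: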

cols <- c('animal', 'Insurance')
dat[rowSums(dat[cols] == "Item skipped" | dat[cols] == "") == 0, ]
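
The input data frame is not shown in this answer, so the following is a runnable sketch of the rowSums approach with hypothetical rows (only the column names are taken from the output above):

```r
# Hypothetical input; rows 3 and 4 each contain an unwanted value.
dat <- data.frame(
  animal = c("dog", "cat", "Item skipped", "bird"),
  Insurance = c("Y", "N", "Y", ""),
  condition = c("", "Asthma", "", "None"),
  age = c(6, 6, 3, 2),
  stringsAsFactors = FALSE
)

cols <- c("animal", "Insurance")

# Logical matrix: TRUE wherever a selected cell holds an unwanted value.
bad <- dat[cols] == "Item skipped" | dat[cols] == ""

# Keep rows with no TRUE in any selected column.
kept <- dat[rowSums(bad) == 0, ]
kept
```

If the selected columns can contain NA, rowSums(bad, na.rm = TRUE) avoids NA row indices; whether NA rows should then be kept is a separate decision.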

Replace multiple strings in a column of a data frame

str_replace_all accepts a named character vector of pattern = replacement pairs, so you can add as many pairs as you want in one call. Each name is interpreted as a regular expression.

library(stringr)

vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")

str_replace_all(vec, c("Absent" = "A", "Present" = "P"))
# [1] "A" "A" "P" "P" "XX" "YY" "ZZ"
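
The example above works on a plain vector. As a sketch of applying the same pairs to a data frame column without stringr, assuming literal (non-regex) matching is enough; the data frame df and its column status are hypothetical names:

```r
# Hypothetical data frame holding the same values as vec above.
df <- data.frame(
  status = c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ"),
  stringsAsFactors = FALSE
)

# Pattern = replacement pairs, as in the stringr call.
replacements <- c("Absent" = "A", "Present" = "P")

# Apply each pair in turn; fixed = TRUE treats the pattern as a literal string.
for (pat in names(replacements)) {
  df$status <- gsub(pat, replacements[[pat]], df$status, fixed = TRUE)
}
df$status
```

Because the pairs are applied one after another, a later pattern can match text produced by an earlier replacement, so order matters when patterns overlap.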

Removing some text string and characters from a column in dataframe in R

The inner sub removes a literal dot (\\., escaped because . is a metacharacter that matches any character) followed by one or more digits (\\d+) at the end of the string ($). The outer gsub then removes every backtick (`) from the result:

df$Regression <- gsub("`", "", sub("\\.\\d+$", '', df$Regression))
df$Regression
# [1] "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A" "TLC~7_A"
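
The original values of df$Regression are not shown, so this sketch uses hypothetical inputs of the kind the two patterns target (backticks plus an optional numeric suffix):

```r
# Hypothetical inputs: backtick-quoted names, some with a ".<digits>" suffix.
df <- data.frame(
  Regression = c("`TLC~7_A`.1", "`TLC~7_A`.12", "`TLC~7_A`"),
  stringsAsFactors = FALSE
)

# Step 1: drop a trailing ".<digits>" if present.
no_suffix <- sub("\\.\\d+$", "", df$Regression)

# Step 2: drop every backtick.
df$Regression <- gsub("`", "", no_suffix, fixed = TRUE)
df$Regression
```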

Removing all text before a certain character for all variables in R

trimws accepts a regex through its whitespace argument (R 3.6.0 or later). Passing ".*/" makes it strip everything up to and including the last / in each column name:

names(df1) <- trimws(names(df1), whitespace = ".*/")

Another option is basename, which treats each name as a file path and keeps only the last component:

names(df1) <- basename(names(df1))

Output:

> df1
name color year
1 letters blue 1995
2 letters red 1997

Data:

df1 <- structure(list(`/98519/name` = c("letters", "letters"),
                      `aa77nf4/color` = c("blue", "red"),
                      `4//342/year` = c(1995L, 1997L)),
                 class = "data.frame", row.names = c(NA, -2L))
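
For R versions without the whitespace argument, a plain sub call is a possible alternative. This is a sketch rather than part of the original answer; it rebuilds the same df1 with data.frame and check.names = FALSE so the unusual column names survive:

```r
# Same data as above; check.names = FALSE keeps the slashes in the names.
df1 <- data.frame(
  `/98519/name` = c("letters", "letters"),
  `aa77nf4/color` = c("blue", "red"),
  `4//342/year` = c(1995L, 1997L),
  check.names = FALSE
)

# "^.*/" is greedy, so it consumes everything up to the last slash.
names(df1) <- sub("^.*/", "", names(df1))
names(df1)
```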

Remove string from multiple columns only if it is at the start of a string in R

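One dplyr possibility, where ^ anchors the match to the start of the string so that values such as bold_yes are left alone:

library(dplyr)

problem %>%
  mutate_at(vars(starts_with("fact")), list(~ sub("^old_", "", .)))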

name height weight fact1 fact2 fact3
<chr> <dbl> <dbl> <chr> <chr> <chr>
1 Random 48 95 song_yes bold_yes cold_yes
2 Silly 50 102 dance_no shy_no young_yes

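Or, dropping the first four characters only when the string actually starts with old_:

problem %>%
  mutate_at(vars(starts_with("fact")),
            list(~ ifelse(startsWith(., "old_"), substr(., 5, nchar(.)), .)))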
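
The same anchored replacement can be sketched in base R. The input values below are hypothetical, reconstructed so that they would produce the output shown above; fact_cols is an illustrative name:

```r
# Hypothetical input: some values start with "old_", others merely contain it.
problem <- data.frame(
  name = c("Random", "Silly"),
  height = c(48, 50),
  weight = c(95, 102),
  fact1 = c("old_song_yes", "old_dance_no"),
  fact2 = c("bold_yes", "old_shy_no"),
  fact3 = c("cold_yes", "old_young_yes"),
  stringsAsFactors = FALSE
)

# Select the columns whose names start with "fact".
fact_cols <- startsWith(names(problem), "fact")

# "^old_" only matches at the start, so "bold_yes" and "cold_yes" are untouched.
problem[fact_cols] <- lapply(problem[fact_cols], function(x) sub("^old_", "", x))
problem
```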

