R - If column contains a string from vector, append flag into another column
Update:
If a list column is preferred, use str_extract_all with transmute:
df %>%
transmute(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}"))
gives:
new_colonetext new_colcop new_coltext3
<list> <list> <list>
1 <chr [1]> <NULL> <chr [2]>
2 <chr [2]> <chr [2]> <NULL>
3 <chr [2]> <chr [4]> <chr [5]>
Here is how you could achieve the result:
- create a pattern of the vector
- use mutate with across to check the needed columns: if the desired string is detected, extract it to a new column
myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse="|")
library(dplyr)
library(tidyr)
library(stringr)  # provides str_detect() and str_extract_all()
df %>%
mutate(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>%
unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')
id onetext cop text3 topic
<dbl> <chr> <chr> <chr> <chr>
1 1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"
2 2 dog cat fight Dogs have soft fur and tails so do cats Do cats like to chase their tails there are many fights going on and this is just an example text "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
3 3 bird cat issues A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~
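As the output shows, unite pastes the printed form of each list element into topic (for example c("cat", "cat")). A minimal sketch of one way to get a plain comma-separated flag instead, assuming a small made-up data frame (the sample rows and the single text column are illustrative, not the original df):

```r
library(stringr)

myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse = "|")

# Hypothetical sample data, not the data from the question
df <- data.frame(
  id = 1:3,
  onetext = c("cat furry", "dog cat fight", "no pets here"),
  stringsAsFactors = FALSE
)

# Extract every match per row (a list of character vectors),
# then collapse the unique matches of each row into one string
hits <- str_extract_all(df$onetext, pattern)
df$topic <- vapply(hits, function(x) paste(unique(x), collapse = ","), character(1))

df$topic
```

Rows without any match end up with an empty string rather than NA; whether that is the right placeholder depends on how the flag is used later.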
check if column contains part of another column in r
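You can do it this way:
# a and b are character vectors of equal length
df <- data.frame(a, b, stringsAsFactors = FALSE)
for (i in seq_len(nrow(df))) {
  # agrep() does approximate matching, so near-identical strings count as a match
  if (df$a[i] == '' || length(agrep(df$a[i], df$b[i])) > 0) {
    df$c[i] <- 'update'
  } else {
    df$c[i] <- 'insert'
  }
}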
df
## a b c
##1 0c1234 Oc1234 update
##2 Oc5678 update
##3 2468O Oc9123 insert
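The same row-by-row check can probably be written without an explicit loop. The sketch below swaps agrep for exact substring matching with grepl(fixed = TRUE), so it is a stricter variant and not a drop-in replacement. The vectors a and b are made-up sample values.

```r
# Hypothetical sample data
a <- c("abc", "", "xyz")
b <- c("xxabcxx", "anything", "abc")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# For each row: TRUE if a is empty or a occurs literally inside b
is_match <- mapply(
  function(p, s) p == "" || grepl(p, s, fixed = TRUE),
  df$a, df$b,
  USE.NAMES = FALSE
)

df$c <- ifelse(is_match, "update", "insert")
df$c
```

If fuzzy matching is required, agrepl could be used inside the anonymous function in place of grepl.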
Filter for column value if another column contains a value
We can use group_by with filter here:
library(dplyr)
my_df %>%
group_by(x) %>%
filter(if(any(y == 1)) y == 1 else TRUE)
# A tibble: 4 x 2
# Groups: x [2]
# x y
# <dbl> <dbl>
#1 1 1
#2 2 4
#3 2 5
#4 2 6
Or, if grouping is not needed:
my_df %>%
filter( (x == 1 & y == 1)|(x !=1))
Or with subset
subset(my_df, (x == 1 & y == 1)|(x !=1))
# x y
#1 1 1
#4 2 4
#5 2 5
#6 2 6
Or, which gives the same result for this data but would also drop rows where x != 1 and y == 1:
subset(my_df, (x == 1 & y == 1)|(x !=1 & y != 1))
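For a version that runs on its own, here is a sketch of the grouped logic in base R using ave. The contents of my_df are an assumption chosen to be consistent with the printed output; the original data is not shown.

```r
# Assumed sample data, consistent with the output shown
my_df <- data.frame(x = c(1, 1, 1, 2, 2, 2), y = c(1, 2, 3, 4, 5, 6))

# Within each group of x: if any y == 1, keep only those rows,
# otherwise keep every row of the group
keep <- ave(
  my_df$y == 1, my_df$x,
  FUN = function(v) if (any(v)) v else rep(TRUE, length(v))
)

result <- my_df[as.logical(keep), ]
result
```

Unlike the hard-coded x == 1 condition, this applies the rule to every group, which matches the group_by version.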
Check whether values in one data frame column exist in a second data frame
Use %in% as follows:
A$C %in% B$C
Which will tell you which values of column C of A are in B.
What is returned is a logical vector. In the specific case of your example, you get:
A$C %in% B$C
# [1] TRUE FALSE TRUE TRUE
You can use it as an index to the rows of A, or as an index to A$C to get the actual values:
# as a row index
A[A$C %in% B$C, ] # note the comma to indicate we are indexing rows
# as an index to A$C
A$C[A$C %in% B$C]
[1] 1 3 4 # returns all values of A$C that are in B$C
We can negate it too:
A$C[!A$C %in% B$C]
[1] 2 # returns all values of A$C that are NOT in B$C
If you want to know if a specific value is in B$C, use the same function:
2 %in% B$C # "is the value 2 in B$C ?"
# FALSE
A$C[2] %in% B$C # "is the 2nd element of A$C in B$C ?"
# FALSE
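To try this end to end, here is a small sketch with made-up data frames. The values in A and B are assumptions picked to agree with the results above, and the in_B column name is illustrative.

```r
# Hypothetical data frames
A <- data.frame(C = c(1, 2, 3, 4))
B <- data.frame(C = c(1, 3, 4, 7))

# Store the membership test as a flag column
A$in_B <- A$C %in% B$C

# Rows of A whose C value also occurs in B
matched_rows <- A[A$in_B, ]

# Values of A$C that do not occur in B$C
missing_vals <- setdiff(A$C, B$C)
```

Keeping the logical vector as a column makes it easy to reuse the same test for filtering, counting, or later joins.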
Find column value contained in another column R
We split the second and third columns on one or more spaces (\\s+), then paste the union of the corresponding rows with mapply to create the 'combined' column:
lst <- lapply(df[2:3], function(x) strsplit(as.character(x), "\\s+"))
df$combined <- mapply(function(x,y) paste(union(x, y), collapse=" "), lst$add1, lst$add2)
df$combined
#[1] "21ST AVE BLAH ST" "5TH ST EAST BLAH BLVD"
Or another option is gsub
gsub("((\\w+\\s*){2,})\\1", "\\1", do.call(paste, df[2:3]))
#[1] "21ST AVE BLAH ST" "5TH ST EAST BLAH BLVD"
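A self-contained sketch of the split-and-union approach follows. The data frame is an assumption: the add1 and add2 values are invented so that the result agrees with the output shown.

```r
# Hypothetical input; only the 'combined' result is known from the answer
df <- data.frame(
  id = 1:2,
  add1 = c("21ST AVE", "5TH ST EAST"),
  add2 = c("21ST AVE BLAH ST", "5TH ST EAST BLAH BLVD"),
  stringsAsFactors = FALSE
)

# Split each address into words
words1 <- strsplit(df$add1, "\\s+")
words2 <- strsplit(df$add2, "\\s+")

# Row by row, keep each distinct word once, in order of first appearance
df$combined <- mapply(
  function(x, y) paste(union(x, y), collapse = " "),
  words1, words2,
  USE.NAMES = FALSE
)
df$combined
```

Note that union removes every repeated word, so an address that legitimately contains the same word twice would lose the second occurrence.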
Check if column contains value from a list and assign that value to new column
Paste the base_patters together and use str_extract to extract any pattern present in mynames.
library(data.table)
library(stringr)
transformations[,pattern := str_extract(mynames,str_c(base_patters,collapse = "|"))]
# mynames pattern
#1: HI_pat1_jo pat1
#2: A2_a4_pat1_LN pat1
#3: pat3_LN pat3
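If data.table and stringr are not available, roughly the same extraction can be sketched in base R with regexpr and regmatches. The vector contents below are assumptions (including the extra non-matching name), and base_patterns is a stand-in for the pattern vector in the question.

```r
# Hypothetical inputs
mynames <- c("HI_pat1_jo", "A2_a4_pat1_LN", "pat3_LN", "no_match")
base_patterns <- c("pat1", "pat2", "pat3")

# One alternation regex covering all patterns
pattern <- paste(base_patterns, collapse = "|")

# regexpr() returns -1 where nothing matched
m <- regexpr(pattern, mynames)

# Start from NA and fill in the first match for the rows that have one
found <- rep(NA_character_, length(mynames))
found[m > 0] <- regmatches(mynames, m)
found
```

As with str_extract, only the first match in each string is returned, and patterns containing regex metacharacters would need escaping first.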
R function that detects if a dataframe column contains string values from another dataframe column and adds a column that contains the detected str
First, you could add a column for each of the possible categories to the data frame with the names, as placeholders (just filled with NA). Then, for each of those columns, check whether the column name (that is, the category) appears in the name. Turn the result into a long data frame, and then remove the FALSE rows, which are those where the category was not detected in the name.
library(tidyverse)
df1 <- tribble(
~name,
"Apple page",
"Mango page",
"Lychee juice",
"Cranberry club"
)
df2 <- tribble(
~fruit,
"Apple",
"Grapes",
"Strawberry",
"Mango",
"lychee",
"cranberry"
)
fruits <- df2$fruit %>%
str_to_lower() %>%
set_names(rep(NA_character_, length(.)), .)
df1 %>%
add_column(!!!fruits) %>%
mutate(across(-name, ~str_detect(str_to_lower(name), cur_column()))) %>%
pivot_longer(-name, names_to = "category") %>%
filter(value) %>%
select(-value)
#> # A tibble: 4 × 2
#> name category
#> <chr> <chr>
#> 1 Apple page apple
#> 2 Mango page mango
#> 3 Lychee juice lychee
#> 4 Cranberry club cranberry
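For comparison, the same name-to-category lookup can be sketched in base R with a logical match matrix. This is an alternative formulation and not the answer's method; the intermediate names hits and idx are illustrative.

```r
name <- c("Apple page", "Mango page", "Lychee juice", "Cranberry club")
fruits <- tolower(c("Apple", "Grapes", "Strawberry", "Mango", "lychee", "cranberry"))

# One column per fruit, one row per name: TRUE where the fruit occurs in the name
hits <- sapply(fruits, function(f) grepl(f, tolower(name), fixed = TRUE))

# Positions of the TRUE cells as (row, col) pairs
idx <- which(hits, arr.ind = TRUE)

result <- data.frame(
  name = name[idx[, "row"]],
  category = fruits[idx[, "col"]],
  stringsAsFactors = FALSE
)
result
```

A name that contains several fruits would appear once per matching category, just as in the pivot_longer version.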
R modify value in one column if content in another column contains string
Try this function:
subtract_match <- function(column1, column2, text, df) {
df2 <- df
df2[, column2] <- ifelse(grepl(text, df[, column1]),
df[, column2] - nchar(text),
df[, column2])
df2
}
subtract_match("test", "testcount", "two", df1)
test hyp testcount hypcount
1 one two 3 3
2 two one 0 3
3 three onetwo 5 6
4 one one 3 3
5 onetwo two 3 3
subtract_match("hyp", "hypcount", "two", df1)
test hyp testcount hypcount
1 one two 3 0
2 two one 3 3
3 three onetwo 5 3
4 one one 3 3
5 onetwo two 6 0
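To reproduce the calls above, df1 has to exist first. The sketch below rebuilds it from the printed output, which is an assumption about the original data. It also shows a hypothetical variant, subtract_match2, that indexes with [[ ]] so it should behave the same on tibbles and uses literal rather than regex matching.

```r
# df1 reconstructed from the printed results (assumed, not given in the answer)
df1 <- data.frame(
  test = c("one", "two", "three", "one", "onetwo"),
  hyp = c("two", "one", "onetwo", "one", "two"),
  stringsAsFactors = FALSE
)
df1$testcount <- nchar(df1$test)
df1$hypcount <- nchar(df1$hyp)

# Hypothetical variant of subtract_match
subtract_match2 <- function(df, match_col, count_col, text) {
  # Rows whose match_col contains the literal text
  hit <- grepl(text, df[[match_col]], fixed = TRUE)
  # Subtract the length of the text from the count on those rows only
  df[[count_col]] <- ifelse(hit, df[[count_col]] - nchar(text), df[[count_col]])
  df
}

res_test <- subtract_match2(df1, "test", "testcount", "two")
res_hyp <- subtract_match2(df1, "hyp", "hypcount", "two")
```

Like the original function, this returns a modified copy and leaves df1 unchanged.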