If Column Contains String then enter value for that row
We can use grepl
to return a logical index by matching the 'D' in the 'A' column, and then with ifelse
, change the logical vector to 'yes' and 'no'
df$C <- ifelse(grepl("D", df$A), "yes", "no")
If Column Contains String then put value for that row
Does this work:
library(dplyr)
library(stringr)
df %>% mutate(B = case_when(str_detect(A, 'Sales|Marketing') ~ 'Yes', TRUE ~ 'No'))
A B
1 Manager, Sales Yes
2 Manager No
3 Manager, Marketing Yes
4 Manager, Marketing, Sales Yes
If Many Columns Contains String then enter value for that row
1. If you want a single number in var (1 or 0)
1 = the corresponding string from containers was found at least once in all of the eleven FAC_ variables in the same row
Using the tidyverse
:
library(tidyverse)
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){str_detect(vec, containers[i]) %>% any(na.rm=T) %>% as.numeric}))
}
Output
> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Substra~ Natural~ "Morfol~ NA NA NA NA NA NA NA NA 1 0 0 0 0 1 0 0 0 0 0
2 Substra~ Alterna~ "Natura~ "Materi~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 1 1 0 0 0 0
3 NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 0 0 0
4 Substra~ Alterna~ "Rocas ~ "Orient~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 1 0 0 1 0 0 0
5 Substra~ Alterna~ "Rocas ~ "Orient~ "Natur~ "Pend~ NA NA NA NA NA 1 1 0 1 1 1 0 1 0 0 0
6 Substra~ Alterna~ "Rocas ~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 1 0 0 0 1 0 0 0
7 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
8 Substra~ Alterna~ "Materi~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 0 1 1 0 0 0
9 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
10 Substra~ Rocas m~ "Orient~ "Natura~ "Mater~ "Pend~ NA NA NA NA NA 1 0 0 1 1 1 1 1 0 0 0
# ... with 35,269 more rows
2. If you want a 1 or 0 for each of the eleven FAC_, and this in every var
I put the 1 and 0 in a list, so var1 is a vector of lists. Each element is a list with 11 numbers.
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){map(vec, ~grepl(containers[i], .) %>% as.numeric)}))
}
The calculation takes a bit of time but I do find
Output
> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <list> <list> <list> <list> <lis> <lis> <lis> <lis> <lis> <lis> <lis>
1 Substr~ Natura~ "Morfo~ NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
2 Substr~ Altern~ "Natur~ "Mater~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
3 NA NA NA NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
4 Substr~ Altern~ "Rocas~ "Orien~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
5 Substr~ Altern~ "Rocas~ "Orien~ "Natur~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
6 Substr~ Altern~ "Rocas~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
7 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
8 Substr~ Altern~ "Mater~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
9 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
10 Substr~ Rocas ~ "Orien~ "Natur~ "Mater~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
# ... with 35,269 more rows
> bd$var1[1]
[[1]]
[[1]]$FAC_1
[1] 1
[[1]]$FAC_2
[1] 0
[[1]]$FAC_3
[1] 0
[[1]]$FAC_4
[1] 0
[[1]]$FAC_5
[1] 0
[[1]]$FAC_6
[1] 0
[[1]]$FAC_7
[1] 0
[[1]]$FAC_8
[1] 0
[[1]]$FAC_9
[1] 0
[[1]]$FAC_10
[1] 0
[[1]]$FAC_11
[1] 0
Edit: if you want the whole string where we found the word in the cell
If you are certain each word can only appear once in the row, you can do
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){ifelse(str_detect(vec, containers[i]) %>% any(na.rm=T),
str_subset(vec, containers[i]),
NA)}))
}
with the value of the cells filled if the word was found, NA
if the word was not found. If you want to change the value when the word was not found, modify NA
in the ifelse function, for example with "Not found"
.
R - If column contains a string from vector, append flag into another column
Update:
If a list is preferred: Using str_extract_all:
df %>%
transmute(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}"))
gives:
new_colonetext new_colcop new_coltext3
<list> <list> <list>
1 <chr [1]> <NULL> <chr [2]>
2 <chr [2]> <chr [2]> <NULL>
3 <chr [2]> <chr [4]> <chr [5]>
Here is how you could achieve the result:
- create a pattern of the vector
- use
mutate
across
to check the needed columns - if the desired string is detected then extract to a new column !
myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse="|")
library(dplyr)
library(tidyr)
df %>%
mutate(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>%
unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')
id onetext cop text3 topic
<dbl> <chr> <chr> <chr> <chr>
1 1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"
2 2 dog cat fight Dogs have soft fur and tails so do cats Do cats like to chase their tails there are many fights going on and this is just an example text "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
3 3 bird cat issues A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~
Conditional If Statement: If value in row contains string ... set another column equal to string
The current solution behaves wrongly if your df contains NaN values. In that case I recommend using the following code which worked for me
temp=df.Activity.fillna("0")
df['Activity_2'] = pd.np.where(temp.str.contains("0"),"None",
pd.np.where(temp.str.contains("email"), "email",
pd.np.where(temp.str.contains("conference"), "conference",
pd.np.where(temp.str.contains("call"), "call", "task"))))
Return value if row contains specific text
You gan use grepl()
to find out whether a word appears in your value:
df$New = "no green"
df$New[grepl("Green",df$Combination)] = "it has green"
This first creates the default case (no green) and then transforms all values corresponding to combinations containing "Green".
Does data frame row and column contains string? If so, return that string in new column
You can use a regular expression with str.extract
to capture everything from Note to just before the comma.
df['Note_number'] = df.note.str.extract('(Note.*)(?=\,)')
Output
id partNumber note Note_number
0 1 a1b33 apples NaN
1 2 hhgh5667 banana, Note 55, and pineapples Note 55
2 3 hhgh5667 Note 1A, and blueberries Note 1A
3 4 09890ii blackberries NaN
R modify value in one column if content in another column contains string
Try this function:
subtract_match <- function(column1, column2, text, df) {
df2 <- df
df2[, column2] <- ifelse(grepl(text, df[, column1]),
df[, column2] - nchar(text),
df[, column2])
df2
}
subtract_match("test", "testcount", "two", df1)
test hyp testcount hypcount
1 one two 3 3
2 two one 0 3
3 three onetwo 5 6
4 one one 3 3
5 onetwo two 3 3
subtract_match("hyp", "hypcount", "two", df1)
test hyp testcount hypcount
1 one two 3 0
2 two one 3 3
3 three onetwo 5 3
4 one one 3 3
5 onetwo two 6 0
Related Topics
Remove Weekend Data in a Dataframe
How to Import Only One Function from Another Package, Without Loading the Entire Namespace
Update Rows of Data Frame in R
R: Building a Simple Command Line Plotting Tool/Capturing Window Close Events
Installing R on Osx Big Sur (Edit: and Apple M1) for Use with Rcpp and Openmp
R Obtaining Rownames Date Using Quantmod
Predicting Probabilities for Gbm with Caret Library
How to Load a Matlab Struct into a R Data Frame
Possible Issue About Random Number Generator
How to Reverse Legend (Labels and Color) So High Value Starts at Bottom
Rbindlist Two Data.Tables Where One Has Factor and Other Has Character Type for a Column
List and Description of All Packages in Cran from Within R
Display Frequency Instead of Count with Geom_Bar() in Ggplot
Filling Bars in Barplot with Textiles in Ggplot2
How to Build a Crossword-Like Plot for a Boolean Matrix
Drawing a Stratified Sample in R
Using Shorthand Character Classes Inside Character Classes in R Regex