If Column Contains String Then Enter Value for That Row

If Column Contains String then enter value for that row

We can use grepl to return a logical index by matching the 'D' in the 'A' column, and then with ifelse, change the logical vector to 'yes' and 'no'

df$C <- ifelse(grepl("D", df$A), "yes", "no")

If Column Contains String then put value for that row

Does this work:

library(dplyr)
library(stringr)
df %>% mutate(B = case_when(str_detect(A, 'Sales|Marketing') ~ 'Yes', TRUE ~ 'No'))
A B
1 Manager, Sales Yes
2 Manager No
3 Manager, Marketing Yes
4 Manager, Marketing, Sales Yes

If Many Columns Contains String then enter value for that row

1. If you want a single number in var (1 or 0)

1 = the corresponding string from containers was found at least once in all of the eleven FAC_ variables in the same row

Using the tidyverse:

library(tidyverse)
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){str_detect(vec, containers[i]) %>% any(na.rm=T) %>% as.numeric}))
}

Output

> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Substra~ Natural~ "Morfol~ NA NA NA NA NA NA NA NA 1 0 0 0 0 1 0 0 0 0 0
2 Substra~ Alterna~ "Natura~ "Materi~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 1 1 0 0 0 0
3 NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 0 0 0
4 Substra~ Alterna~ "Rocas ~ "Orient~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 1 0 0 1 0 0 0
5 Substra~ Alterna~ "Rocas ~ "Orient~ "Natur~ "Pend~ NA NA NA NA NA 1 1 0 1 1 1 0 1 0 0 0
6 Substra~ Alterna~ "Rocas ~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 1 0 0 0 1 0 0 0
7 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
8 Substra~ Alterna~ "Materi~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 0 1 1 0 0 0
9 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
10 Substra~ Rocas m~ "Orient~ "Natura~ "Mater~ "Pend~ NA NA NA NA NA 1 0 0 1 1 1 1 1 0 0 0
# ... with 35,269 more rows

2. If you want a 1 or 0 for each of the eleven FAC_, and this in every var

I put the 1 and 0 in a list, so var1 is a vector of lists. Each element is a list with 11 numbers.

for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){map(vec, ~grepl(containers[i], .) %>% as.numeric)}))
}

The calculation takes a bit of time but I do find

Output

> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <list> <list> <list> <list> <lis> <lis> <lis> <lis> <lis> <lis> <lis>
1 Substr~ Natura~ "Morfo~ NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
2 Substr~ Altern~ "Natur~ "Mater~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
3 NA NA NA NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
4 Substr~ Altern~ "Rocas~ "Orien~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
5 Substr~ Altern~ "Rocas~ "Orien~ "Natur~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
6 Substr~ Altern~ "Rocas~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
7 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
8 Substr~ Altern~ "Mater~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
9 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
10 Substr~ Rocas ~ "Orien~ "Natur~ "Mater~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
# ... with 35,269 more rows
> bd$var1[1]
[[1]]
[[1]]$FAC_1
[1] 1

[[1]]$FAC_2
[1] 0

[[1]]$FAC_3
[1] 0

[[1]]$FAC_4
[1] 0

[[1]]$FAC_5
[1] 0

[[1]]$FAC_6
[1] 0

[[1]]$FAC_7
[1] 0

[[1]]$FAC_8
[1] 0

[[1]]$FAC_9
[1] 0

[[1]]$FAC_10
[1] 0

[[1]]$FAC_11
[1] 0

Edit: if you want the whole string where we found the word in the cell

If you are certain each word can only appear once in the row, you can do

for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){ifelse(str_detect(vec, containers[i]) %>% any(na.rm=T),
str_subset(vec, containers[i]),
NA)}))
}

with the value of the cells filled if the word was found, NA if the word was not found. If you want to change the value when the word was not found, modify NA in the ifelse function, for example with "Not found".

R - If column contains a string from vector, append flag into another column

Update:
If a list is preferred: Using str_extract_all:

df %>%  
transmute(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}"))

gives:

  new_colonetext new_colcop new_coltext3
<list> <list> <list>
1 <chr [1]> <NULL> <chr [2]>
2 <chr [2]> <chr [2]> <NULL>
3 <chr [2]> <chr [4]> <chr [5]>

Here is how you could achieve the result:

  1. create a pattern of the vector
  2. use mutate across to check the needed columns
  3. if the desired string is detected then extract to a new column !
myvec <- c("cat", "dog", "bird")

pattern <- paste(myvec, collapse="|")

library(dplyr)
library(tidyr)
df %>%
mutate(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>%
unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')
    id onetext                cop                                                                        text3                                                                              topic                                     
<dbl> <chr> <chr> <chr> <chr>
1 1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"
2 2 dog cat fight Dogs have soft fur and tails so do cats Do cats like to chase their tails there are many fights going on and this is just an example text "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
3 3 bird cat issues A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~

Conditional If Statement: If value in row contains string ... set another column equal to string

The current solution behaves wrongly if your df contains NaN values. In that case I recommend using the following code which worked for me

temp=df.Activity.fillna("0")
df['Activity_2'] = pd.np.where(temp.str.contains("0"),"None",
pd.np.where(temp.str.contains("email"), "email",
pd.np.where(temp.str.contains("conference"), "conference",
pd.np.where(temp.str.contains("call"), "call", "task"))))

Return value if row contains specific text

You gan use grepl() to find out whether a word appears in your value:

df$New = "no green"
df$New[grepl("Green",df$Combination)] = "it has green"

This first creates the default case (no green) and then transforms all values corresponding to combinations containing "Green".

Does data frame row and column contains string? If so, return that string in new column

You can use a regular expression with str.extract to capture everything from Note to just before the comma.

df['Note_number'] = df.note.str.extract('(Note.*)(?=\,)')

Output

   id partNumber                             note Note_number
0 1 a1b33 apples NaN
1 2 hhgh5667 banana, Note 55, and pineapples Note 55
2 3 hhgh5667 Note 1A, and blueberries Note 1A
3 4 09890ii blackberries NaN

R modify value in one column if content in another column contains string

Try this function:

subtract_match <- function(column1, column2, text, df) {
df2 <- df
df2[, column2] <- ifelse(grepl(text, df[, column1]),
df[, column2] - nchar(text),
df[, column2])
df2
}

subtract_match("test", "testcount", "two", df1)

test hyp testcount hypcount
1 one two 3 3
2 two one 0 3
3 three onetwo 5 6
4 one one 3 3
5 onetwo two 3 3

subtract_match("hyp", "hypcount", "two", df1)

test hyp testcount hypcount
1 one two 3 0
2 two one 3 3
3 three onetwo 5 3
4 one one 3 3
5 onetwo two 6 0


Related Topics



Leave a reply



Submit