Easier Way to Use Grepl and Ifelse Across Multiple Columns

For complex operations it can often be useful to first make the operation into a function and then apply it to each case. For instance,

get_sector <- function(x, sector) {
  # Return 1 for each row in which any column matches `sector`, otherwise 0
  apply(x, 1, function(y) {
    as.numeric(any(grepl(sector, y), na.rm = TRUE))
  })
}

jobdata$private <- get_sector(jobdata, "Private")
jobdata$public <- get_sector(jobdata, "Public")
jobdata$other <- get_sector(jobdata, "Other")
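To see the function at work, here is a minimal sketch on mock data. The data frame contents and the column names job1 and job2 are assumptions made up for illustration, not the asker's data. The sketch also passes only the text columns to the function, so the newly added 0/1 columns are not scanned on later calls.

```r
# Hypothetical mock data; job1 and job2 are made-up column names
jobdata <- data.frame(
  job1 = c("Private firm", "Public school", NA),
  job2 = c(NA, "Other", "Private clinic"),
  stringsAsFactors = FALSE
)

get_sector <- function(x, sector) {
  # 1 if any column in the row matches `sector`, otherwise 0 (NA never matches)
  apply(x, 1, function(y) as.numeric(any(grepl(sector, y))))
}

# Search only the text columns, not the indicator columns added below
job_cols <- c("job1", "job2")
jobdata$private <- get_sector(jobdata[job_cols], "Private")
jobdata$public  <- get_sector(jobdata[job_cols], "Public")
jobdata$other   <- get_sector(jobdata[job_cols], "Other")

print(jobdata)
```

Note that grepl() returns FALSE for NA input, so rows with missing values simply count as non-matches.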

How can I apply a combination of if_else and grepl function to selected columns in R?

We can try using lapply syntax targeting only the 5th through 197th columns. Note that I define a helper function below, and I avoid using ifelse, since the boolean result can simply be cast to 1 or 0 to get the behavior you want.

func <- function(x) {
  # 1 if x matches a fashion keyword and no sustainability keyword, otherwise 0
  as.numeric(grepl("fashion|cloth|apparel|textile|material|garment|wardrobe|shoes|sneakers|footwear|sportswear|streetwear|menswear|athleisure|hautecouture|hypebeast", x) &
    !grepl("rev|clean|vegan|warrior|sdg|capsule|worker|whomademyclothes|conscious|circular|slow|responsible|smart|secondhand|sust|eco|organic|green|ethical|fair|environment|repurposed|upcycl|recycl|reus", x))
}
# Apply the helper to columns 5 through 197 only
cols <- names(fashion_lists)[5:197]
fashion_lists[cols] <- lapply(fashion_lists[cols], func)
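The same pattern can be tried out on a small scale. In the sketch below, the data frame fashion_demo, its column names, the helper name flag_match, and the shortened keyword lists are all hypothetical stand-ins chosen to keep the example readable.

```r
# Shortened, hypothetical keyword lists standing in for the full patterns
include <- "fashion|cloth|apparel"
exclude <- "sust|eco|organic"

flag_match <- function(x) {
  # TRUE & !TRUE logic cast to 1/0: match an include word, no exclude word
  as.numeric(grepl(include, x) & !grepl(exclude, x))
}

# Mock data: two id-like columns followed by two text columns (names made up)
fashion_demo <- data.frame(
  id   = 1:3,
  user = c("a", "b", "c"),
  bio  = c("fashion blogger", "organic clothing", "travel"),
  tag  = c("eco apparel", "streetwear", "clothes"),
  stringsAsFactors = FALSE
)

# Overwrite only the selected text columns with their 1/0 flags
cols <- names(fashion_demo)[3:4]
fashion_demo[cols] <- lapply(fashion_demo[cols], flag_match)

print(fashion_demo)
```

Because lapply() returns one vector per column, assigning back into fashion_demo[cols] replaces the text with the flags while leaving the other columns untouched.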

Ifelse for Multiple Columns in DataFrame

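We can use if_any() from dplyr, which returns TRUE for each row where the condition holds in at least one of the selected columns: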

library(dplyr)
df1 <- df1 %>%
  mutate(calculated_column = +(if_any(-ID, ~ . %in% 'high')))

Output:

df1
  ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1

Or if we want to use base R, create the logical condition with rowSums on a logical matrix

df1$calculated_column <- +(rowSums(df1[-1] == "high", na.rm = TRUE) > 0)

Data:

df1 <- structure(list(ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low")),
  class = "data.frame", row.names = c(NA, -4L))

Using grepl() within an if else statement within a for loop in R

Since you're not providing reproducible data to illustrate the procedure, here's some mock data. If I understand you correctly you want to mutate codes based on patterns. If that's the case you can use nested ifelse statements:

Data:

df <- data.frame(
id = c("a87", "b87", "abc95", "a95", "x123")
)

Now you define the new column with the mutated values:

df$new <- ifelse(grepl("87", df$id), "new1",
                 ifelse(grepl("95", df$id), "new2", "new3"))

An ifelse() call takes three arguments: (i) the condition, (ii) the value to return where the condition is TRUE, and (iii) the value to return where it is FALSE. Argument (iii) can itself be another ifelse() call that tests a second condition, which in turn can contain a third, and so on.
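As a sketch of how the nesting extends to a third condition, the snippet below adds one more level. The extra id "z000", the pattern "123", and the fallback label "other" are assumptions added purely for illustration.

```r
# Mock ids; "z000" is an extra hypothetical value that matches no pattern
id <- c("a87", "b87", "abc95", "a95", "x123", "z000")

# Each `no` argument holds the next test; the innermost `no` is the fallback
new <- ifelse(grepl("87", id), "new1",
       ifelse(grepl("95", id), "new2",
       ifelse(grepl("123", id), "new3", "other")))

print(data.frame(id = id, new = new))
```

The conditions are checked in order, so an id that matches more than one pattern gets the label of the first matching test.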

The result:

df
     id  new
1   a87 new1
2   b87 new1
3 abc95 new2
4   a95 new2
5  x123 new3

These are not your data but I guess you get the gist: there's no need for a for loop and you can nest one ifelse in another.

Conditional search across multiple columns in a very large data frame, with the goal of creating a 1/0 column for other analysis in R

I think what you are looking for is the across() function. For example, to search the first three columns of a data frame for a given searchPattern, you can do:

library(dplyr)

# rowSums() counts the matches per row; a plain sum() would add up all rows into one total
data %>% mutate(Opioid_Specific = rowSums(across(1:3, ~ as.numeric(grepl(searchPattern, .))))) %>%
  mutate(Opioid_Specific = ifelse(Opioid_Specific >= 1, 1, 0))

Another option would be to use the output of a normal (combined) condition as numeric, e.g.:

data %>% mutate(Opioid_Specific = as.numeric(grepl(searchPattern, R1) | grepl(searchPattern, R2) | grepl(searchPattern, R3)))
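If you would rather not spell out one grepl() per column, the combined condition can also be built in base R. This is only a sketch on mock data: the values in R1 to R3 and the contents of searchPattern are hypothetical, and Reduce() with `|` is a swapped-in alternative to the dplyr code, not something taken from the question.

```r
# Hypothetical data and pattern; only the names R1..R3 come from the question
data <- data.frame(
  R1 = c("morphine", "aspirin", NA),
  R2 = c("ibuprofen", "oxycodone 5mg", "aspirin"),
  R3 = c(NA, NA, "paracetamol"),
  stringsAsFactors = FALSE
)
searchPattern <- "morphine|oxycodone"

# One logical vector per column, then OR them together element-wise
search_cols <- c("R1", "R2", "R3")
hits <- lapply(data[search_cols], grepl, pattern = searchPattern)
data$Opioid_Specific <- as.integer(Reduce(`|`, hits))

print(data)
```

Each grepl() call is vectorized over a whole column, so this avoids row-by-row work, which may matter on a very large data frame.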

Use grepl and ifelse to add a new column

Why not like this:

df$contain_admarkt <- as.integer(grepl('admarkt',df$website_adress))

Reducing nested if else statements with grepl in R

In case you want to extract the number:

df$food_final <- gsub("\\D", "", df$food)

df
#  id  food food_final
#1  1   X1_          1
#2  2   X2_          2
#3  3   X3_          3
#4  4   X4_          4
#5  5   X5_          5
#6  6 X100_        100

Or, if the patterns do not map directly onto the digits they contain, use a named lookup vector, which does essentially the same thing as your nested ifelse:

x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
apply(sapply(x, grepl, df$food, ignore.case=TRUE), 1, function(y) names(x)[y][1])
#[1] "1" "2" "3" "4" "5" "100"

Or using Reduce:

x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
Reduce(function(a, b) {
  i <- is.na(a)
  a[i][grepl(x[b], df$food[i], ignore.case = TRUE)] <- b
  a
}, names(x), rep(NA, nrow(df)))
#[1] "1" "2" "3" "4" "5" "100"

