How to loop through columns, check if a particular value exists in any of the columns, mutate a new column and enter 1 if it exists, 0 if not?
1) Assuming DF
is as shown reproducibly in the Note at the end use sapply
to create an matrix of indicators and then cbind
it to the ID
column. Finally make the names nicer. No packages are used.
ques <- function(i) paste0("Q", 1:25) %in% unlist(DF[i, -1])
DFout <- cbind(DF[1], +t(sapply(1:nrow(DF), ques)))
names(DFout)[-1] <- paste0("Q", names(DFout[-1]))
The first 5 columns are:
> DFout[1:5]
ID Q1 Q2 Q3 Q4
1 100 1 0 0 0
2 101 0 0 1 1
3 102 0 1 1 1
4 103 0 0 0 0
5 104 0 0 0 1
6 105 1 1 1 1
2) Another possibility is to convert the input to long form and then use xtabs
to create a table from it.
library(dplyr)
library(tidyr)
tab <- DF %>%
gather(key, Question, -ID) %>%
filter(nzchar(Question)) %>%
mutate(Question = factor(Question, paste0("Q", 1:25))) %>%
xtabs(~ ID + Question, .)
giving this table. We show the first 5 columns:
> tab[, 1:5]
Question
ID Q1 Q2 Q3 Q4
100 1 0 0 0
101 0 0 1 1
102 0 1 1 1
104 0 0 0 1
105 1 1 1 1
If it is important that the result be a data frame then add:
library(tibble)
tab %>%
as.data.frame.matrix %>%
rownames_to_column(var = "ID")
Note
sample <- "rows ID Col1 Col2 Col3 Col4
1 100 Q1
2 101 Q3 Q4
3 102 Q2 Q3 Q4
4 103
5 104 Q4
6 105 Q1 Q2 Q3 Q4"
DF <- read.table(text = sample, header = TRUE, fill = TRUE, as.is = TRUE,
strip.white = TRUE)[-1]
How to iterate through string of one column to mutate/modify other columns value?
Adapting this answer to your example. You can use separate_rows
then pivot_wider
.
library(tidyverse)
example_df %>%
select(x0, x1) %>%
separate_rows(x1) %>%
pivot_wider(names_from = x1,
values_from = x1,
values_fn = list(x1 = is.character),
values_fill = list(x1 = FALSE))
#----------
# A tibble: 5 x 4
x0 Dog Cat Bird
<dbl> <lgl> <lgl> <lgl>
1 1 TRUE TRUE FALSE
2 2 FALSE TRUE FALSE
3 3 TRUE FALSE TRUE
4 4 TRUE TRUE TRUE
5 5 FALSE TRUE TRUE
How do you evaluate a set of conditions and create a new column based on a list of columns?
We can just use rowSums
to make this efficient
i1 <- startsWith(names(df), 'col')
c( "Failed", "Passed")[(rowSums(df[i1] == 1) == 3) + 1]
#[1] "Failed" "Passed" "Failed"
Or another base R
efficient option is Reduce
c("Failed", "Passed")[Reduce(`&`, df[i1]) +1]
#[1] "Failed" "Passed" "Failed"
NOTE: Both base R
solutions are compact and are very efficient
Or with &
library(dplyr)
df %>%
mutate(evaluation = c('Failed', 'Passed')[1 + (col1 & col2 & col3)])
# ignore1 ignore2 col1 col2 col3 evaluation
#1 1 1 0 0 0 Failed
#2 0 0 1 1 1 Passed
#3 0 1 0 1 0 Failed
Or we can have rowSums
within dplyr
df %>%
mutate(evaluation = c("Failed", "Passed")[(rowSums(.[i1] == 1) == 3) + 1])
NOTE: Both the solutions are very efficient and doesn't use any packages that are not really needed
Or if we need some packages, then use magrittr
with purrr
library(magrittr)
library(purrr)
df %>%
mutate(evaluation = select(., starts_with('col')) %>%
reduce(`&`) %>%
add(1) %>%
c("Failed", "Passed")[.])
# ignore1 ignore2 col1 col2 col3 evaluation
#1 1 1 0 0 0 Failed
#2 0 0 1 1 1 Passed
#3 0 1 0 1 0 Failed
NOTE: Here also, we are not looping over rows, so it should be efficient
create a column where each value is the result of an if statement in R
replace semicolon ";" with "}".
and for else open a new "{"
here is your code it works well:
mtcars %>%
rowwise() %>%
mutate(strand =
if (drat < wt ){
print("-")} else{
print("+")
})
Perform mutate function only if variable exists
Since functions are unique to each variable and you want to return remaining values if one of the columns fail can't really come up with better solution than to use tryCatch
on individual columns.
library(dplyr)
convert_columns <- function(df) {
df %>%
mutate(
a = tryCatch(convert_a(a),error = function(z) return(NA)),
b = tryCatch(convert_b(b),error = function(z) return(NA)),
c = tryCatch(convert_c(c),error = function(z) return(NA)),
#...
#...
)
}
This can be tested using the following mtcars
example :
This works -
mtcars %>%
mutate(a = n_distinct(cyl),
b = mean(mpg),
c = sd(am))
Now if we remove one of the column, the above fails :
mtcars %>%
select(-am) %>%
mutate(a = n_distinct(cyl),
b = mean(mpg),
c = sd(am))
Error: Problem with
mutate()
inputc
.
x cannot coerce type 'closure' to vector of type 'double'
ℹ Inputc
issd(am)
.
Now using tryCatch
mtcars %>%
select(-am) %>%
mutate(a = tryCatch(n_distinct(cyl), error = function(e) return(NA)),
b = tryCatch(mean(mpg), error = function(e) return(NA)),
c = tryCatch(sd(am), error = function(e) return(NA)))
# mpg cyl disp hp drat wt qsec vs gear carb a b c
#1 21 6 160 110 3.9 2.6 16 0 4 4 3 20 NA
#2 21 6 160 110 3.9 2.9 17 0 4 4 3 20 NA
#3 23 4 108 93 3.9 2.3 19 1 4 1 3 20 NA
#4 21 6 258 110 3.1 3.2 19 1 3 1 3 20 NA
#....
Dynamically determine if a dataframe column exists and mutate if it does
You can use mutate_at
with one_of
, both from dplyr
. This will mutate column only if it matches with one of c("first_name", "last_name")
. If no match, it will generate a simple warning but you can ignore or suppress it.
library(dplyr)
d %>%
mutate_at(vars(one_of(c("first_name", "last_name")), toupper)
id col_name last_name
1 19 7461 V
2 52 9651 H
3 56 1901 P
4 13 7866 Z
5 25 9527 U
# example with no match
b %>%
mutate_at(vars(one_of(c("first_name", "last_name"))), toupper)
id col_name another_col
1 34 9315 8686
2 26 5598 4124
3 17 3318 2182
4 32 1418 4369
5 49 4759 6680
Warning message:
Unknown variables: `first_name`, `last_name`
Here are a bunch of other ?select_helpers
in dplyr
-
These functions allow you to select variables based on their names.
starts_with(): starts with a prefix
ends_with(): ends with a prefix
contains(): contains a literal string
matches(): matches a regular expression
num_range(): a numerical range like x01, x02, x03.
one_of(): variables in character vector.
everything(): all variables.
Related Topics
To Find Most Frequently Occuring Element in Matrix in R
Add Column Values Based on Other Columns in Data Frame Using for and If
How to Filter Multiple Columns With Same Condition in R
How to Change Y Axis Limits in Decimal Points in R
Filter Data.Frame Rows by a Logical Condition
Why Does Summarize or Mutate Not Work With Group_By When I Load 'Plyr' After 'Dplyr'
Dictionary Style Replace Multiple Items
Access Lapply Index Names Inside Fun
Storing Ggplot Objects in a List from Within Loop in R
How to Convert Only Some Positive Numbers to Negative Numbers (Conditional Recoding)
Convert Multiple Columns of Numeric Data to Dates in R
How to Give Subtitles for Subplot in Plot_Ly Using R
Change Rows into Columns in R With Values Yes/No (1/0)
Count Number of Rows Within Each Group
What Are the Differences Between "=" and "≪-" Assignment Operators