R match function with conditions
Well, it is easier to help if you share a small but reproducible example of your data. I created a sample dataset to demonstrate the solution.
Here's the data first.
Df1 <- data.frame(ID = 1:5,
Status = c('Injured', 'Dead', 'Dead', 'Alive', 'Injured'))
Df2 <- data.frame(Bird.ID = c(1, 3, 5))
Df1
# ID Status
#1 1 Injured
#2 2 Dead
#3 3 Dead
#4 4 Alive
#5 5 Injured
Df2
# Bird.ID
#1 1
#2 3
#3 5
Solution -
Df1$Status[Df1$ID %in% Df2$Bird.ID & Df1$Status != "Dead"] <- 'Alive'
Df1
# ID Status
#1 1 Alive
#2 2 Dead
#3 3 Dead
#4 4 Alive
#5 5 Alive
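For reference, %in% is itself built on match, so the same condition can be written with match directly. The following is only a sketch on the sample data above; the helper names pos, in_df2 and not_dead are introduced here for illustration.

```r
# Same sample data as above
Df1 <- data.frame(ID = 1:5,
                  Status = c("Injured", "Dead", "Dead", "Alive", "Injured"))
Df2 <- data.frame(Bird.ID = c(1, 3, 5))

# match() returns the position of each ID in Df2$Bird.ID, or NA when absent
pos <- match(Df1$ID, Df2$Bird.ID)

in_df2   <- !is.na(pos)            # same logical vector as Df1$ID %in% Df2$Bird.ID
not_dead <- Df1$Status != "Dead"   # the extra condition

# Update only the rows that satisfy both conditions
Df1$Status[in_df2 & not_dead] <- "Alive"
```

Keeping pos around is mainly useful when you also need to pull values from the second table, which %in% alone cannot do.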
Matching function in R: match.fun vs deparse(substitute()) vs supplying function directly
First, documentation! Here are relevant sections from ?match.fun:
When called inside functions that take a function as argument, extract the desired function object while avoiding undesired matching to objects of other types.
If FUN is a function, it is returned. If it is a symbol (for example, enclosed in backquotes) or a character vector of length one, it will be looked up using get in the environment of the parent of the caller.
Thus, match.fun has two main benefits:
- It gives users the option of passing strings and symbols instead of functions.
- It provides type safety, as the return value is always a function. This makes your source code not only more robust, but also more transparent: it is not necessary to read the documentation of your fun2 to know that its argument fun must specify a function.
And it provides these benefits at virtually no cost to performance:
x1 <- mean
x2 <- "mean"
x3 <- quote(mean)
microbenchmark::microbenchmark(match.fun(x1), match.fun(x2), match.fun(x3), times = 1000L)
# Unit: nanoseconds
# expr min lq mean median uq max neval
# match.fun(x1) 287 328 362.481 328 328 1681 1000
# match.fun(x2) 1599 1681 1820.892 1681 1763 7544 1000
# match.fun(x3) 1599 1640 1783.049 1681 1722 7339 1000
For these reasons, it is almost always better to validate with match.fun before trying to evaluate a function call (as in your fun2) than to wait and hope that a call can be evaluated (as in your fun1 and fun3). This principle holds even if your function is not exported and even if you never pass strings or symbols, because transparency (the second benefit above) makes your source code easier to read and maintain.
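To make this concrete, here is a minimal sketch of a function that validates its argument with match.fun. The name apply_checked is hypothetical; it is not the fun2 from the question, which is not reproduced here.

```r
# Hypothetical helper: validate `fun` up front, then call it
apply_checked <- function(fun, x) {
  fun <- match.fun(fun)  # returns a function or signals an error
  fun(x)
}

r1 <- apply_checked(mean, 1:10)                 # a function object
r2 <- apply_checked("mean", 1:10)               # a string naming a function
r3 <- apply_checked(function(v) mean(v), 1:10)  # an anonymous function
r4 <- apply_checked(base::mean, 1:10)           # a namespace-qualified function

# A non-function argument is rejected before any call is attempted
bad <- tryCatch(apply_checked(42, 1:10), error = function(e) "rejected")
```

All four valid calls give the same mean, and the invalid one fails at the match.fun line rather than somewhere deeper in the function body.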
Your fun3 is unique in that it allows users to pass unevaluated expressions, but that approach is problematic for multiple reasons:
- It will not work as expected inside of other functions; see @Hong Ooi's comment/answer.
- You cannot pass functions accessed with a double or triple colon operator, or anonymous functions, or, more generally, any expression evaluating indirectly to a function:
fun3(base::mean, 1:10)
# Error in `base::mean`(1:10) : could not find function "base::mean"
fun3(function(x) mean(x), 1:10)
# Error in `function(x) mean(x)`(1:10) :
# could not find function "function(x) mean(x)"
fun3(match.fun(mean), 1:10)
# Error in `match.fun(mean)`(1:10) :
# could not find function "match.fun(mean)"
- Even if it does work as you expect, it is mostly smoke and mirrors: if the result of deparse(substitute(fun)) is a string naming a function accessible from the calling environment, then there was no need for deparse(substitute(fun)) in the first place, because fun would have evaluated to that function anyway. It does extra work for nothing:
microbenchmark::microbenchmark(fun1(mean, 1:10), fun3(mean, 1:10), times = 1000L)
# Unit: microseconds
# expr min lq mean median uq max neval
# fun1(mean, 1:10) 2.009 2.378 2.700055 2.460 2.788 14.350 1000
# fun3(mean, 1:10) 9.020 10.127 10.991813 10.701 11.480 52.398 1000
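The first point in the list, that the deparse(substitute()) approach breaks inside other functions, can be illustrated with a short sketch. The definition of fun3 below is an assumption reconstructed from the error messages above (the original is not shown here), and wrapper and its argument name some_fn are hypothetical.

```r
# Assumed reconstruction of fun3: turn the argument expression into a string
# and let do.call() look up a function with that name
fun3 <- function(fun, x) {
  do.call(deparse(substitute(fun)), list(x))
}

# Called directly, the expression `mean` deparses to "mean", which is found
direct <- fun3(mean, 1:10)

# Inside another function, substitute() sees the wrapper's argument name
# ("some_fn"), not the function the user passed, so the lookup fails
wrapper <- function(some_fn, x) fun3(some_fn, x)
wrapped <- tryCatch(wrapper(mean, 1:10),
                    error = function(e) conditionMessage(e))
```

Here direct holds the mean, while wrapped holds the error message, because no function called some_fn is visible from where do.call looks it up.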
In summary, it is good practice to use match.fun whenever you expect functions as arguments. You might avoid match.fun if you want to accept functions but not strings or symbols, but in that situation it would still be good practice to have a test:
function(FUN, ...) {
if (!is.function(FUN)) {
stop("oops")
}
## do stuff
}
How to use a match function with a mutate function?
An alternative way to do it could be like this:
library(tidyverse)
df1 %>%
mutate("Flag" = case_when(
ID %in% Status1$ID ~ "Status1",
ID %in% Status2$ID ~ "Status2",
TRUE ~ Status
))
#> ID Status Flag
#> 1 1 N N
#> 2 2 Y Status1
#> 3 3 Y Status1
#> 4 4 N N
#> 5 5 Y Status2
Created on 2022-01-07 by the reprex package (v2.0.1)
Data:
df1 <- data.frame(
ID = c(1, 2, 3, 4, 5),
Status = c("N", "Y", "Y", "N", "Y")
)
Status1 <- data.frame(ID = c(2, 3))
Status2 <- data.frame(ID = c(5))
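Since the question asks about match specifically, the same flag can also be built from a single lookup table. This is a sketch in base R rather than the dplyr code above; the lookup data frame and the helper pos are names introduced here for illustration. If an ID appeared in both Status1 and Status2, match would pick the first row of the lookup table, which mirrors case_when taking the first condition that is true.

```r
df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Status = c("N", "Y", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)
Status1 <- data.frame(ID = c(2, 3))
Status2 <- data.frame(ID = c(5))

# Stack the two ID sets into one lookup table (hypothetical helper object)
lookup <- data.frame(
  ID   = c(Status1$ID, Status2$ID),
  Flag = rep(c("Status1", "Status2"), c(nrow(Status1), nrow(Status2))),
  stringsAsFactors = FALSE
)

# match() gives the row of `lookup` for each ID, or NA when the ID is absent
pos <- match(df1$ID, lookup$ID)

# Fall back to the original Status where there is no match
df1$Flag <- ifelse(is.na(pos), df1$Status, lookup$Flag[pos])
```

The same expression could be placed inside mutate if you prefer to stay in a dplyr pipeline.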
MATCH function in r
First, you have typos in your example. Second, the assignment to list1$test1value needs an [i] so that each iteration does not overwrite the previous result. There should also be no [i] on list2$id, since you want to search the entire vector for the lookup.
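# Iterate over the ids (one per row); length(list1) would count columns if list1 is a data frame
for (i in seq_along(list1$id)) {
  list1$test1value[i] <- list2$test[match(list1$id[i], list2$id,
                                          nomatch = NA_integer_, incomparables = NULL)]
}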
The code works, but there is no reason for any loops here. You are showing a lack of understanding in how R operates. The below code does the exact same thing much faster.
list1$test1value <- list2$test[match(list1$id, list2$id)]
R is built so that you do not have to hold its hand and instruct it how to go through each element of the vector. match will automatically iterate through each member one by one and look it up in the other vector for you. It will also assign the result in an orderly way in the dataset.
I will close this as a duplicate because, as others suggested, merge is perfect for this.
merge(list1, list2[c("id", "test")], all.x=TRUE)
# id age name test
#1 1 40 danny 100
#2 2 16 nora NA
#3 3 35 james NA
#4 4 21 ben 55
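To see that the vectorized match lookup and merge agree, here is a small sketch. The inputs are assumptions: list1 is read off the merge output above, while the exact contents of list2 (including the unmatched id 7) are made up for illustration.

```r
# Assumed inputs, reconstructed from the merge() output above;
# the exact contents of list2 (including id 7) are made up for illustration
list1 <- data.frame(id = 1:4,
                    age = c(40, 16, 35, 21),
                    name = c("danny", "nora", "james", "ben"),
                    stringsAsFactors = FALSE)
list2 <- data.frame(id = c(4, 1, 7), test = c(55, 100, 20))

# Vectorized lookup: one match() call handles every id at once
pos <- match(list1$id, list2$id)   # 2 NA NA 1
list1$test1value <- list2$test[pos]

# The same lookup expressed as a left join
merged <- merge(list1[c("id", "age", "name")], list2, by = "id", all.x = TRUE)
```

Note that the row order of list2 does not matter: match returns positions, so the values land on the right rows of list1 either way.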
Get indices of matches with a column in a second data.table
Using .EACHI and adding the resulting list column by reference.
dt2[ , res := dt1[ , i := .I][.SD, on = .(firstName), .(.(i)), by = .EACHI]$V1]
# lid firstName res
# 1: 1 Maria NA
# 2: 2 Jim 1,4
# 3: 3 Jack NA
# 4: 4 Anne 3,5
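For readers without data.table at hand, the result can be cross-checked in base R. This is a sketch, not the method used above; the input vectors are assumptions reconstructed from the output, and the name "Bob" at position 2 of the first table is made up.

```r
# Assumed inputs: first names in the first table (by row) and in the second
first1 <- c("Jim", "Bob", "Anne", "Jim", "Anne")   # "Bob" is hypothetical
first2 <- c("Maria", "Jim", "Jack", "Anne")

# Group the row numbers of the first table by name
rows_by_name <- split(seq_along(first1), first1)

# Look up each name of the second table; names with no match get NA
res <- lapply(first2, function(nm) {
  hit <- rows_by_name[[nm]]
  if (is.null(hit)) NA_integer_ else hit
})
names(res) <- first2
```

The list res has one element per row of the second table, holding either the matching row numbers or NA, which is the same shape as the res list column above.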
cannot match using the match function and provides NA error
When you match 676 with Values it returns NA and you cannot subset that as an index from as1$Values. If you want to use match try :
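# Position in as2 of each as1 value (NA when there is no match)
idx <- match(as1$Values, as2$Values_a)
found <- !is.na(idx)
# Replace only the matched rows, taking the value from the matching row of as2
as1$Values[found] <- as2$Values_b[idx[found]]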
You could also join the data :
library(dplyr)
left_join(as1, as2, by = c('Values' = 'Values_a')) %>%
mutate(Values = coalesce(Values_b, Values)) %>%
select(names(as1))
# ID pID Values
#1 1 21 544
#2 2 22 33
#3 3 23 45
#4 6 26 12
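A worked sketch of the match approach may help show where the NA comes from. The input data below is made up (the question's data is not reproduced here) and was chosen so that the result agrees with the output above.

```r
# Made-up inputs (the question's data is not reproduced here)
as1 <- data.frame(ID = c(1, 2, 3, 6),
                  pID = c(21, 22, 23, 26),
                  Values = c(545, 33, 45, 12))
as2 <- data.frame(Values_a = c(545, 676),
                  Values_b = c(544, 677))

# 676 does not occur in as1$Values, so its match is NA
match(as2$Values_a, as1$Values)   # 1 NA

# Look up each as1 value in as2 and replace only where a match exists
idx   <- match(as1$Values, as2$Values_a)
found <- !is.na(idx)
as1$Values[found] <- as2$Values_b[idx[found]]
```

Filtering with found before indexing is what keeps the NA positions out of the assignment.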
How to replicate Excel's index matching formula in R using dplyr?
Base R has a match function which works similarly to the Excel one.
myData$Match <- with(myData, Code4[match(Code2, Code3)] * !Code1)
myData
#-----
#   Element Code1 Code2 Code3 Code4 Match
# 1       A     0     1     0   0.0   1.1
# 2       A     0     2     0   0.0   1.2
# 3       C     0     1     0   0.0   1.1
# 4       A     0     3     0   0.0    NA
# 5       B     1     1     1   1.1   0.0
# 6       B     1     2     2   1.2   0.0
Same idea, but using dplyr:
library(dplyr)
myData %>%
  mutate(Match = Code4[match(Code2, Code3)] * !Code1)
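For readers coming from Excel, the expression corresponds roughly to INDEX(Code4, MATCH(Code2, Code3, 0)): match finds the position and the square brackets do the indexing. Here is a step-by-step sketch in base R, with the input columns read off the output above; the intermediate names pos and looked_up are introduced only for illustration.

```r
myData <- data.frame(
  Element = c("A", "A", "C", "A", "B", "B"),
  Code1 = c(0, 0, 0, 0, 1, 1),
  Code2 = c(1, 2, 1, 3, 1, 2),
  Code3 = c(0, 0, 0, 0, 1, 2),
  Code4 = c(0, 0, 0, 0, 1.1, 1.2)
)

# Step 1 (MATCH): position of each Code2 value within Code3, NA if absent
pos <- match(myData$Code2, myData$Code3)   # 5 6 5 NA 5 6

# Step 2 (INDEX): pick the Code4 value at that position
looked_up <- myData$Code4[pos]

# Step 3: zero out rows where Code1 is non-zero (!Code1 is FALSE there)
myData$Match <- looked_up * !myData$Code1
```

Row 4 stays NA because its Code2 value of 3 does not occur anywhere in Code3.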