Match Function in R

R match function with conditions

Well, it is easier to help if you share a small but reproducible example of your data. I created a sample dataset to demonstrate the solution.

Here's the data first.

Df1 <- data.frame(ID = 1:5, 
Status = c('Injured', 'Dead', 'Dead', 'Alive', 'Injured'))

Df2 <- data.frame(Bird.ID = c(1, 3, 5))

Df1

# ID Status
#1 1 Injured
#2 2 Dead
#3 3 Dead
#4 4 Alive
#5 5 Injured

Df2
# Bird.ID
#1 1
#2 3
#3 5

Solution -

Df1$Status[Df1$ID %in% Df2$Bird.ID & Df1$Status != "Dead"] <- 'Alive'
Df1

# ID Status
#1 1 Alive
#2 2 Dead
#3 3 Dead
#4 4 Alive
#5 5 Alive
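
For context on why this works without calling match directly: %in% is defined in terms of match, so the membership test above is a match lookup underneath. The sketch below rebuilds the sample data and checks that both spellings select the same rows. The names in_df2 and via_match are introduced here for illustration only.

```r
# Sample data from the answer above
Df1 <- data.frame(ID = 1:5,
                  Status = c("Injured", "Dead", "Dead", "Alive", "Injured"),
                  stringsAsFactors = FALSE)
Df2 <- data.frame(Bird.ID = c(1, 3, 5))

# %in% is TRUE wherever match() finds a position in the lookup vector
in_df2    <- Df1$ID %in% Df2$Bird.ID
via_match <- !is.na(match(Df1$ID, Df2$Bird.ID))
identical(in_df2, via_match)

# Combine the membership test with the status condition, as in the solution
Df1$Status[in_df2 & Df1$Status != "Dead"] <- "Alive"
Df1$Status
```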

Matching function in R: match.fun vs deparse(substitute()) vs supplying function directly

First, documentation! Here are relevant sections from ?match.fun:

When called inside functions that take a function as argument, extract the desired function object while avoiding undesired matching to objects of other types.

If FUN is a function, it is returned. If it is a symbol (for example, enclosed in backquotes) or a character vector of length one, it will be looked up using get in the environment of the parent of the caller.

Thus, match.fun has two main benefits:

  1. It gives users the option of passing strings and symbols instead of functions.
  2. It provides type safety, as the return value is always a function. This makes your source code not only more robust, but also more transparent: it is not necessary to read the documentation of your fun2 to know that its argument fun must specify a function.

And it provides these benefits at virtually no cost to performance:

x1 <- mean
x2 <- "mean"
x3 <- quote(mean)
microbenchmark::microbenchmark(match.fun(x1), match.fun(x2), match.fun(x3), times = 1000L)
# Unit: nanoseconds
#           expr  min   lq     mean median   uq  max neval
#  match.fun(x1)  287  328  362.481    328  328 1681  1000
#  match.fun(x2) 1599 1681 1820.892   1681 1763 7544  1000
#  match.fun(x3) 1599 1640 1783.049   1681 1722 7339  1000

For these reasons, it is almost always better to validate with match.fun before trying to evaluate a function call (as in your fun2) than to wait and hope that a call can be evaluated (as in your fun1 and fun3). This principle holds even if your function is not exported and even if you never pass strings or symbols, because transparency (see 2) makes your source code easier to read and maintain.
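
As a minimal sketch of that validate-first pattern (apply_checked is a hypothetical name, not one of the functions from the question), the wrapper below resolves its argument with match.fun before calling it. It accepts a function, a string, or a symbol, and fails early on anything else.

```r
# apply_checked is a hypothetical name used only for this sketch
apply_checked <- function(FUN, x) {
  FUN <- match.fun(FUN)  # returns a function or signals an error
  FUN(x)
}

r_fun <- apply_checked(mean, 1:10)         # function object
r_chr <- apply_checked("mean", 1:10)       # character string
r_sym <- apply_checked(quote(mean), 1:10)  # symbol

# A non-function argument is rejected by match.fun before any call is made
res <- tryCatch(apply_checked(42, 1:10), error = function(e) e)
inherits(res, "error")
```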

Your fun3 is unique in that it allows users to pass unevaluated expressions, but that approach is problematic for multiple reasons:

  1. It will not work as expected inside of other functions; see @Hong Ooi's comment/answer.
  2. You cannot pass functions accessed with a double or triple colon operator, or anonymous functions, or, more generally, any expression evaluating indirectly to a function:
    fun3(base::mean, 1:10)
    # Error in `base::mean`(1:10) : could not find function "base::mean"
    fun3(function(x) mean(x), 1:10)
    # Error in `function(x) mean(x)`(1:10) :
    # could not find function "function(x) mean(x)"
    fun3(match.fun(mean), 1:10)
    # Error in `match.fun(mean)`(1:10) :
    # could not find function "match.fun(mean)"
  3. Even if it does work as you expect, it is mostly smoke and mirrors: if the result of deparse(substitute(fun)) is a string naming a function accessible from the calling environment, then there was no need for deparse(substitute(fun)) in the first place, because fun would have evaluated to that function anyway. It does extra work for nothing:
    microbenchmark::microbenchmark(fun1(mean, 1:10), fun3(mean, 1:10), times = 1000L)
    # Unit: microseconds
    #              expr   min     lq      mean median     uq    max neval
    #  fun1(mean, 1:10) 2.009  2.378  2.700055  2.460  2.788 14.350  1000
    #  fun3(mean, 1:10) 9.020 10.127 10.991813 10.701 11.480 52.398  1000
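
The first point in the list above has no example, so here is a rough sketch of it. The definitions of fun2 and fun3 are not shown in this answer; the versions below are assumptions reconstructed from the description and the error messages, and wrap2, wrap3, and summary_fun are hypothetical names.

```r
# Assumed reconstructions of the question's functions (not shown in this answer)
fun2 <- function(fun, x) match.fun(fun)(x)
fun3 <- function(fun, x) do.call(deparse(substitute(fun)), list(x))

# Hypothetical wrappers that simply forward their own argument
wrap2 <- function(summary_fun, x) fun2(summary_fun, x)
wrap3 <- function(summary_fun, x) fun3(summary_fun, x)

# match.fun evaluates the argument, so forwarding works
ok <- wrap2(mean, 1:10)

# fun3 only sees the name "summary_fun", which is not a function that
# do.call can find, so the call fails
res <- tryCatch(wrap3(mean, 1:10), error = function(e) conditionMessage(e))
res
```

Under these assumed definitions, the wrapper built on match.fun returns the mean, while the one built on deparse(substitute()) reports that it cannot find a function named after the wrapper's own argument.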

In summary, it is good practice to use match.fun whenever you expect functions as arguments. You might avoid match.fun if you want to accept functions but not strings or symbols, but in that situation it would still be good practice to have a test:

function(FUN, ...) {
  if (!is.function(FUN)) {
    stop("'FUN' must be a function")
  }
  ## do stuff
}

How to use a match function with a mutate function?

An alternative way to do it could be like this:

library(tidyverse)

df1 %>%
  mutate("Flag" = case_when(
    ID %in% Status1$ID ~ "Status1",
    ID %in% Status2$ID ~ "Status2",
    TRUE ~ Status
  ))
#> ID Status Flag
#> 1 1 N N
#> 2 2 Y Status1
#> 3 3 Y Status1
#> 4 4 N N
#> 5 5 Y Status2

Created on 2022-01-07 by the reprex package (v2.0.1)

Data:

df1 <- data.frame(
ID = c(1, 2, 3, 4, 5),
Status = c("N", "Y", "Y", "N", "Y")
)
Status1 <- data.frame(ID = c(2, 3))
Status2 <- data.frame(ID = c(5))
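
Since the question asks about match specifically, here is a base R sketch of the same result using match instead of case_when. It assumes the data above; lookup and pos are hypothetical helper objects introduced for this example.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  Status = c("N", "Y", "Y", "N", "Y"),
                  stringsAsFactors = FALSE)
Status1 <- data.frame(ID = c(2, 3))
Status2 <- data.frame(ID = c(5))

# Stack the two ID sets into one lookup table (hypothetical helper object)
lookup <- data.frame(ID = c(Status1$ID, Status2$ID),
                     Flag = rep(c("Status1", "Status2"),
                                times = c(nrow(Status1), nrow(Status2))),
                     stringsAsFactors = FALSE)

# match() gives the lookup row for each ID, or NA when there is none
pos <- match(df1$ID, lookup$ID)

# Fall back to the existing Status where no lookup row was found
df1$Flag <- ifelse(is.na(pos), df1$Status, lookup$Flag[pos])
df1
```

Because match returns the first hit, an ID present in both Status1 and Status2 would be flagged "Status1", the same priority that the order of conditions gives in case_when.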

MATCH function in r

First, you have typos in your example. Second, the assignment to 'list1$test1value' needs an '[i]' so that each iteration does not overwrite the previous result. There should also be no '[i]' on list2$id, since you want to search the entire vector for the lookup.

# Loop over the ids, not over length(list1), which counts columns in a data frame
for (i in seq_along(list1$id)) {
  list1$test1value[i] <- list2$test[match(list1$id[i], list2$id,
                                          nomatch = NA_integer_, incomparables = NULL)]
}

The code works, but there is no reason for any loops here. You are showing a lack of understanding of how R operates. The code below does exactly the same thing, much faster.

list1$test1value <- list2$test[match(list1$id, list2$id)]

R is built so that you do not have to hold its hand and instruct it how to go through each element of the vector. match will automatically iterate through each member one by one and look it up in the other vector for you. It will also assign the result in an orderly way in the dataset.
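
To make the vectorized behaviour concrete, here is a small sketch. The real list1 and list2 are not shown in this answer, so the data below is hypothetical, chosen to be consistent with the merge output in this answer; pos and looped are names introduced for illustration.

```r
# Hypothetical data; the real list1 and list2 are not shown in this answer
list1 <- data.frame(id = 1:4,
                    age = c(40, 16, 35, 21),
                    name = c("danny", "nora", "james", "ben"),
                    stringsAsFactors = FALSE)
list2 <- data.frame(id = c(4, 1), test = c(55, 100))

# One call: match() returns, for every list1$id, its position in list2$id
pos <- match(list1$id, list2$id)
pos

# Indexing with those positions pulls the matching values; NA means no match
list1$test1value <- list2$test[pos]

# Looking up one id at a time produces the same vector
looped <- vapply(list1$id,
                 function(one_id) list2$test[match(one_id, list2$id)],
                 numeric(1))
identical(looped, list1$test1value)
```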

I will close this as a duplicate because as others suggested, merge is perfect for this.

merge(list1, list2[c("id", "test")], all.x=TRUE)
# id age name test
#1 1 40 danny 100
#2 2 16 nora NA
#3 3 35 james NA
#4 4 21 ben 55

Get indices of matches with a column in a second data.table

Using .EACHI and adding the resulting list column by reference.

dt2[ , res := dt1[ , i := .I][.SD, on = .(firstName), .(.(i)), by = .EACHI]$V1]
# lid firstName res
# 1: 1 Maria NA
# 2: 2 Jim 1,4
# 3: 3 Jack NA
# 4: 4 Anne 3,5
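
For readers without data.table, the same idea can be sketched in base R. Note that match alone would return only the first matching row for each name, so the sketch uses which to collect all of them. The input data is hypothetical (the original dt1 and dt2 are not shown in this answer), and plain data frames stand in for data.tables.

```r
# Hypothetical inputs; the original dt1 and dt2 are not shown in this answer
dt1 <- data.frame(firstName = c("Jim", "Carl", "Anne", "Jim", "Anne"),
                  stringsAsFactors = FALSE)
dt2 <- data.frame(lid = 1:4,
                  firstName = c("Maria", "Jim", "Jack", "Anne"),
                  stringsAsFactors = FALSE)

# match() alone reports only the first matching row per name
first_hit <- match(dt2$firstName, dt1$firstName)

# For each name in dt2, collect every row index of dt1 with the same name;
# use NA when nothing matches, mirroring the join result
dt2$res <- lapply(dt2$firstName, function(nm) {
  hits <- which(dt1$firstName == nm)
  if (length(hits) == 0L) NA_integer_ else hits
})
dt2$res
```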

cannot match using the match function and provides NA error

When you match 676 with Values it returns NA and you cannot subset that as an index from as1$Values. If you want to use match, try:

inds <- !is.na(match(as2$Values_a, as1$Values))
as1$Values[inds] <- as2$Values_b[inds]
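
The two lines above index as1 with a logical vector computed over the rows of as2, which lines up only when the two tables correspond row by row. If that cannot be assumed, a position-based variant may be safer. The sketch below uses hypothetical data (the asker's as1 and as2 are not shown in this answer); pos and hit are names introduced for illustration.

```r
# Hypothetical data; the asker's as1 and as2 are not shown in this answer
as1 <- data.frame(ID = c(1, 2, 3, 6),
                  pID = c(21, 22, 23, 26),
                  Values = c(500, 33, 45, 12))
as2 <- data.frame(Values_a = c(676, 500),
                  Values_b = c(999, 544))

# Position in as2 of each as1 value; NA where as2 has no entry for it
pos <- match(as1$Values, as2$Values_a)
hit <- !is.na(pos)

# Replace only the matched rows, taking each replacement from its own match
as1$Values[hit] <- as2$Values_b[pos[hit]]
as1
```

Here 676 has no counterpart in as1$Values, so it is simply ignored, and no NA ever reaches the assignment.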

You could also join the data:

library(dplyr)

left_join(as1, as2, by = c('Values' = 'Values_a')) %>%
  mutate(Values = coalesce(Values_b, Values)) %>%
  select(names(as1))

# ID pID Values
#1 1 21 544
#2 2 22 33
#3 3 23 45
#4 6 26 12

How to replicate Excel's index matching formula in R using dplyr?

Base R has a match function that works similarly to the Excel one.

myData$Match <- with(myData, Code4[match(Code2, Code3)] * !Code1)

myData
#   Element Code1 Code2 Code3 Code4 Match
# 1       A     0     1     0   0.0   1.1
# 2       A     0     2     0   0.0   1.2
# 3       C     0     1     0   0.0   1.1
# 4       A     0     3     0   0.0    NA
# 5       B     1     1     1   1.1   0.0
# 6       B     1     2     2   1.2   0.0

Same idea, but using dplyr

myData %>%
  mutate(Match = Code4[match(Code2, Code3)] * !Code1)
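
To spell out the correspondence with Excel, the one-liner can be read as INDEX(Code4, MATCH(Code2, Code3, 0)) with an extra multiplication. The sketch below breaks it into steps using base R; the input columns are reconstructed from the printed output above, and pos and looked_up are names introduced for illustration.

```r
# Input columns reconstructed from the printed output above
myData <- data.frame(Element = c("A", "A", "C", "A", "B", "B"),
                     Code1 = c(0, 0, 0, 0, 1, 1),
                     Code2 = c(1, 2, 1, 3, 1, 2),
                     Code3 = c(0, 0, 0, 0, 1, 2),
                     Code4 = c(0, 0, 0, 0, 1.1, 1.2),
                     stringsAsFactors = FALSE)

# MATCH(Code2, Code3, 0): row of the first exact match, NA if none
pos <- match(myData$Code2, myData$Code3)

# INDEX(Code4, pos): value at that row
looked_up <- myData$Code4[pos]

# Multiplying by !Code1 zeroes the result wherever Code1 is non-zero
myData$Match <- looked_up * !myData$Code1
myData$Match
```

One difference from Excel worth keeping in mind: a failed lookup yields NA here rather than an #N/A error value, and the NA propagates through the multiplication.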

