dplyr Join on by = (a = b), Where a and b Are Variables Containing Strings

Dplyr join on by=(a = b), where a and b are variables containing strings?

You can use

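library(dplyr)

myfn <- function(xname, yname) {
  data(iris)
  # setNames(yname, xname) builds c(<xname> = <yname>): the name refers to a
  # column of the first table, the value to a column of the second
  inner_join(iris, iris, by = setNames(yname, xname))
}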

The suggested syntax in the ?inner_join documentation of

by = c("a"="b")   # same as by = c(a="b")

is slightly misleading: in both forms the left-hand side is not a character value at all but the name of a vector element, so what you're actually creating is a named character vector. A value to the right of the equals sign can come from a variable, but a name to the left is taken literally and is never evaluated. You can use setNames() to set the names of the vector dynamically.
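A minimal base R sketch of the difference (xname and yname are illustrative variables, not part of any API): the name inside c() is taken literally, while setNames() evaluates both of its arguments.

```r
xname <- "a"
yname <- "b"

# Inside c(), the left-hand side is a literal name: the element is named
# "xname", not "a", even though the variable xname holds "a"
literal <- c(xname = yname)
print(names(literal))

# setNames(object, nm) evaluates both arguments, so the name comes from the
# value stored in xname
dynamic <- setNames(yname, xname)
print(dynamic)

# dynamic is exactly the vector the documentation writes as c(a = "b")
print(identical(dynamic, c(a = "b")))
```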

dplyr `left_join()` does not work with a character object as the LHS variable

Hi, the left_join() function needs a named character vector in the by argument. In your second try:

firstname <- "name1"
left_join(orig, tojoin, by = c(firstname = "name2"))

This sets the name of the character vector to the literal string "firstname", not to the value "name1" stored in that variable, so the join looks for a column called firstname.
To solve this, first build the named character vector and then pass it to the by argument of the join function:

firstname <- "name1"
join_cols = c("name2")
names(join_cols) <- firstname

dplyr::left_join(orig, tojoin, by = join_cols)
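Here is a self-contained version you can run, assuming dplyr is installed. The orig and tojoin tables are made up for illustration; only the column names name1 and name2 come from the question.

```r
library(dplyr)

# Hypothetical example data: orig keys on name1, tojoin keys on name2
orig   <- data.frame(name1 = c("a", "b", "c"), x = 1:3)
tojoin <- data.frame(name2 = c("a", "b"), y = c(10, 20))

firstname <- "name1"

# Build c(name1 = "name2"); the name comes from the value of firstname
join_cols <- c("name2")
names(join_cols) <- firstname

res <- left_join(orig, tojoin, by = join_cols)
print(res)
```

Every row of orig is kept; "c" has no partner in tojoin, so its y is NA, and the key column keeps the left-hand name, name1.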

dplyr join two tables within a function where one variable name is an argument to the function

Try:

library(dplyr)

df1 <- data.frame(gender = rep(c('M', 'F'), 5), var1 = letters[1:10])

new_join <- function(df, sexvar) {
  df2 <- data.frame(sex = rep(c('M', 'F'), 10), var2 = letters[20:1])

  # Build c(<sexvar> = "sex") so the column of df named by sexvar matches df2$sex
  join_vars <- c('sex')
  names(join_vars) <- sexvar

  left_join(df, df2, by = join_vars)
}

new_join(df1, 'gender')

I'm sure there's a more elegant way of getting this to work using lazy evaluation, etc., but this should get you up and running in the meantime.
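The two-step names() assignment can also be collapsed into a single setNames() call, which is what most of the other answers here do. A small sketch checking that both give the same result; lookup, people and the two function names are hypothetical, and the lookup keys are unique so no row gets duplicated:

```r
library(dplyr)

# Hypothetical lookup table keyed on a column called "sex"
lookup <- data.frame(sex = c("M", "F"), label = c("male", "female"))

# Variant 1: assign the name in a separate step, as in the answer above
join_with_names <- function(df, sexvar) {
  join_vars <- c("sex")
  names(join_vars) <- sexvar
  left_join(df, lookup, by = join_vars)
}

# Variant 2: the same named vector built inline with setNames()
join_with_setnames <- function(df, sexvar) {
  left_join(df, lookup, by = setNames("sex", sexvar))
}

# Hypothetical input whose key column is called "gender"
people <- data.frame(gender = c("F", "M", "F"), id = 1:3)
a <- join_with_names(people, "gender")
b <- join_with_setnames(people, "gender")
print(identical(a, b))
```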

dplyr - evaluating a string in a function left_join(y, by = c(x=value))

We may use a named vector with setNames

library(dplyr)

f_show_labels_with_codes <- function(x) {
  print(x)
  # setNames('code', x) builds c(<x> = "code"), e.g. c(team = "code")
  df1 %>%
    left_join(code_label, by = setNames('code', x))
}

-testing

> f_show_labels_with_codes(x = "team")
[1] "team"
  df1_id team year code_id  label
1      1    1 2014       1 team_A
2      2    2 2014       2 team_B
3      3    1 2009       1 team_A
4      4    6 2020       6 team_F
5      5    4 2015       4 team_D
6      6    1 2017       1 team_A
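Because f_show_labels_with_codes() reads df1 and code_label from the global environment, it is hard to reuse or test on its own. A sketch of a variant that takes both tables as arguments; show_labels(), teams and code_labels are hypothetical, and the lookup is assumed to have code and label columns:

```r
library(dplyr)

# Hypothetical variant: both tables are arguments instead of globals;
# key_col is the (string) name of the join column in df
show_labels <- function(df, lookup, key_col) {
  df %>%
    left_join(lookup, by = setNames("code", key_col))
}

# Hypothetical data: team holds numeric codes, code_labels maps code -> label
teams       <- data.frame(id = 1:3, team = c(1, 2, 1))
code_labels <- data.frame(code = c(1, 2), label = c("team_A", "team_B"))

res <- show_labels(teams, code_labels, "team")
print(res)
```

The setNames() trick is unchanged; passing the tables explicitly just makes the same function usable with other lookups.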

Using dplyr::left_join inside a custom function

Note that c("state_symbol" = "S_Symb", "city" = "municip") actually creates a named character vector; inside a function you can build the same vector with setNames().

my_join <- function(tab_1,
                    tab_2,
                    df_1_city_col,
                    df_1_state_col,
                    df_2_city_col,
                    df_2_state_col) {

  output <- dplyr::left_join(x = tab_1,
                             y = tab_2,
                             by = setNames(c(df_2_city_col, df_2_state_col),
                                           c(df_1_city_col, df_1_state_col)))
  return(output)
}

my_join(tab_1 = df_1,
        tab_2 = df_2,
        df_1_city_col = 'city',
        df_1_state_col = 'state_symbol',
        df_2_city_col = 'municip',
        df_2_state_col = 'S_Symb')

#   state_symbol      city collected_data population
# 1           MG Sao Paulo            red        123
# 2           SP Sao Paulo          green        789
# 3           BA   Brumado           blue        456
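setNames() pairs names and values by position, so the first x column is matched with the first y column, and so on. A sketch with hypothetical df_1 and df_2 built to reproduce the result printed above (only the column names come from the question):

```r
library(dplyr)

# Hypothetical inputs; column names as in the question
df_1 <- data.frame(state_symbol   = c("MG", "SP", "BA"),
                   city           = c("Sao Paulo", "Sao Paulo", "Brumado"),
                   collected_data = c("red", "green", "blue"))
df_2 <- data.frame(S_Symb     = c("SP", "BA", "MG"),
                   municip    = c("Sao Paulo", "Brumado", "Sao Paulo"),
                   population = c(789, 456, 123))

# Names (x columns) and values (y columns) are paired by position
keys <- setNames(c("municip", "S_Symb"), c("city", "state_symbol"))
print(identical(keys, c(city = "municip", state_symbol = "S_Symb")))

res <- left_join(df_1, df_2, by = keys)
print(res)
```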

In base R, you can use by.x and by.y to specify the columns to merge.

my_join <- function(tab_1,
                    tab_2,
                    df_1_city_col,
                    df_1_state_col,
                    df_2_city_col,
                    df_2_state_col) {

  output <- merge(tab_1, tab_2,
                  by.x = c(df_1_city_col, df_1_state_col),
                  by.y = c(df_2_city_col, df_2_state_col),
                  all.x = TRUE)

  return(output)
}
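The two versions do not return identical data frames: merge() sorts the result by the key columns, while left_join() keeps the row order of the first table. A small sketch with hypothetical tables x and y showing this, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical tables with differently named key columns
x <- data.frame(city = c("c2", "c1", "c3"), val = c(20, 10, 30))
y <- data.frame(municip = c("c1", "c2"), pop = c(1, 2))

key_x <- "city"
key_y <- "municip"

# Base R: by.x / by.y take the column names as plain strings
m <- merge(x, y, by.x = key_x, by.y = key_y, all.x = TRUE)

# dplyr: the same pairing expressed as a named vector
j <- left_join(x, y, by = setNames(key_y, key_x))

# merge() sorts rows by the key; left_join() keeps the row order of x
print(m$city)
print(j$city)

# After sorting j by the key, both hold the same joined values
j_sorted <- j[order(j$city), ]
print(identical(m$pop, j_sorted$pop))
```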

Writing function with dplyr's left_join

Find the function below:

glue_sth <- function(df, variable) {
  df %>%
    # all_of() marks `variable` as a string holding a column name
    dplyr::rename(join = dplyr::all_of(variable)) %>%
    dplyr::left_join(df_aux, by = c('join' = "z"))
}

Here, I have essentially renamed the column so that we do not have to go through the entire eval(parse()) route.

Alternative as described in comments:

glue_sth <- function(df, variable) {
  df %>%
    dplyr::left_join(df_aux, by = setNames("z", variable))
}

Let me know if it works.
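One practical difference between the two versions: with the rename approach the key column comes back named join, while the setNames() version keeps the original column name. A sketch with hypothetical df and df_aux tables; via_rename() and via_setnames() are illustrative names, and all_of() assumes dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical lookup table keyed on column z, and an input keyed on "key"
df_aux <- data.frame(z = c("p", "q"), w = c(1, 2))
df     <- data.frame(key = c("q", "p", "r"))

# Rename approach: the key column is renamed to "join" before joining
via_rename <- function(df, variable) {
  df %>%
    rename(join = all_of(variable)) %>%
    left_join(df_aux, by = c("join" = "z"))
}

# setNames approach: no renaming, the key column keeps its name
via_setnames <- function(df, variable) {
  df %>%
    left_join(df_aux, by = setNames("z", variable))
}

r1 <- via_rename(df, "key")
r2 <- via_setnames(df, "key")
print(names(r1))
print(names(r2))
```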

left_join using a character vector in the by argument in R

We can do this easily with setNames

out1 <- left_join(DF,DF2, by = setNames("aa", DFNames[1]))

-checking the output

out2 <- left_join(DF, DF2, by = c("a" = "aa"))
identical(out1, out2)
#[1] TRUE

full_join by variable as column names

You can use rename_() (note that rename_() and the other underscore verbs are deprecated in current dplyr), for example:

library(dplyr)

full_join(df1, rename_(df2, .dots = setNames(col2, col1)))

which gives,

#Joining, by = c("a", "b")
  a b
1 1 1
2 2 2
3 3 3

Posting alternatives from @akrun's and @mt1022's comments:

#akrun
full_join(df1, rename_at(df2, .vars = col2, funs(paste0(col1))))
full_join(df1, rename(df2, !!(col1) := !!rlang::sym(col2)))

#mt1022
full_join(df1, rename_at(df2, col2, ~col1))
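rename_(), rename_at() and funs() are deprecated or superseded in current dplyr. In a recent dplyr (assumed 1.0 or later), rename() also accepts a named character vector through all_of(), so the setNames() trick carries over. A sketch with hypothetical df1, df2, col1 and col2 chosen to match the output above:

```r
library(dplyr)

# Hypothetical inputs: df2 calls the second key "c", df1 calls it "b"
df1  <- data.frame(a = 1:3, b = 1:3)
df2  <- data.frame(a = 1:3, c = 1:3)
col1 <- "b"
col2 <- "c"

# all_of(c(b = "c")) renames column c to b
df2_renamed <- rename(df2, all_of(setNames(col2, col1)))

# Spelling out `by` avoids the "Joining, by = ..." message
res <- full_join(df1, df2_renamed, by = c("a", "b"))
print(res)
```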

Left join missing left hand side

We can use setNames

library(dplyr)
band_members$id <- 1:3
band_instruments$id_num <- c(2,3,1)

xid='id'
xid1='name'
yid='id_num'
yid1='name'

band_members %>% left_join(band_instruments, by=setNames(nm=c(xid,xid1),c(yid,yid1)))

# A tibble: 3 x 4
  name  band       id plays
  <chr> <chr>   <dbl> <chr>
1 Mick  Stones      1 NA
2 John  Beatles     2 guitar
3 Paul  Beatles     3 bass

PS: This answer is based on @MrFlick's answer here.
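setNames() has the signature setNames(object = nm, nm), so when nm is passed by name the value vector can come second, which is what the answer does. The sketch below checks that both argument orders build the same vector and runs the join on dplyr's bundled band_members and band_instruments data, assuming the key pairing is id = "id_num" and name = "name" (the pairing that yields the tibble above):

```r
library(dplyr)

# With nm named explicitly, the positional argument becomes `object`,
# so both calls build c(id = "id_num", name = "name")
v1 <- setNames(nm = c("id", "name"), c("id_num", "name"))
v2 <- setNames(c("id_num", "name"), c("id", "name"))
print(identical(v1, v2))

# Reproduce the join on dplyr's example data
bm <- band_members
bi <- band_instruments
bm$id <- 1:3
bi$id_num <- c(2, 3, 1)

res <- left_join(bm, bi, by = v2)
print(res)
```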

Using strings as arguments in custom dplyr function using non-standard evaluation

You can either use sym to turn "y" into a symbol or parse_expr to parse it into an expression, then unquote it using !!:

library(rlang)

myVar <- "y"  # the column name, stored as a string

testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!sym(myVar))

testFun(data.frame(x = c("a", "b", "c"), y = 1:3), !!parse_expr(myVar))

Result:

  x   y
1 a   0
2 b 100
3 c 200

Check my answer to this question for an explanation of the difference between sym and parse_expr.
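testFun itself isn't shown in the question, so the sketch below uses a hypothetical stand-in whose body, (y - 1) * 100, is only an assumption that happens to match the printed result. It also shows where sym and parse_expr diverge: on a string that is not a bare column name.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for testFun: rescales the column passed as `col`;
# "{{ col }}" := reuses the captured column's name on the left-hand side
testFun <- function(df, col) {
  df %>% mutate("{{ col }}" := ({{ col }} - 1) * 100)
}

df <- data.frame(x = c("a", "b", "c"), y = 1:3)
myVar <- "y"

r1 <- testFun(df, !!sym(myVar))
r2 <- testFun(df, !!parse_expr(myVar))
print(r1)

# sym() makes a single symbol literally named `y * 2`;
# parse_expr() parses the string into a call to `*`
print(class(sym("y * 2")))
print(class(parse_expr("y * 2")))
```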



Related Topics



Leave a reply



Submit