R – Dplyr join on by=(a = b), where a and b are variables containing strings

dplyrr

I am trying to perform an inner join two tables using dplyr, and I think I'm getting tripped up by non-standard evaluation rules. When using the by=("a" = "b") argument, everything works as expected when "a" and "b" are actual strings. Here's a toy example that works:

library(dplyr)
data(iris)

inner_join(iris, iris, by=c("Sepal.Length" = "Sepal.Width"))

But let's say I was putting inner_join in a function:

library(dplyr)
data(iris)

myfn <- function(xname, yname) {
    data(iris)
    inner_join(iris, iris, by=c(xname = yname))
}

myfn("Sepal.Length", "Sepal.Width")

This returns the following error:

Error: cannot join on columns 'xname' x 'Sepal.Width': index out of bounds

I suspect there is some fancy expression, deparsing, quoting, or unquoting that I could do to make this work, but I'm a bit murky on those details.

Best Solution

You can use

myfn <- function(xname, yname) {
    data(iris)
    inner_join(iris, iris, by=setNames(yname, xname))
}

The suggested syntax in the ?inner_join documentation of

by = c("a"="b")   # same as by = c(a="b")

is slightly misleading because both those values aren't proper character values. You're actually created a named character vector. To dynamically set the values to the left of the equals sign is different from those on the right. You can use setNames() to set the names of the vector dynamically.

Related Question