R – Undefined columns selected when subsetting data frame inside a function

r

Hi, I have a data frame called "outcome" with a column called "Pneumonia" and some other columns such as "State" and "Hospital.Name".

When I run this at the command line:

outcome <- read.csv("Assigment3/outcome-of-care-measures.csv", colClasses = "character")
temp <- subset(outcome, State == "NY", select = c(Hospital.Name, Pneumonia))

It works and creates the temp data frame with two columns, Hospital.Name and Pneumonia.

But when I create a function that contains the same instruction, it fails. Here state is a value in the State column, and outcome1 is the name of a column:

best <- function(state, outcome1) {
    outcome <- read.csv("Assigment3/outcome-of-care-measures.csv", colClasses = "character")  
    temp <- subset(outcome, State ==state, select=c(Hospital.Name, outcome1))
}

and I call the function:

best("NY","Pneumonia")

I get the error:

Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected

I know the problem is with the outcome1 variable, because if I hardcode the column name (Pneumonia) in the function above instead of passing it in as an argument, the function works as expected.

Best Answer

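I think you need get() around outcome1 in your function definition, because you are passing the column name as a string rather than as an object. Inside select, bare column names such as Hospital.Name are evaluated to column positions, so c(Hospital.Name, outcome1) mixes a position with the string "Pneumonia" and the result no longer refers to valid columns. With this example data: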

outcome <- data.frame(Pneumonia = sample(0:1, size = 5, replace = TRUE),
                      State = c("NY", "NY", "NY", "CA", "CA"),
                      Hospital.Name = LETTERS[1:5]
                      )

And this modified function:

best <- function(df_, state_, var_) {
  subset(df_, State == state_, select = c(Hospital.Name, get(var_)))
}          

Now you can call it more or less as before:

> best(df_ = outcome, state_ = "NY", var_ = "Pneumonia")
  Hospital.Name Pneumonia
1             A         0
2             B         1
3             C         0
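
As an alternative to get(), here is a minimal sketch that avoids subset() altogether and selects columns with a character vector through `[`. The help page for subset() describes it as a convenience function for interactive use, so this form may be easier to reason about inside a function. The function name best2 and the fixed example values are illustrative assumptions, not part of the original question or answer.

```r
# Hypothetical example data: same shape as the frame above, but with fixed
# values instead of sample() so the result is reproducible
outcome <- data.frame(
  Pneumonia = c(0, 1, 0, 1, 1),
  State = c("NY", "NY", "NY", "CA", "CA"),
  Hospital.Name = LETTERS[1:5],
  stringsAsFactors = FALSE
)

# best2 is an illustrative name, not from the original post
best2 <- function(df_, state_, var_) {
  # Stop early if the requested column does not exist
  stopifnot(var_ %in% names(df_))
  # which() drops rows where State is NA, as subset() would;
  # columns are chosen by a plain character vector of names
  df_[which(df_$State == state_), c("Hospital.Name", var_), drop = FALSE]
}

result <- best2(outcome, "NY", "Pneumonia")
print(result)
```

Because var_ is an ordinary string here, no non-standard evaluation is involved, and a misspelled column name is caught by the stopifnot() check before any indexing happens.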