
I'm new to function writing so hopefully the below makes some sense.

I want to create a function which takes some arguments, which will be used to subset a data.frame. I have searched across the forums and found these Q&As interesting, but haven't been able to answer my question from the discussions:

The function I want to create will take a df, a column name, and a value to match in the rows of the column name. Here's my attempt which I can see to be wrong:

x <- data.frame("col1"=c("email","search","direct"),
            "col2"=c("direct","email","direct"),
            "col3"=c(10,15,27))

fun <- function(df,col,val) {
  result <- subset(df, col==val)
  return(result)
}

I want to pass in the df, x. A column name, let's say "col2". A value, let's say "email". My attempt to do so returns a 0-length df.

fun(x,"col2","email")

Clearly I'm doing something wrong... can anyone help?

  • You should read this post to learn a bit more about the issues with using subset inside a function. Commented Jun 10, 2013 at 11:32
  • I notice you do not use many spaces in your code, e.g. function(df,col,etc) -> function(df, col, etc) or col==val -> col == val. Adding spaces makes your code easier to read and less intimidating. Commented Jun 10, 2013 at 11:34

1 Answer

You would want to do something like:

df[df[[col_name]] == value,]

The function then becomes:

fun <- function(df, col_name, value) {
  df[df[[col_name]] == value,]
}
fun(x, 'col2', 'email')
    col1  col2 col3
2 search email   15

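And if you want to take into account NA values in the logical vector (treating them as "no match"):

fun <- function(df, col_name, value) {
  logical_vector <- df[[col_name]] == value
  # Treat NA comparisons as "no match" so they do not produce all-NA rows
  logical_vector[is.na(logical_vector)] <- FALSE
  # Note the empty column slot: select rows, keep all columns, stay a data.frame
  df[logical_vector, , drop = FALSE]
}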
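To see what the NA handling buys you, here is a minimal sketch. The data frame y and the name fun_na are made up for illustration and are not from the question; the function body follows the same idea as above (set NA comparisons to FALSE, index with drop = FALSE).

```r
# Hypothetical data with a missing value in col2 (not from the question).
y <- data.frame(col1 = c("email", "search", "direct"),
                col2 = c("direct", NA, "email"),
                col3 = c(10, 15, 27))

# Plain logical indexing: the NA comparison yields an extra all-NA row.
naive <- y[y[["col2"]] == "email", ]

# Hypothetical helper: NA comparisons are turned into FALSE before indexing.
fun_na <- function(df, col_name, value) {
  keep <- df[[col_name]] == value
  keep[is.na(keep)] <- FALSE
  df[keep, , drop = FALSE]
}

clean <- fun_na(y, "col2", "email")

# With a one-column data frame, drop = FALSE keeps the result a data.frame
# instead of collapsing it to a vector.
one_col <- fun_na(y["col2"], "col2", "email")
```

Here naive should have two rows (one of them all NA), while clean has only the matching row, and one_col is still a data.frame.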

Your example does not work because subset does not look inside the value of col. Instead, it evaluates the expression col == val as written and looks for a column called col. x has no such column, so col falls back to the function argument, the string "col2", and "col2" == "email" is FALSE for every row. This is one of the reasons not to use subset in non-interactive mode, i.e. in anything other than an interactive R console.
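If you want to check the lookup behaviour yourself, the following sketch puts the two approaches side by side on the data from the question. The names fun_subset and fun_index are hypothetical and only used here to keep the two versions apart.

```r
x <- data.frame(col1 = c("email", "search", "direct"),
                col2 = c("direct", "email", "direct"),
                col3 = c(10, 15, 27))

fun_subset <- function(df, col, val) {
  # df has no column named `col`, so `col` resolves to the argument "col2".
  # The condition is therefore the single value "col2" == "email", i.e. FALSE.
  subset(df, col == val)
}

fun_index <- function(df, col, val) {
  # [[ ]] looks the column up by the string stored in `col`.
  df[df[[col]] == val, , drop = FALSE]
}

empty <- fun_subset(x, "col2", "email")  # no rows selected
hit   <- fun_index(x, "col2", "email")   # the row where col2 is "email"
```

The subset version returns a data frame with zero rows, as in the question, while the [[ version returns the one matching row.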

4 Comments

Great - that makes total sense. Thanks for such a quick response Paul.
You might want to take care of NAs if you want to mimic subset using [.data.frame: subset drops the rows for which the logical expression evaluates to NA.
And you should add drop = FALSE in case the user passes in a 1 col df.
@hadley I also added that improvement.