I have a data frame called "Region_Data", which I created by applying some functions to my data.

I want to use this data frame as an input and subset it with the following function that I created. The function should return the subsetted data frame:

Region_Analysis_Function <- function(Input_Region){
      Subset_Region_Data = subset(Region_Data, Region == "Input_Region" )
      Subset_Region_Data
    }

However, when I create this function and then execute it using:

Region_Analysis_Function("North West")

I get 0 observations, though I know that there are xx observations in the data frame.

I read that there is something called global / local environment, but I'm not really clear on that.

How do I solve this issue? Thank you so much in advance!!

Comments

  • Try using Region == Input_Region (no quotes). Or better yet, you may want to use Region %in% Input_Region in case a vector with more than one element is passed to the function. Commented May 20, 2015 at 16:29
  • Also read this on the use of subset Commented May 20, 2015 at 16:51

1 Answer

When you try to subset your data using subset(Region_Data, Region == "Input_Region"), "Input_Region" is interpreted as a string literal rather than being evaluated to the value of the argument Input_Region. This means that unless the column Region in your object Region_Data contains some rows with the literal value "Input_Region", your function will return a zero-row subset. Removing the quotes will solve this, and changing == to %in% will make your function more general. Consider the following data set,

mydf <- data.frame(
  x = 1:5,
  y = rnorm(5),
  z = letters[1:5])
##
R> mydf
  x          y z
1 1 -0.4015449 a
2 2  0.4875468 b
3 3  0.9375762 c
4 4 -0.7464501 d
5 5  0.8802209 e

and the following 3 functions,

qfoo <- function(Z) {
  subset(mydf, z == "Z")
}
foo <- function(Z) {
  subset(mydf, z == Z)
}
##
bar <- function(Z) {
  subset(mydf, z %in% Z)
}

where qfoo represents the approach used in your question, foo implements the first change I noted, and bar implements both changes.

The last two functions will work when the input value is a scalar,

R> qfoo("c")
[1] x y z
<0 rows> (or 0-length row.names)
##
R> foo("c")
  x         y z
3 3 0.9375762 c
##
R> bar("c")
  x         y z
3 3 0.9375762 c

but only the third will work if it is a vector with more than one element, because == recycles the shorter vector and compares element by element:

R> foo(c("a","c"))
  x          y z
1 1 -0.4015449 a
Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(z, Z) :
  longer object length is not a multiple of shorter object length
##
R> bar(c("a","c"))
  x          y z
1 1 -0.4015449 a
3 3  0.9375762 c
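
Applied to the function in your question, the same two changes might look like the sketch below. The Region_Data built here is a made-up stand-in (the Value column and the rows are assumptions, since the question does not show the real data), and the data argument is an optional addition of mine. I have also used [ with a logical index rather than subset, since the help page for subset describes it as a convenience function intended for interactive use. On the global / local environment point: Input_Region is a local variable inside the function, while Region_Data is found in the global environment unless you pass a data frame explicitly.

```r
# Hypothetical stand-in for the asker's Region_Data; the real columns and rows
# are not shown in the question, so Value and these regions are assumptions.
Region_Data <- data.frame(
  Region = c("North West", "South East", "North West", "London"),
  Value  = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Input_Region is used unquoted, so its value is compared against the column.
# %in% also handles the case where several regions are passed at once.
# The data argument defaults to the global Region_Data but can be overridden.
Region_Analysis_Function <- function(Input_Region, data = Region_Data) {
  data[data$Region %in% Input_Region, , drop = FALSE]
}

# One region
north_west <- Region_Analysis_Function("North West")

# Several regions in one call
two_regions <- Region_Analysis_Function(c("North West", "London"))
```

With this toy data, north_west contains the two "North West" rows and two_regions contains those plus the "London" row.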