I'm trying to subset a dataframe within a function using a mixture of fixed variables and some variables which are created within the function (I only know the variable names, but cannot vectorise them beforehand). Here is a simplified example:

a<-c(1,2,3,4)
b<-c(2,2,3,5)
c<-c(1,1,2,2)
D<-data.frame(a,b,c)

subbing<-function(Data,GroupVar,condition){
  g=Data$c+3
  h=Data$c+1
  NewD<-data.frame(a,b,g,h)
  subset(NewD,select=c(a,b,GroupVar),GroupVar%in%condition)
}

Keep in mind that in my application I cannot compute g and h outside of the function. Sometimes I'll want to make a selection according to the values of h (as above) and other times I'll want to use g. There's also the possibility I may want to use both, but even just being able to subset using 1 would be great.

subbing(D,GroupVar=h,condition=5)

This returns an error saying that the object h cannot be found. I've tried to amend subset using as.formula and all sorts of things but I've failed every single time.

Besides the ease of the function there is a further reason why I'd like to use subset.

In the function I'm actually working on I use subset twice. The first time it's the plain subset function. It has just been pointed out below that another question discusses why it's probably better to use good old bracket indexing, e.g. data[, colnames(data) == "g"]. Thanks for the suggestion, I'll have a go.

There is however another issue. I also use subset (or rather a variation) in my function because I'm dealing with several complex design surveys (see package survey), so subset.survey.design allows you to get the right variance estimation for subgroups. If I selected my group using [] I would get the wrong s.e. for my parameters, so I guess this is quite an important issue.

Thank you

    You should read this question (it could possibly be considered a duplicate, though it's kind of a stretch). Commented Oct 30, 2012 at 18:48

1 Answer

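The error occurs the first time subset evaluates GroupVar. Function arguments in R are evaluated lazily, so only at that point does R look up h, and it searches for a standalone object named h in the calling environment, not for a column of the data frame.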
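A minimal sketch of that lookup, using a hypothetical function f and data frame df that are not part of the question. It assumes no object called h exists in the calling environment:

```r
# Hypothetical helper: passes its argument `v` into subset()'s condition.
f <- function(df, v) {
  subset(df, v > 1)
}

df <- data.frame(h = c(1, 2, 3))

# `v` is not a column of df, so subset() finds it in f's frame. Forcing it
# evaluates the bare symbol `h` in the caller's environment, where it is
# not defined, and the column df$h is never consulted.
msg <- tryCatch(f(df, h), error = function(e) conditionMessage(e))
print(msg)
```

The captured message names h as the missing object, which matches the error reported in the question.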

The best thing to do is refer to the column names in quotes in the subset function. But of course, then you'd have to sidestep the condition part:

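subbing <- function(Data, GroupVar, condition) {
   # .... (build NewD with g and h, as in the question)
   # GroupVar is now a character string, e.g. "h"
   DF <- subset(NewD, select=c("a","b", GroupVar))
   # GroupVar is the third selected column
   DF <- DF[DF[,3] %in% condition,]
   DF
}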

That will do the trick, although it can be annoying to have one data frame indexing inside another.
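For reference, here is a self-contained sketch of this approach applied to the data from the question. It assumes GroupVar is passed as a quoted column name, and it builds NewD from the columns of Data rather than from global objects, which is a small departure from the original code:

```r
a <- c(1, 2, 3, 4)
b <- c(2, 2, 3, 5)
c <- c(1, 1, 2, 2)
D <- data.frame(a, b, c)

subbing <- function(Data, GroupVar, condition) {
  # g and h are created inside the function, as in the question
  NewD <- data.frame(a = Data$a, b = Data$b,
                     g = Data$c + 3, h = Data$c + 1)
  # select= accepts a character vector of column names
  DF <- subset(NewD, select = c("a", "b", GroupVar))
  # filter rows with bracket indexing on the named column
  DF[DF[[GroupVar]] %in% condition, , drop = FALSE]
}

res_h <- subbing(D, GroupVar = "h", condition = 3)
res_g <- subbing(D, GroupVar = "g", condition = 5)
print(res_h)
print(res_g)
```

With this data h takes the values 2, 2, 3, 3 and g takes 4, 4, 5, 5, so both calls keep the third and fourth rows. The call from the question, h with condition 5, would return zero rows rather than an error.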

2 Comments

For safety, I would probably also ditch subset entirely and move the select piece into the final line as well. (And I was about to ask the OP why they create NewD but then return a subset of Data; typo, perhaps...)
@joran - That's another good way to do it. You can just write DF <- DF[DF[, GroupVar] %in% condition, c("a", "b", GroupVar)]. I tend to prefer subset because it evaluates its arguments inside the data frame, but in cases like this the bracket notation leaves less room for error. As a side note, using with(DF, DF[...]) avoids repeating the cumbersome $ operator, which is sometimes a good option.
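The bracket-only version also extends to filtering on g and h together, which the question mentions as a possibility. The following is only a sketch: the function name subbing2 and the named-list argument conditions are hypothetical, not something from the question or the answer.

```r
D <- data.frame(a = c(1, 2, 3, 4), b = c(2, 2, 3, 5), c = c(1, 1, 2, 2))

# Hypothetical variant: `conditions` is a named list with one entry per
# grouping column, e.g. list(g = 5, h = 3). A row is kept only if it
# matches every entry.
subbing2 <- function(Data, conditions) {
  NewD <- data.frame(a = Data$a, b = Data$b,
                     g = Data$c + 3, h = Data$c + 1)
  vars <- names(conditions)
  keep <- rep(TRUE, nrow(NewD))
  for (v in vars) {
    keep <- keep & NewD[[v]] %in% conditions[[v]]
  }
  # select rows and columns in a single bracket call, no subset()
  NewD[keep, c("a", "b", vars), drop = FALSE]
}

one  <- subbing2(D, list(h = 3))
both <- subbing2(D, list(g = c(4, 5), h = 2))
print(one)
print(both)
```

Because the column names travel as strings, nothing depends on non-standard evaluation, so the function behaves the same whether it is called interactively or from inside another function.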
