I am trying to make my code DRYer by moving repeated parts into functions. One of these functions is:

datasetperuniversity <- function(university, year) {
  assign(paste("data", university, sep = ""),
         subset(get(paste("originaldata", year, sep = "")),
                get(paste("allcollaboration", university, sep = "")) == 1))
}

Calling datasetperuniversity("Harvard", "2000") should then do the equivalent of this inside the function:

dataHarvard=subset(originaldata2000,allcollaborationHarvard==1)

The function runs nearly perfectly, except that it does not store the result in dataHarvard. I read that this is normal inside functions and that using <<- instead of = could solve it. However, since I am using assign, that is not really possible: the = is only what the assign call amounts to, not something I write myself.

Here is some data:

sales = c(2, 3, 5, 6)
numberofemployees = c(1, 9, 20, 12)
allcollaborationHarvard = c(0, 1, 0, 1)
originaldata = data.frame(sales, numberofemployees, allcollaborationHarvard)
  • I guess it will be easier if you rearrange your data. Don't carry around originaldata2000, originaldata2001, etc.; just put them together in one table with a year column. And if your allcollaboration[uni] columns are mutually exclusive, use one categorical column instead of dummies. For more on this line of thinking, if you're interested: jstatsoft.org/article/view/v059i10 Commented May 2, 2018 at 16:22
  • @Frank Thanks for this very clear and easy-to-implement suggestion. Although I will use this method for now, I keep wondering whether my original question is answerable for cases where merging all the datasets is not preferable. Commented May 3, 2018 at 7:11

1 Answer

Generally, it's best not to embed data (here, the university name) in the name of an object. So instead of using assign to create dataHarvard, make a list called data with an element named "Harvard":

# enumerate unis; setNames(nm = x) names each element after itself,
# so lapply returns a list named by university
unis = setNames(nm = "Harvard")

# make a table for each subset with lapply
data = lapply(unis, function(x) 
  originaldata[originaldata[[ paste0("allcollaboration", x) ]] == 1, ]
)

which gives

> data
$Harvard
  sales numberofemployees allcollaborationHarvard
2     3                 9                       1
4     6                12                       1
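One practical benefit of the list is that a university name held in a variable can be used directly as an index, with no need for get or assign. The following is a minimal sketch that rebuilds the list from the question's sample data; the variable names uni and mean_sales are only illustrative.

```r
originaldata <- data.frame(
  sales = c(2, 3, 5, 6),
  numberofemployees = c(1, 9, 20, 12),
  allcollaborationHarvard = c(0, 1, 0, 1)
)

# Build the named list of per-university tables, as in the answer
unis <- setNames(nm = "Harvard")
data <- lapply(unis, function(x)
  originaldata[originaldata[[paste0("allcollaboration", x)]] == 1, ]
)

# The name can live in a variable and be used as a list index
uni <- "Harvard"
harvard <- data[[uni]]   # same table as data$Harvard

# Functions can be applied to every university's table at once
mean_sales <- sapply(data, function(d) mean(d$sales))
```

Adding more universities would then only require extending unis, assuming the matching allcollaboration columns exist.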

As seen here, you can use DF[["column name"]] to access a column instead of get as in the OP. Also, see the note in ?subset:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
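To illustrate the point about standard subsetting, here is a rough sketch of a function that takes the data frame as an argument and returns the subset, leaving the assignment to the caller. The name rows_for_university is hypothetical and not from the original post. Note that which() is used so that rows with NA in the flag column are dropped, as subset would do.

```r
originaldata <- data.frame(
  sales = c(2, 3, 5, 6),
  numberofemployees = c(1, 9, 20, 12),
  allcollaborationHarvard = c(0, 1, 0, 1)
)

# Hypothetical helper: return the rows whose dummy column for `university`
# equals 1, using [[ and [ rather than get() and subset().
rows_for_university <- function(df, university) {
  flag <- df[[paste0("allcollaboration", university)]]
  if (is.null(flag)) {
    stop("no collaboration column for ", university)
  }
  # which() drops NA comparisons instead of producing all-NA rows
  df[which(flag == 1), , drop = FALSE]
}

# The caller decides where the result is stored
harvard <- rows_for_university(originaldata, "Harvard")
```

Because the function returns its result instead of creating a variable as a side effect, it can be used with lapply or assigned to any name the caller chooses.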

Generally, it's also better not to embed data in column names if possible. If the allcollaboration* columns are mutually exclusive, they can be collapsed into a single categorical variable with values like "Harvard", "Yale", etc. Alternatively, it might make sense to put the data in long form.
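As a sketch of the collapsing step, the code below turns the dummy columns into one university column and then splits the table by it. It assumes each row flags at most one university. The allcollaborationYale column and the names wide, tidy and by_uni are invented for illustration only.

```r
# Hypothetical wide data: allcollaborationYale is made up for this example,
# and each row is assumed to flag at most one university.
wide <- data.frame(
  sales = c(2, 3, 5, 6),
  numberofemployees = c(1, 9, 20, 12),
  allcollaborationHarvard = c(0, 1, 0, 1),
  allcollaborationYale = c(1, 0, 0, 0)
)

dummy_cols <- grep("^allcollaboration", names(wide), value = TRUE)
unis <- sub("^allcollaboration", "", dummy_cols)

# For each row, the position of the flagged dummy column, or NA if none is set
hit <- apply(wide[dummy_cols] == 1, 1, function(r) {
  if (any(r)) which(r)[1] else NA_integer_
})

# Replace the dummy columns with a single categorical column
tidy <- wide[setdiff(names(wide), dummy_cols)]
tidy$university <- unis[hit]

# One table per university, as a named list (rows with NA are dropped)
by_uni <- split(tidy, tidy$university)
```

With the data in this shape, by_uni$Harvard plays the role of dataHarvard, and filtering by university becomes an ordinary comparison on one column.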

For more guidance on arranging data, I recommend Hadley Wickham's tidy data paper.
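On the original question, for cases where separate named objects really are wanted: by default assign creates the object in the environment it is called from, which here is the function's own frame, so the object disappears when the function returns. Both assign and get accept an envir argument, so one possible fix is to pass the caller's environment explicitly. The following is only a sketch under that assumption, with the sample data renamed to originaldata2000 so that the lookup by year works.

```r
originaldata2000 <- data.frame(
  sales = c(2, 3, 5, 6),
  numberofemployees = c(1, 9, 20, 12),
  allcollaborationHarvard = c(0, 1, 0, 1)
)

# envir defaults to the caller's environment, so the object is looked up
# and created where the function was called, not inside the function.
datasetperuniversity <- function(university, year, envir = parent.frame()) {
  original <- get(paste0("originaldata", year), envir = envir)
  flag <- original[[paste0("allcollaboration", university)]]
  assign(paste0("data", university),
         original[which(flag == 1), , drop = FALSE],
         envir = envir)
  invisible(NULL)
}

datasetperuniversity("Harvard", "2000")
dataHarvard   # now exists in the calling environment
```

This works, but it relies on side effects and on object names that encode data, which is why the list-based approach is generally easier to maintain.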
