
I am currently trying to use predefined strings to identify multiple column names in R. More specifically, I am using the ave function to create identification variables for subgroups of a data frame. The twist is that I want the identification variables to be flexible, so that I can pass the column names as a generic string.

Here is some sample code:

ids = with(df,ave(rep(1,nrow(df)),subcolumn1,subcolumn2,subcolumn3,FUN=seq_along))

I would like to run it like this instead (the code below does not work as expected):

subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),subColumnsString ,FUN=seq_along))

I tried something with eval, but it still did not work:

subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),eval(parse(text=subColumnsString)),FUN=seq_along))
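
A minimal sketch of why the eval(parse()) attempt fails, using mtcars columns as stand-ins for the hypothetical subcolumn1..subcolumn3: parse() turns the character vector into an expression vector with one element per string, and eval() evaluates each element in turn but returns only the value of the last one, so ave() receives a single grouping variable instead of three. Evaluating each expression separately keeps all of them.

```r
# mtcars columns stand in for the hypothetical subcolumn1..subcolumn3
df <- mtcars
subColumnsString <- c("cyl", "gear", "am")

# parse() yields one expression per string
exprs <- parse(text = subColumnsString)
print(length(exprs))  # 3

# eval() on an expression vector returns only the LAST value, i.e. df$am
last_only <- eval(exprs, envir = df)
print(identical(last_only, df$am))  # TRUE

# Evaluating each expression on its own keeps all three grouping columns,
# which can then be spliced into ave() with do.call()
all_cols <- lapply(exprs, eval, envir = df)
ids <- do.call(ave, c(list(rep(1, nrow(df))), all_cols, list(FUN = seq_along)))
```

In practice, indexing the data frame directly with the character vector avoids parse() and eval() altogether.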

Any ideas? Thanks.

EDIT: A minimal working example of what I want:

df = mtcars
id_names = c("vs","am")
idDF_correct = transform(df,idItem = as.numeric(interaction(vs,am)))
idDF_wrong = cbind(df,ave(rep(1,nrow(df)),df[id_names],FUN=seq_along))

Note that in idDF_correct, each unique combination of vs and am maps to a unique value of idItem. In idDF_wrong this is not the case.
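
The difference comes from what each call computes: ave(..., FUN = seq_along) numbers the rows within each group (1, 2, 3, ..., restarting in every group), so the same numbers repeat across groups, whereas interaction() assigns one level per combination. A minimal sketch, assuming the goal is a group identifier: interaction() also accepts a single list of factors, so df[id_names] can be passed without hard-coding the column names.

```r
df <- mtcars
id_names <- c("vs", "am")

# Within-group row counter: restarts at 1 in every vs/am group
row_in_group <- ave(rep(1, nrow(df)), df[id_names], FUN = seq_along)

# Group identifier: one integer code per vs/am combination
group_id <- as.numeric(interaction(df[id_names]))

# Same codes as the hard-coded interaction(vs, am) used for idDF_correct
print(identical(group_id, as.numeric(interaction(df$vs, df$am))))  # TRUE

idDF <- transform(df, idItem = group_id)
```

By default interaction() keeps every possible combination as a level (drop = FALSE), so combinations absent from the data leave gaps in the codes; interaction(df[id_names], drop = TRUE) gives consecutive codes instead.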

2 Answers


I think this achieves what you requested. Here I use the mtcars dataset that ships with R:

subColumnsString <- c("cyl","gear")

ids = with(mtcars, ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along))

Just index your data.frame with the column names: mtcars[subColumnsString] returns a data frame, which is a list of columns, and ave() accepts such a list as its grouping argument.
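
A quick check, using the same mtcars columns, that the string-based call gives exactly the same result as naming the columns by hand. It works because ave() passes its grouping arguments to interaction(), which accepts a single list of factors.

```r
subColumnsString <- c("cyl", "gear")

# Indexing with a character vector returns a data frame, i.e. a list of columns
grp <- mtcars[subColumnsString]
print(is.list(grp))  # TRUE

# String-based grouping vs. hard-coded column names
ids_string   <- ave(rep(1, nrow(mtcars)), grp, FUN = seq_along)
ids_explicit <- with(mtcars, ave(rep(1, nrow(mtcars)), cyl, gear, FUN = seq_along))
print(identical(ids_string, ids_explicit))  # TRUE
```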

EDIT

ids = ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along)

As G. Grothendieck pointed out, the with() is not needed, so you can just call plain ave(). You should also look at their answer, as it is more general.


4 Comments

The with() is not really used, so it could be omitted.
For some reason this does not create the identification number as expected (the same number repeats across different subgroups). My identification variables are strings, but that should not be a problem, should it? Using interaction works just fine: idDF = transform(df,id = as.numeric(interaction(subCol1,subCol2,subCol3)))
An example of your expected output would help.
I edited the question to add a minimal working example (MWE).

This defines a function whose arguments are:

  • data, the input data frame
  • by, a character vector of column names in data
  • fun, a function to use in ave

Code:

Ave <- function(data, by, fun = seq_along) {
   do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}

# test 
Ave(CO2, c("Plant", "Treatment"), seq_along)

giving:

 [1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
[39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6
[77] 7 1 2 3 4 5 6 7
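
Because fun is passed straight through to ave(), the same helper can compute other per-group quantities. A small usage sketch (the choice of mtcars, vs/am, and length is only illustrative): with fun = length, each row gets the size of its group, which can be cross-checked against table().

```r
Ave <- function(data, by, fun = seq_along) {
  do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}

# Size of each row's vs/am group
sizes <- Ave(mtcars, c("vs", "am"), length)

# Cross-check: look up each row's vs/am cell in the contingency table
tab <- table(mtcars$vs, mtcars$am)
expected <- tab[cbind(as.character(mtcars$vs), as.character(mtcars$am))]
print(all(sizes == expected))  # TRUE
```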

