Using Strings to Identify Sequence of Column Names in R

Question

I am trying to use pre-defined strings to identify multiple column names in R. More precisely, I am using the ave function to create identification variables for subgroups of a data frame. The twist is that I want the grouping columns to be flexible, so that I can just pass them as a generic character vector.

A sample code would be:

ids = with(df,ave(rep(1,nrow(df)),subcolumn1,subcolumn2,subcolumn3,FUN=seq_along))

I would like to run this code in the following fashion (code below does not work as expected):

subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),subColumnsString ,FUN=seq_along))

I also tried something with eval, but that did not work either:

subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),eval(parse(text=subColumnsString)),FUN=seq_along))

Any ideas? Thanks.

EDIT: Working code example of what I want:

df = mtcars
id_names = c("vs","am")
idDF_correct = transform(df,idItem = as.numeric(interaction(vs,am)))
idDF_wrong = cbind(df,ave(rep(1,nrow(df)),df[id_names],FUN=seq_along))

Note how in idDF_correct, the unique combinations are correctly mapped into unique values of idItem. In idDF_wrong this is not the case.

Zelazny7 · Accepted Answer · 2018-02-28 15:35:10Z

2

I think this achieves what you requested. Here I use the mtcars dataset that ships with R:

subColumnsString <- c("cyl","gear")

ids = with(mtcars, ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along))

Just index your data.frame by the vector of column names. This returns a data frame, which is itself a list of columns, and that works naturally as the grouping argument of ave.
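To illustrate why indexing works where the eval attempt from the question did not, here is a small check on mtcars. It is only a sketch in base R; the names cols, by_hand, by_name and last_only are illustrative and not part of the original answer.

```r
# Illustrative column names; any grouping columns of mtcars would do.
cols <- c("cyl", "gear")

# Grouping columns written out by hand.
by_hand <- ave(rep(1, nrow(mtcars)), mtcars$cyl, mtcars$gear, FUN = seq_along)

# Grouping columns selected by name: mtcars[cols] is a data frame, i.e. a
# list of columns, which ave() passes on to interaction().
by_name <- ave(rep(1, nrow(mtcars)), mtcars[cols], FUN = seq_along)

identical(by_hand, by_name)

# eval() on a multi-element expression evaluates each element in turn but
# returns only the last value, so only "gear" would reach ave().
last_only <- eval(parse(text = cols), mtcars)
identical(last_only, mtcars$gear)
```

Both comparisons should come out TRUE, which suggests the eval(parse(...)) attempt was grouping by the last column only.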

EDIT

ids = ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along)

You can omit the with and just call plain ol' ave, as G. Grothendieck stated. You should also use their answer, as it is much more general.

edited Feb 28, 2018 at 15:35

answered Feb 28, 2018 at 15:06


4 Comments

G. Grothendieck Over a year ago

The with is not really used so it could be omitted.

user191919 Over a year ago

For some reason this is not creating the identification number as expected (the number repeats itself for different subgroups). My identification variables are strings, but that should not be a problem, should it? When using interaction, it works just fine: idDF = transform(df,id = as.numeric(interaction(subCol1,subCol2,subCol3)))

Zelazny7 Over a year ago

An example of your expected output would help.

user191919 Over a year ago

Edited the question with a minimal working example (MWE).
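The mismatch raised in these comments appears to come from the choice of FUN rather than from how the columns are passed. With FUN = seq_along, ave numbers the rows within each subgroup, so the values restart in every group, whereas the interaction call in the question gives each subgroup a single label. A minimal sketch of the latter that takes the column names as strings, assuming a per-group label is what is wanted (id_names and idItem come from the question; group_id is an illustrative name):

```r
df <- mtcars
id_names <- c("vs", "am")

# interaction() accepts a list of factors, so a data frame subset selected
# by name can be passed directly.
group_id <- as.numeric(interaction(df[id_names]))

# Same labels as the hard-coded version from the question.
idDF_correct <- transform(df, idItem = as.numeric(interaction(vs, am)))
identical(group_id, idDF_correct$idItem)

# Every distinct (vs, am) combination maps to exactly one label.
nrow(unique(cbind(df[id_names], group_id))) == nrow(unique(df[id_names]))
```

Note that the labels are the level codes of the interaction factor, so combinations that never occur in the data still reserve a number unless drop = TRUE is passed to interaction.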

G. Grothendieck · Accepted Answer · 2018-02-28 15:27:20Z

1

This defines a function whose arguments are:

data, the input data frame
by, a character vector of column names in data
fun, a function to use in ave

Code:

Ave <- function(data, by, fun = seq_along) {
   do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}

# test 
Ave(CO2, c("Plant", "Treatment"), seq_along)

giving:

 [1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
[39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6
[77] 7 1 2 3 4 5 6 7
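Because fun is a parameter, the same wrapper can cover other per-group computations. A brief sketch, repeating the definition so the block runs on its own; the choice of length and of the mtcars columns vs and am is only for illustration:

```r
Ave <- function(data, by, fun = seq_along) {
  do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}

by <- c("vs", "am")

# Running index within each (vs, am) subgroup, same as calling ave() directly.
idx <- Ave(mtcars, by)
identical(idx, ave(rep(1, nrow(mtcars)), mtcars$vs, mtcars$am, FUN = seq_along))

# Size of the subgroup each row belongs to.
n_in_group <- Ave(mtcars, by, length)

# The largest running index in a subgroup equals that subgroup's size.
all(tapply(idx, mtcars[by], max) == tapply(n_in_group, mtcars[by], max))
```

Like the first answer, this yields values computed within each subgroup, not a label per subgroup.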

edited Feb 28, 2018 at 15:27

answered Feb 28, 2018 at 15:06
