Subset dataframe with list of columns in R

Question

I want to select all columns in my dataframe which I have stored in a string variable. For example:

v1 <- rnorm(100)
v2 <- rnorm(100)
v3 <- rnorm(100)
df <- data.frame(v1,v2,v3)

I want to accomplish the following:

df[,c('v1','v2')]

But I want to use a variable instead of c('v1','v2'). These all fail:

select.me <- "'v1','v2'"
df[,select.me]
df[,c(select.me)]
df[,c(paste(select.me,sep=''))]

Thanks for help with a simple question,

IRTFM · Accepted Answer · 2012-12-01 01:35:47Z

22

The great irony here is that the first expression, the one you said you wanted to accomplish, should have succeeded:

df[,c('v1','v2')]
> str( df[,c('v1','v2')] )
'data.frame':   100 obs. of  2 variables:
 $ v1: num  -0.3347 0.2113 0.9775 -0.0151 -1.8544 ...
 $ v2: num  -1.396 -0.95 -1.254 0.822 0.141 ...

whereas all the later attempts would fail. I later realized that you didn't know that you could use select.me <- c('v1','v2') ; df[ , select.me]. You could also use these forms which might be safer in some instances:

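df[ , names(df) %in% select.me] # logical indexing
pat <- paste0("^(", paste(select.me, collapse="|"), ")$") # grep() uses only one pattern, so join the names into one anchored regex
df[ , grep(pat, names(df) ) ]  # numeric indexing
df[ , grepl(pat, names(df) ) ]  # logical indexing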

Any of those can be used with negation (!logical) or minus (-numeric) to retrieve the complement, whereas you cannot use character indexing with negation. If you wanted to go down one level in understandability and were willing to change the select.me value to a valid R expression, you could do this:

select.me <- "c('v1','v2')"
df[ , eval(parse(text=select.me)) ]

Not that I recommend this... just to let you know that such is possible after you "learn to walk". It would also have been possible (although rather baroque) using your original quoted string to pull out the information (although I think this just illustrates why your first version is superior):

select.me <- "'v1','v2'"
df [ , scan(textConnection(select.me), what="", sep=",") ]
> str( df [ , scan(textConnection(select.me), what="", sep=",") ] )
Read 2 items
'data.frame':   100 obs. of  2 variables:
 $ v1: num  -0.3347 0.2113 0.9775 -0.0151 -1.8544 ...
 $ v2: num  -1.396 -0.95 -1.254 0.822 0.141 ...
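To make the point about complements concrete, here is a minimal sketch of the negated logical form and the minus numeric form. The seed, the `drop = FALSE` argument, and the names `rest_logical` and `rest_numeric` are my own additions for illustration, not part of the question.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))
select.me <- c("v1", "v2")

# Logical indexing, negated: keep every column NOT named in select.me.
# drop = FALSE keeps the result a data frame even when one column remains.
rest_logical <- df[ , !(names(df) %in% select.me), drop = FALSE]

# Numeric indexing, negated with minus: same columns, selected by position.
rest_numeric <- df[ , -which(names(df) %in% select.me), drop = FALSE]

names(rest_logical)
names(rest_numeric)

# Character indexing has no such shortcut: df[ , -select.me] is an error,
# because unary minus is not defined for character vectors.
```

The logical form is probably the more forgiving of the two: if nothing matches, negating an all-FALSE vector keeps every column, whereas a minus applied to an empty numeric index selects no columns at all.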

edited Dec 1, 2012 at 1:35

answered Nov 30, 2012 at 1:13


3 Comments

Matthew Plourde Over a year ago

+1 beat me to eval(parse(...)). scan has a text argument, btw.

IRTFM Over a year ago

Hmmm. Right you are: scan(text=select.me, what="", sep=",") ... Is that 'text' argument how read.table handles its text argument now? Must be. And why doesn't readLines accept a 'text' argument?

IRTFM Over a year ago

They added a "text" formal and check to see if "file" is missing. Seems that could have been done with readLines, too.
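For reference, the `text` variant discussed in these comments can be sketched as follows, using the original quoted string from the question. The seed, the `quiet = TRUE` argument (which suppresses the "Read 2 items" message), and the name `cols` are assumptions added for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))
select.me <- "'v1','v2'"

# scan() splits the string on commas and strips the surrounding quotes,
# leaving a plain character vector of column names.
cols <- scan(text = select.me, what = "", sep = ",", quiet = TRUE)

str(df[ , cols])
```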

mnel · Accepted Answer · 2012-11-30 01:10:56Z

13

This is basic R syntax; perhaps you need to read the introductory manual:

select.me <- c('v1','v2')
df[,select.me]
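One thing worth knowing when the variable might hold a single name: the comma form then returns a bare vector rather than a data frame unless `drop = FALSE` is given. The sketch below illustrates this; the seed and the names `select.one`, `two_cols`, `as_vector`, and `as_frame` are hypothetical and not from the question.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))

select.me <- c("v1", "v2")
two_cols <- df[ , select.me]                   # data frame with two columns

select.one <- "v1"
as_vector <- df[ , select.one]                 # numeric vector, not a data frame
as_frame  <- df[ , select.one, drop = FALSE]   # one-column data frame

class(as_vector)
class(as_frame)
```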

answered Nov 30, 2012 at 1:10

