4

At the moment, when trying to select columns not directly by their name, but rather with grepl, one has to add an additional call to the data.table in the grepl command using colnames(). Is it possible to directly implement this functionality into data.table, so that one could use grepl directly and the column names are automatically taken from the current data.table.

dt <- structure(list(Global.Company.Key = c(1380L, 1380L, 1380L, 1380L, 1380L)
                         , Calendar.Data.Year.and.Quarter = structure(c(2000, 2000, 2000, 2000, 2000), class = "yearqtr")
                         , Current.Assets.Total = c(2218, 2218, 2218, 2218, 2218)
                         , DRILL_TYPE = c("U", "D", "V", "H", "U")
                         , DI.Oil.Prod.Quarter = c(18395.6792379842, 1301949.24041659, 235.311086392291, 27261.8049684835, 4719.27956989249)
                         , DI.Gas.Prod.Quarter = c(1600471.27107983, 4882347.22928982, 2611.60215053765, 9634.76418242493, 27648.276603634)), .Names = c("Global.Company.Key", "Calendar.Data.Year.and.Quarter", "Current.Assets.Total", "DRILL_TYPE", "DI.Oil.Prod.Quarter",  "DI.Gas.Prod.Quarter"), row.names = c(NA, -5L), class = c("data.table",  "data.frame"), sorted = c("Global.Company.Key",  "Calendar.Data.Year.and.Quarter"))

One example for a selection based with grepl:

dt[, grepl(glob2rx("Current.Assets*"), colnames(dt)), with = FALSE]

It would be nice, if something like this would be possible instead:

dt[, grepl(glob2rx("Current.Assets*")), with = FALSE]
7
  • can you specify what is glob2rx? Commented Mar 24, 2017 at 12:54
  • glob2rx is just my way to get around regex, because it allows to use * as wildcard. Commented Mar 24, 2017 at 12:57
  • 2
    SO is for programming questions, while you're asking for a new feature. The FR is already posted, btw: github.com/Rdatatable/data.table/issues/1878 and github.com/Rdatatable/data.table/issues/1786 Commented Mar 24, 2017 at 13:12
  • I didn't find it, sorry. Should I remove the question or keep it as a reference to the feature request? Commented Mar 24, 2017 at 13:15
  • Up to you. I think it's fine either way. If you want to keep it as a ref, you could drop a comment on one of the issues like "please update / answer this SO post if this is implemented". Commented Mar 24, 2017 at 13:18

1 Answer 1

3
library(data.table)

dt <- data.table(CurrentAssets=rnorm(10),FixedAssets=rnorm(10), CurrentLiabilities=rnorm(10),Capital=rnorm(10))

dt

##    CurrentAssets FixedAssets CurrentLiabilities    Capital
## 1:   -1.27610992  -0.2989316         0.20688252  0.6504636
## 2:    0.01065576   1.3088539         1.22533006  0.7550024
## 3:    0.53308022  -1.3459419        -0.99627142 -0.7589336
## 4:    0.30737237  -0.4291044         2.20328357  0.2157515
## 5:   -1.37391990   0.8581097        -0.08161687  0.7067757
## 6:    0.28664468   0.2308479         0.38675487 -0.3467660
## 7:   -0.22902454   1.3365470         0.10128697  0.3246363
## 8:    0.05159736  -2.0702850         0.78404464 -1.7612696
## 9:    0.51817847  -0.8365225        -0.04778573  0.6170114
##10:    0.50859575   0.5683021        -0.13780167 -0.9243434

Just some random columns. The accounts don't balance. You can define the columns, then do ...

colnames <- c("CurrentAssets","FixedAssets", "CurrentLiabilities","Capital")
dt[,.SD,.SDcols=grep("Assets",colnames,value =TRUE)]

If you don't want to type colnames and value=TRUE all the time you can build your own function like the following.

mygrep <- function(x){
    colnames <- c("CurrentAssets","FixedAssets", "CurrentLiabilities","Capital")
    grep(x,colnames,value=TRUE)
}

Now the drawback is of mygrep is that you need to put the column name manually. An improvement would be to pass the data.table to the function.

mygrep <- function(x,dt){
    colnames <- colnames(dt)
    grep(x,colnames,value=TRUE)
}

dt[,.SD,.SDcols=mygrep("Assets",dt)]

Edit Just found another way to do the same thing using macro in R. You will need the package gtools to use macros.

We define a macro subdt.

library(gtools)
  subdt <- defmacro(dt,pattern,expr={
   dt[,.SD,.SDcols=grep(pattern,colnames(dt),value=TRUE)]
    })

then do

subdt(dt,"Assets")

macros are powerful as they write the code before evaluation.

Sign up to request clarification or add additional context in comments.

4 Comments

Made my code much more readable, although it's not the final solution, probably the best available.
I wonder if @DJJ could provide a brief explanation why (example on mtcars) mtcars_dt[,.SD,.SDcols = grep("m", colnames(mtcars_dt), value =TRUE)] needs to be used for data.table while for data.frame the following works mtcars[, grepl("m", colnames(mtcars))] (but not for data.table)? This could helps others understand further the differences in data.table and data.frame. I recently tackled the same issue and your answer was the solution but I still wonder how it works in data.table and why base approach had to be adapted.
@gofraidh I would suggest posting a question on SO. There are people with more knowledge than me about data.table. If you have yet read it here is a good starting point. For data.table, a solution would be the following. tmp <- grepl("m", colnames(aa)); aa[,..tmp]
Now, data.table has also the patterns key. aa<- as.data.table(mtcars); aa[,.SD,.SDcols=patterns("m")]. see ?data.table for more details.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.