R: loop through variable names in data.table using for-loop (and group them by variable)

Question

> colnames(fileIWantToAnalyze) 

[1] "variable_1a"     "variable_5b"                                  
[3] "variable_1b"     "variable_6a"                           
[5] "variable_2a"     "variable_6b"                           
[7] "variable_2b"     "variable_7a"                           
[9] "variable_3a"     "variable_7b"                           
[11] "variable_3b"    "variable_8a"        
[13] "variable_4a"    "variable_8b"       
[15] "variable_4b"    "variable_9a"            
[17] "variable_5a"    "variable_9b"            
[19] "GroupingColumn1"

I am not able to run the following code in R - throws this error:

Error in [.data.table(fileIWantToAnalyze, , .(mean1 = mean(get(attribute)), : The items in the 'by' or 'keyby' list are length (943026,1). Each must be length 943026; the same length as there are rows in x (after subsetting if i is provided).

"fileIWantToAnalyze" is a data.table

for(attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
  fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
                      by = .(GroupingColumn1,sub("a", "b", attribute))]
}

This doesn't work either:

for (attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
  fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
    by = .(GroupingColumn1,attribute)]
}

The following code gives me the answer I am looking for - but I want to use a loop to generate outputs for many variables

fileIWantToAnalyze[,.(mean1 = mean(variable_1a),count1 = .N),
    by = .(GroupingColumn1,variable_1b)]

I believe the problem is how I am referring to "attribute" in the 'by' argument while grouping.

Telling us what doesn't work while not telling us what the goal is ... well, you should be able to understand our problem in understanding your problem. Not to mention the fact that this does not have a minimal reproducible example, making this a not very attractive problem to spend any time on. Sorry. Look at How to Ask and minimal reproducible example, and edit.
– IRTFM, Commented Aug 24, 2019 at 1:36

David Arenburg · Accepted Answer · 2019-08-25 08:20:28Z

Your problem stems from how data.table interprets variables passed to the by argument, although this might actually be an unintended bug.

Note the following dummy example to illustrate:

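library(data.table)
dt <- data.table(A = 1:3, b = 3:5, c = 7:5)
# Works: a character variable passed directly
for(i in names(dt))
  dt[,lapply(.SD, sum), by = i]
# Doesn't work: the same variable wrapped in .()
for(i in names(dt))
  dt[,lapply(.SD, sum), by = .(i)]
# Works: a character vector built with c()
for(i in names(dt))
  dt[,lapply(.SD, sum), by = c(i)]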

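Basically, it seems data.table doesn't check whether each element of .(...) is a length-one character vector naming one of the table's columns. Instead it treats the string itself as the grouping value, and its length (1) does not match the number of rows, hence the error.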
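To make the difference concrete, here is a minimal sketch on a made-up three-row table (the names dt, grp_col, by_chr, by_list and by_get are purely illustrative, not from the question). It captures the error rather than stopping on it, and also shows that get() can be used inside .(...) if you prefer the list form.

```r
library(data.table)

# Hypothetical toy table; grp_col holds the name of the grouping column.
dt <- data.table(A = c(1, 1, 2), b = 3:5)
grp_col <- "A"

# Character vector: "A" is looked up as a column name.
by_chr <- dt[, .(total = sum(b)), by = c(grp_col)]

# .(grp_col): the string itself becomes the grouping value, so its length (1)
# is compared with the number of rows (3) and data.table raises an error.
by_list <- tryCatch(dt[, .(total = sum(b)), by = .(grp_col)],
                    error = function(e) conditionMessage(e))

# get() returns the column's values, so it is valid inside .(...).
by_get <- dt[, .(total = sum(b)), by = .(A = get(grp_col))]
```

Here by_chr and by_get hold the same grouped totals, while by_list holds the error message as a string.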

So an easy fix is to use a character vector in the by argument instead. Below is a revised version of your code.

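for(attribute in colnames(fileIWantToAnalyze)[seq(1, 17, by = 2)]){
  # print() is needed because results are not auto-printed inside a for loop
  print(fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
                      # Note that "by" is now a character vector.
                      # "a$" matches only the trailing "a", not the "a" in "variable".
                      by = c("GroupingColumn1", sub("a$", "b", attribute))])
}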
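If you want to keep the outputs rather than just run the grouping, one option is to collect them in a named list. The following is only a sketch on made-up data: toy, a_cols and results are hypothetical names, and the "a" columns are picked by pattern rather than by position. The pattern in sub() is anchored as "a$" because an unanchored "a" would match the first "a" in "variable".

```r
library(data.table)

# Hypothetical toy data using the same naming scheme as the question.
toy <- data.table(
  GroupingColumn1 = c("x", "x", "y", "y"),
  variable_1a = c(1, 2, 3, 4),
  variable_1b = c("p", "p", "q", "q"),
  variable_2a = c(10, 20, 30, 40),
  variable_2b = c("r", "s", "r", "s")
)

# Select the "a" columns by name pattern instead of by position.
a_cols <- grep("^variable_[0-9]+a$", names(toy), value = TRUE)

# Store one result table per "a" column, keyed by the column name.
results <- list()
for (attribute in a_cols) {
  results[[attribute]] <- toy[, .(mean1 = mean(get(attribute)), count1 = .N),
                              by = c("GroupingColumn1", sub("a$", "b", attribute))]
}

results[["variable_1a"]]
```

Each element of results is a data.table, so they can be inspected individually or combined later with rbindlist(results, idcol = "attribute", use.names = FALSE), given that the second grouping column has a different name in each table.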

Parfait · Accepted Answer · 2019-08-24 02:36:24Z


Consider reshaping your wide data to long format, usually the preferred format for most analytical methods (aggregating, plotting, modeling). With such an approach, you avoid complex looping. Plus, data.table has its own reshaping methods, including melt and dcast.

melt_dt <- melt(fileIWantToAnalyze,
                id.vars = c("GroupingColumn1"),
                measure.vars = list(paste0("variable_", 1:9, "a"),
                                    paste0("variable_", 1:9, "b")),
                value.name = c("value_a", "value_b")
               )

agg_dt <- melt_dt[, .(mean_value = mean(value_a), count = .N),
                  by = list(GroupingColumn1, value_b)][order(GroupingColumn1, value_b)]
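As a small illustration of the reshaping idea, here is a sketch on made-up data (toy, long and per_pair are hypothetical names). When measure.vars is a list, melt adds a variable column that indexes each a/b pair. Grouping only by GroupingColumn1 and value_b pools all pairs together, so this sketch also puts variable in by, which should give one set of results per pair, as the loop in the question would.

```r
library(data.table)

# Hypothetical toy data with two a/b pairs.
toy <- data.table(
  GroupingColumn1 = c("x", "x", "y", "y"),
  variable_1a = c(1, 2, 3, 4),
  variable_1b = c("p", "p", "q", "q"),
  variable_2a = c(10, 20, 30, 40),
  variable_2b = c("r", "s", "r", "s")
)

# Stack the "a" columns into value_a and the "b" columns into value_b;
# the generated "variable" column records which pair each row came from.
long <- melt(toy,
             id.vars = "GroupingColumn1",
             measure.vars = list(c("variable_1a", "variable_2a"),
                                 c("variable_1b", "variable_2b")),
             value.name = c("value_a", "value_b"))

# Including "variable" in by keeps each a/b pair separate.
per_pair <- long[, .(mean1 = mean(value_a), count1 = .N),
                 by = .(GroupingColumn1, variable, value_b)]
```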
