> colnames(fileIWantToAnalyze) 

 [1] "variable_1a"     "variable_1b"
 [3] "variable_2a"     "variable_2b"
 [5] "variable_3a"     "variable_3b"
 [7] "variable_4a"     "variable_4b"
 [9] "variable_5a"     "variable_5b"
[11] "variable_6a"     "variable_6b"
[13] "variable_7a"     "variable_7b"
[15] "variable_8a"     "variable_8b"
[17] "variable_9a"     "variable_9b"
[19] "GroupingColumn1"

I am not able to run the following code in R; it throws this error:

Error in [.data.table(fileIWantToAnalyze, , .(mean1 = mean(get(attribute)), : The items in the 'by' or 'keyby' list are length (943026,1). Each must be length 943026; the same length as there are rows in x (after subsetting if i is provided).

"fileIWantToAnalyze" is a data.table
for(attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
  fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
                      by = .(GroupingColumn1,sub("a", "b", attribute))]
}

This doesn't work either:

for (attribute in colnames(fileIWantToAnalyze)[c(1,3,5,7,9,11,13,15,17)]){
  fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
    by = .(GroupingColumn1,attribute)]
}

The following code gives me the answer I am looking for - but I want to use a loop to generate outputs for many variables

fileIWantToAnalyze[,.(mean1 = mean(variable_1a),count1 = .N),
    by = .(GroupingColumn1,variable_1b)]

I believe the problem is how I am passing "attribute" to the 'by' argument when grouping.

1
  • Telling us what doesn't work while not telling us what the goal is ... well, you should be able to understand our problem in understanding your problem. Not to mention the fact that this does not have a minimal reproducible example, making this a not very attractive problem to spend any time on. Sorry. Look at How to Ask and minimal reproducible example, and edit. Commented Aug 24, 2019 at 1:36

2 Answers

2

Your problem comes from how data.table interprets variables in the by argument, although this might actually be an unintended bug.

Note the following dummy example to illustrate:

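library(data.table)

dt <- data.table(A = 1:3, b = 3:5, c = 7:5)
# Works: by is a variable holding a column name
for(i in names(dt))
  dt[,lapply(.SD, sum), by = i]
# Doesn't work: .(i) is evaluated as an expression, giving a length-1 vector
for(i in names(dt))
  dt[,lapply(.SD, sum), by = .(i)]
# Works: c(i) is a character vector of column names
for(i in names(dt))
  dt[,lapply(.SD, sum), by = c(i)]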

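Basically, it seems data.table doesn't check whether each element of .(...) is a length-one character vector naming a column of the table. Instead, each element is evaluated as an expression, so a string such as attribute becomes a grouping vector of length 1. That is the length mismatch the error message reports (943026 rows versus 1).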
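To see the difference without stopping a script, the failing call can be wrapped in tryCatch. This is a minimal sketch on the dummy table above; the names col, failed and worked are illustrative only.

```r
library(data.table)

dt <- data.table(A = 1:3, b = 3:5, c = 7:5)
col <- "A"  # illustrative variable holding a column name

# .(col) is evaluated as an expression: a single string, not a column of dt.
# The error is caught and its message is kept as a character string.
failed <- tryCatch(dt[, .N, by = .(col)],
                   error = function(e) conditionMessage(e))

# c(col) is treated as a character vector of column names to group by.
worked <- dt[, .N, by = c(col)]

print(failed)
print(worked)
```

Here worked has one row per distinct value of A, while failed holds the length-mismatch error message.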

So an easy fix is to just use a character vector in the by argument instead. Below is a revised version of your code.

for(attribute in colnames(fileIWantToAnalyze)[seq(1, 17, by = 2)]){
  fileIWantToAnalyze[,.(mean1 = mean(get(attribute)),count1 = .N),
                      #Note that "by" is now a character vector.
                      #"a$" replaces only the trailing "a"; a plain "a" would hit the first "a" in "variable".
                      by = c("GroupingColumn1", sub("a$", "b", attribute))]
}
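Note that a for loop neither prints nor keeps the value of each data.table call, so the summaries are discarded unless they are stored. The following is a minimal sketch of one way to keep them, assuming a small made-up table with only two variable pairs; the toy data and the names a_cols and results are illustrative and not from the question.

```r
library(data.table)

# Toy stand-in for fileIWantToAnalyze (made-up data, two variable pairs only)
fileIWantToAnalyze <- data.table(
  variable_1a     = c(1, 2, 3, 4),
  variable_1b     = c("x", "x", "y", "y"),
  variable_2a     = c(10, 20, 30, 40),
  variable_2b     = c("p", "q", "p", "q"),
  GroupingColumn1 = c("g1", "g1", "g1", "g2")
)

# Columns whose names end in "a" are the ones to average
a_cols <- grep("a$", names(fileIWantToAnalyze), value = TRUE)

# One summary table per "a" column, grouped by GroupingColumn1 and the
# matching "b" column, collected in a named list
results <- lapply(a_cols, function(attribute) {
  fileIWantToAnalyze[, .(mean1 = mean(get(attribute)), count1 = .N),
                     by = c("GroupingColumn1", sub("a$", "b", attribute))]
})
names(results) <- a_cols

print(results[["variable_1a"]])
```

Each element of results can then be inspected by name, or the list can be combined later if a single table is preferred.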

1

Consider reshaping your wide data to long format, which is usually the preferred format for most analytical methods (aggregating, plotting, modeling). With such an approach, you avoid complex looping. Plus, data.table has its own reshaping methods, including melt and dcast.

melt_dt <- melt(fileIWantToAnalyze, 
                id.vars = c("GroupingColumn1"), 
                measure.vars = list(paste0("variable_", 1:9, "a"),
                                    paste0("variable_", 1:9, "b")),
                value.name = c("value_a", "value_b")
               )

agg_dt <- melt_dt[, .(mean_value=mean(value_a), count=.N), 
                  by=list(GroupingColumn1, value_b)][order(GroupingColumn1, value_b)]
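As a rough illustration of the reshape on made-up data (the table wide and its values are assumptions, not the asker's file), melt with a list of measure.vars also produces a variable column holding the pair index. Adding that column to by keeps each a/b pair separate, which may be closer to the per-variable output the loop was meant to give.

```r
library(data.table)

# Made-up wide table with two a/b pairs
wide <- data.table(
  GroupingColumn1 = c("g1", "g1", "g2"),
  variable_1a = c(1, 2, 3),
  variable_1b = c("x", "x", "y"),
  variable_2a = c(10, 20, 30),
  variable_2b = c("p", "q", "p")
)

# Stack the "a" columns into value_a and the "b" columns into value_b;
# the variable column records which pair (1 or 2) each row came from
long <- melt(wide,
             id.vars = "GroupingColumn1",
             measure.vars = list(paste0("variable_", 1:2, "a"),
                                 paste0("variable_", 1:2, "b")),
             value.name = c("value_a", "value_b"))

# Mean and count per group, per pair, per "b" value
agg <- long[, .(mean_value = mean(value_a), count = .N),
            by = .(GroupingColumn1, variable, value_b)]

print(agg)
```

Dropping variable from by would instead pool rows from different pairs that happen to share the same "b" value.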
