
I am trying to use the by parameter in data.table to rank the prediction column within each group, but I can't understand why the following code isn't working:

> x.small
       prediction group
 1: -0.0093753015    up
 2:  0.0204832283  down
 3: -0.0091790179  down
 4: -0.0473988803  down
 5:  0.0144955868  down
 6: -0.0139455871  down
 7:  0.0005746896    up
 8: -0.0174406693  down
 9: -0.0180556244  down
10: -0.0343069464    up
> x.small[, rank(prediction), by=group]
Error in rank(prediction) :
  'names' attribute [7] must be the same length as the vector [3]

But this example code works fine:

> diamonds.dt <- data.table(diamonds[1:10, c('carat', 'color')])
> diamonds.dt
    carat color
 1:  0.23     E
 2:  0.21     E
 3:  0.23     E
 4:  0.29     I
 5:  0.31     J
 6:  0.24     J
 7:  0.24     I
 8:  0.26     H
 9:  0.22     E
10:  0.23     H
> diamonds.dt[, rank(carat), by=color]
    color  V1
 1:     E 3.5
 2:     E 1.0
 3:     E 3.5
 4:     E 2.0
 5:     I 2.0
 6:     I 1.0
 7:     J 2.0
 8:     J 1.0
 9:     H 2.0
10:     H 1.0

Any help would be much appreciated!

EDIT:

Okay, now I really have no idea what's going on; this is very bizarre. I tried making a reproducible example for @Ananda but could not recreate the error. I even tried running the ranking logic on an exact copy of the prediction column, and it worked fine:

> x.small[, prediction.copy:=prediction]
> x.small[, rank(prediction.copy), by=group]
    group V1
 1:    up  2
 2:    up  3
 3:    up  1
 4:  down  7
 5:  down  5
 6:  down  1
 7:  down  6
 8:  down  4
 9:  down  3
10:  down  2
> x.small[, rank(prediction), by=group]
Error in rank(prediction) :
  'names' attribute [7] must be the same length as the vector [3]

How could there be two different results from two identical columns?

EDIT 2:

Output of dput(x.small):

> dput(x.small)
structure(list(prediction = structure(c(-0.00937530151309606,
0.0204832283018108, -0.00917901792827827, -0.0473988802836657,
0.0144955868466372, -0.0139455871394683, 0.000574689607249577,
-0.0174406692627376, -0.0180556244204637, -0.0343069463869563
), .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
)), group = c("up", "down", "down", "down", "down", "down", "up",
"down", "down", "up"), prediction.copy = c(-0.00937530151309606,
0.0204832283018108, -0.00917901792827827, -0.0473988802836657,
0.0144955868466372, -0.0139455871394683, 0.000574689607249577,
-0.0174406692627376, -0.0180556244204637, -0.0343069463869563
)), .Names = c("prediction", "group", "prediction.copy"), row.names = c(NA,
-10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x22f2af8>)
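The dput() output shows that the two columns are not actually identical: prediction carries a .Names attribute and prediction.copy does not. A minimal sketch of how to spot this, assuming nothing beyond base R: it rebuilds the first three rows (values rounded) in the same structure() form dput() uses, and named_columns() is a hypothetical helper, not part of data.table.

```r
# Rebuild the first three rows of x.small the way dput() printed them
# (illustrative, rounded values; prediction.copy is an unnamed copy).
df <- structure(
  list(
    prediction = structure(c(-0.0093753015, 0.0204832283, -0.0091790179),
                           names = c("1", "2", "3")),
    group = c("up", "down", "down"),
    prediction.copy = c(-0.0093753015, 0.0204832283, -0.0091790179)
  ),
  row.names = c(NA, -3L),
  class = "data.frame"
)

# Hypothetical helper: TRUE for each column that carries a names attribute.
# A data.table is also a list of columns, so it works on one as well.
named_columns <- function(d) {
  vapply(d, function(col) !is.null(names(col)), logical(1))
}

named_columns(df)
# returns c(prediction = TRUE, group = FALSE, prediction.copy = FALSE)

# Same values, different attributes:
identical(df$prediction, df$prediction.copy)          # FALSE
identical(unname(df$prediction), df$prediction.copy)  # TRUE
```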
  • Can you post some reproducible code (or rather, code that reproduces the error you are mentioning)? If I copy and paste your code it works fine for me.... Commented Feb 19, 2014 at 16:30
  • @Ananda Just posted an update, this is very strange Commented Feb 19, 2014 at 16:54
  • Can you edit your question to include dput(x.small). Commented Feb 19, 2014 at 16:56
  • The two columns aren't identical: the x.small$prediction vector is a named vector (you can see it in the dput output, or with names(x.small$prediction)). Why does your column have names? Remove them and it solves your problem Commented Feb 19, 2014 at 17:09
  • Even this works for me.... What version of "data.table" are you using? Commented Feb 19, 2014 at 17:27

1 Answer

I guess I'll just close this one. If you are having the same issue, check whether the problem column is a named vector: run str(x.small) and see whether that column's entry starts with the word "Named". Grouping with the by parameter on a named vector column is what triggers the error. This appears to be a minor bug in earlier versions of data.table that was patched in later versions. To fix it, upgrade data.table or just strip the names with unname(), as @Frank suggested:

x.small[,rank(unname(prediction)), by=group]
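Instead of wrapping every query in unname(), the names can also be dropped once, by reference, with :=. A minimal sketch, assuming data.table is installed: it rebuilds x.small from the question's dput() output (prediction.copy omitted), then removes the names attribute from the prediction column in place.

```r
library(data.table)

# Rebuild x.small from the question's dput() output (prediction.copy
# omitted); the prediction column carries a names attribute.
x.small <- structure(
  list(
    prediction = structure(
      c(-0.00937530151309606, 0.0204832283018108, -0.00917901792827827,
        -0.0473988802836657, 0.0144955868466372, -0.0139455871394683,
        0.000574689607249577, -0.0174406692627376, -0.0180556244204637,
        -0.0343069463869563),
      names = as.character(1:10)),
    group = c("up", "down", "down", "down", "down", "down", "up",
              "down", "down", "up")
  ),
  row.names = c(NA, -10L),
  class = "data.frame"
)
setDT(x.small)  # convert to a data.table in place

# Drop the names once, by reference, instead of calling unname() in every query
x.small[, prediction := unname(prediction)]
stopifnot(is.null(names(x.small$prediction)))

# The grouped rank now runs on a plain (unnamed) numeric vector
x.small[, rank(prediction), by = group]
```

On this data the ranks should match the prediction.copy output shown in the first EDIT (2, 3, 1 for "up"; 7, 5, 1, 6, 4, 3, 2 for "down").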