
When I try to select rows in a data.table (R package) by the value of a column containing large integers, I get strange results: rows with similar integers are selected too.

require(data.table)
options(digits=15)
data <- data.table(A=c(1000200030001,1000200030002,1000200030003))

Trying to access the first row by checking the value of A:

data[A==1000200030001]
               A
1: 1000200030001
2: 1000200030002
3: 1000200030003

All three rows are shown, where I expect only the first to be returned.

The problem disappears when using as.numeric:

data[as.numeric(A)==1000200030001]
               A
1: 1000200030001

The problem does not occur in the j part of the data.table call:

data[,A == 1000200030001]
[1]  TRUE FALSE FALSE

This seems to be a problem with the precision of comparing large numbers. What confuses me is that as.numeric solves the issue, since str(data) shows that A is already numeric:

str(data)
Classes ‘data.table’ and 'data.frame':  3 obs. of  1 variable:
 $ A: num  1e+12 1e+12 1e+12
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "index")= atomic  
  ..- attr(*, "A")= int 
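A base-R sketch of why "similar integers" can match, assuming the explanation in the last comment on the answer (numeric rounding, see ?setNumericRounding) is what happens here. At the time of this question, data.table by default rounded off the last 2 bytes of a double's 52-bit significand in joins, grouping and index lookups (later releases changed the default to no rounding, as far as I know). That leaves 36 stored bits, so between 2^39 and 2^40 (where 1e12 lies) the spacing becomes 2^3 = 8, and values that differ by 1 or 2 become equal. The helper `round_last_bytes` is hypothetical: it only approximates the effect with ordinary arithmetic and is not data.table's C implementation.

```r
# Hypothetical helper: approximate dropping the last `bytes` bytes of the
# 52-bit significand of a double by rounding to the nearest multiple of the
# spacing that the shortened significand can still represent.
round_last_bytes <- function(x, bytes = 2) {
  kept_bits <- 52 - 8 * bytes    # stored significand bits that remain
  e <- floor(log2(abs(x)))       # binary exponent of x
  step <- 2^(e - kept_bits)      # spacing between values that survive
  round(x / step) * step
}

A <- c(1000200030001, 1000200030002, 1000200030003)

# Plain doubles hold these integers exactly (they are far below 2^53),
# so a base R comparison distinguishes them:
A == 1000200030001
# [1]  TRUE FALSE FALSE

# After rounding away the last 2 bytes, all three collapse to one value:
round_last_bytes(A) == round_last_bytes(1000200030001)
# [1] TRUE TRUE TRUE
```

This also suggests why a plain vector scan (`data[, A == 1000200030001]`, or `i` wrapped in parentheses) gives the right answer: it uses R's exact `==` rather than the rounded lookup.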

Any hints on how to make sure this problem does not appear in production code are appreciated!

UPDATE: The problem described above does not occur when auto-indexing is disabled:

options(datatable.auto.index=FALSE)

However, problems with aggregation and merging/joining are not solved by disabling auto-indexing:

data[,.(B=sum(A)),A]
               A             B
1: 1000200030001 1000200030001

The correct output would be:

               A             B
1: 1000200030001 1000200030001
2: 1000200030002 1000200030002
3: 1000200030003 1000200030003

I found that the best solution to all of these problems is to use the bit64 package, as described in the accepted answer. Thanks, everybody!

  • You just have to put your condition in parentheses when the i argument is logical. Try data[(A==1000200030001)]. The as.numeric part plays no role, since A is already numeric. Commented Dec 15, 2015 at 10:04
  • +1 nicola. data[,A == 1000200030001] is logical, so if you subset data with this condition then it also works: data[data[,A==1000200030001]] Commented Dec 15, 2015 at 10:10
  • This might be worth a bug report. It's a problem with auto-indexing. Commented Dec 15, 2015 at 10:12
  • Regarding your edit: use a big integer data type. That should solve all these related problems. Commented Dec 15, 2015 at 10:57
  • Will do. Thanks for your useful comments and answers! Commented Dec 16, 2015 at 8:20

1 Answer


Use bit64::integer64:

require(data.table)
options(digits=15)
library(bit64)
data <- fread("A
              1000200030001
              1000200030002
              1000200030003", colClasses = "integer64")


data[A == as.integer64("1000200030001")]
#                A
# 1: 1000200030001
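The update in the question also mentions aggregation and joins. A minimal sketch of those with integer64, assuming a data.table version that supports integer64 in grouping and in `on=` joins: because integer64 values are compared as exact 64-bit integers, no rounding is involved. The grouping counts rows with `.N` instead of summing, to keep the example independent of how a given version sums integer64; the lookup table `lookup` and its column `label` are made up for illustration.

```r
library(data.table)
library(bit64)

data <- data.table(A = as.integer64(c("1000200030001",
                                      "1000200030002",
                                      "1000200030003")))

# Grouping by the integer64 column: one group per distinct value
counts <- data[, .N, by = A]

# Hypothetical lookup table keyed on the same integer64 values
lookup <- data.table(A = as.integer64("1000200030002"), label = "second")

# Right join: every row of `data`, with `label` filled only on the exact match
joined <- lookup[data, on = "A"]
```

If grouping were affected by rounding, `counts` would have a single row, as in the question's `sum` example.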

Alternatively, deactivate auto-indexing (and lose the performance advantage from it):

options(datatable.auto.index=FALSE)
data <- data.table(A=c(1000200030001,1000200030002,1000200030003))
data[(A==1000200030001)]
#               A
#1: 1000200030001

5 Comments

While integer64 may very well be useful for the OP's purpose, I don't think it is the issue here. I guess it's the known data.table behaviour when handling the i argument. For the values given by the OP, the comparison is always correct with base operators (like ==). Of course, these operators could be wrong for larger values.
@nicola: I call this a bug.
You are right, I didn't want to be so drastic... Anyway, once you deactivate auto.index, you can drop the parentheses and data[A==1000200030001] works (the parenthesised version already worked with auto.index set to TRUE).
@nicola Parentheses make sure that i is evaluated, which is another way of avoiding auto-indexing.
It's not auto-indexing that's the issue, but numeric rounding. See ?setNumericRounding. Auto-indexing is currently not used when the expression is wrapped in (). Using bit64 is the way to go. A better warning will be issued in the next release.
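Following the last comment, a sketch of the setNumericRounding() route for plain numeric columns, assuming a data.table version that provides getNumericRounding() and setNumericRounding() (a setting of 0 means doubles are compared at full precision). Newer releases already default to 0, so there this mostly makes the choice explicit; bit64 remains the more robust fix for true 64-bit integers.

```r
library(data.table)

old <- getNumericRounding()   # remember the current setting
setNumericRounding(0)         # no rounding: compare doubles at full precision

data <- data.table(A = c(1000200030001, 1000200030002, 1000200030003))
sub <- data[A == 1000200030001]        # expected: only the first row
agg <- data[, .(B = sum(A)), by = A]   # expected: three groups

setNumericRounding(old)       # restore the previous setting
```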
