When I select rows in a data.table (an R package) by the value of a column that holds large integers, I get strange results: rows with similar but different values are returned as well.
require(data.table)
options(digits=15)
data <- data.table(A=c(1000200030001,1000200030002,1000200030003))
Try to access the first row by checking the value of A:
data[A==1000200030001]
A
1: 1000200030001
2: 1000200030002
3: 1000200030003
All three rows are returned, whereas I expected only the first.
The problem disappears when I wrap the column in as.numeric:
data[as.numeric(A)==1000200030001]
A
1: 1000200030001
The problem is not present in the j part of data.table:
data[,A == 1000200030001]
[1] TRUE FALSE FALSE
This seems to be a problem with the precision of comparing large numbers. I am very confused that using as.numeric solves the issue since str(data) shows that A already is of type numeric:
str(data)
Classes ‘data.table’ and 'data.frame': 3 obs. of 1 variable:
$ A: num 1e+12 1e+12 1e+12
- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "index")= atomic
..- attr(*, "A")= int
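As a sanity check that does not involve data.table at all, here is a minimal base R sketch. The variable names are mine and purely illustrative. It only shows that the three values are stored exactly as doubles and compare as distinct in a plain vector comparison, which matches the j result above; it does not explain what happens inside i.

```r
# The three values from the question, stored as doubles (R's "numeric").
x <- c(1000200030001, 1000200030002, 1000200030003)

# Doubles represent every integer exactly up to 2^53,
# so these values are well inside the exact range.
all_below_limit <- all(x < 2^53)

# A plain vector comparison distinguishes the three values.
matches <- x == 1000200030001

# Adjacent values differ by exactly 1, so nothing was lost when storing them.
steps <- diff(x)

print(all_below_limit)
print(matches)
print(steps)
```

So the stored values themselves look fine; the surprising behaviour appears only in the data.table subsetting and grouping paths.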
Any hints on how to ensure this problem does not appear in production code are appreciated!
UPDATE: The problem described above is not present when disabling auto-indexing.
options(datatable.auto.index=FALSE)
However, problems with aggregation and merging/joining are not solved by disabling auto-indexing:
data[,.(B=sum(A)),A]
A B
1: 1000200030001 1000200030001
Where the correct output would be:
A B
1: 1000200030001 1000200030001
2: 1000200030002 1000200030002
3: 1000200030003 1000200030003
I found that the best solution to all of these problems is to use the bit64 package, as described in the selected answer. Thanks everybody!
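For reference, a minimal sketch of what that can look like, assuming both data.table and bit64 are installed. Building the column from strings and the names first_row and groups are my own choices for illustration, not necessarily what the selected answer does.

```r
library(data.table)
library(bit64)

# Build the column as integer64; passing strings avoids going through double.
data <- data.table(
  A = as.integer64(c("1000200030001", "1000200030002", "1000200030003"))
)

# Subset in i, comparing against an integer64 value.
first_row <- data[A == as.integer64("1000200030001")]

# Group by the integer64 column and count the rows per group.
groups <- data[, .(N = .N), by = A]

print(first_row)
print(groups)
```

With the column stored as integer64, I would expect the subset to return a single row and the grouping to return three groups.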
i argument is logical. Try data[(A==1000200030001)]. The as.numeric part has no role, since A is already numeric. data[,A == 1000200030001] is logical, so if you subset data with this condition then it also works: data[data[,A==1000200030001]]