1

How to search in data.table by condition?

I have a table for, let's say, users which contains user ip + a table of ip ranges from http://dev.maxmind.com/geoip/legacy/csv/. I would like to get a country for each user. The problem is that in order to find a country you need to split the IP to tokens and apply some formula to get an integer value representing the address. Then you need to find a row where that value fits in range.

I managed to do that with data.frame's but it is very slow..

extract.country <- function(code) {
  geo[(geo['V3']<=code & code<=geo['V4'])][6]
}

There are 93100 ip addresses ranges and relatively same number of users. But it seems to take about

   user  system elapsed 
   17.109   1.144  20.649 

to process 100 users on my machine. Which will be about 5 hours to process all of them.

That is how my & geoIP datasets are look like:

> head(dat)
                   V2            V26
1 2014-03-01 14:06:59  86.183.184.19
2 2014-03-01 23:50:02  86.112.53.139
3 2014-03-01 15:07:02    5.69.149.65
4 2014-03-01 14:27:18 218.186.19.230
5 2014-03-01 13:08:31   86.0.151.153
6 2014-03-01 23:18:00    79.148.42.6
> 
> head(geo)
        V1         V2       V3       V4 V5        V6
1  1.0.0.0  1.0.0.255 16777216 16777471 AU Australia
2  1.0.1.0  1.0.3.255 16777472 16778239 CN     China
3  1.0.4.0  1.0.7.255 16778240 16779263 AU Australia
4  1.0.8.0 1.0.15.255 16779264 16781311 CN     China
5 1.0.16.0 1.0.31.255 16781312 16785407 JP     Japan
6 1.0.32.0 1.0.63.255 16785408 16793599 CN     China

Where for each IP I calculate an int value, like

[1] 1454880787 1450194315   88446273 3669627878 1442879385 1335110150

I wanted to improve the performance with a data.table as columns might be indexed and performance is really good for exact search. But I can't figure out how to make a conditional search with that.

Or, is there any other way to do that with R?

3
  • It's readable in no more than 15 mins and gives a good start : link Commented Jun 20, 2014 at 11:50
  • @ColonelBeauvel thank you. I've read that. Unfortunately, there is no information about conditional search. Most examples are searches like DT[J("R","h") which is equal to DF[DF$x=="R" & DF$y=="h",] Commented Jun 20, 2014 at 12:49
  • 5
    geo[V3 <= code & code <= V4, V6]? Commented Jun 20, 2014 at 14:38

2 Answers 2

1

This should be what you're looking for:

dt<-data.table(num=c(1,2,3,1,3,4,6,7,7,7,8,3), let=(rep(c("A","B","C"), each=4)))

then:

dt[num > 1 & let == "B"]
Sign up to request clarification or add additional context in comments.

Comments

0

I'm new to R, so could be wrong. It seems that rows are sorted by values in V3 column (if not - sort them), so you can use fast binary search.

extract.country <- function(code) {
    geo[[findInterval(code,geo$V3),"V6"]]
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.