select specific lines with the data.table package

Question

I have the following (simplified) dataset:

df <- data.frame(a=c("A","A","B","B","B"),x=c(1,2,3,3,4))
df
  a x
1 A 1
2 A 2
3 B 3
4 B 3
5 B 4

Since I'm working with large datasets, I use the data.table package.

Is there a way to get the rows of df where x is minimal within each group of a? In this case, I want to select rows 1, 3 and 4.

Something like

df[,min(x),by=a]

But that doesn't give me the rows I want; it just shows me the minimum values of x grouped by a.

Any suggestions?

Roland · Accepted Answer · 2013-08-09 08:29:32Z

6

library(data.table)
dt <- data.table(a=c("A","A","B","B","B"), x=c(1,2,3,3,4))

These return only one row per group (the first minimum):

dt[, .SD[which.min(x)], by=a]

Or alternatively:

setkeyv(dt, c("a","x"))
dt[unique(dt[,a]), mult="first"]

Since you want to have all ties:

dt[,.SD[x==min(x)], by=a]
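To make the difference between the two `.SD` variants concrete, here is a minimal sketch on the example data. The names `first_only` and `with_ties` are only illustrative and do not appear in the answer.

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B", "B", "B"), x = c(1, 2, 3, 3, 4))

# which.min() returns the position of the FIRST minimum only,
# so each group contributes exactly one row
first_only <- dt[, .SD[which.min(x)], by = a]

# x == min(x) is TRUE for every row that attains the group minimum,
# so tied rows are all kept
with_ties <- dt[, .SD[x == min(x)], by = a]

nrow(first_only)  # 2
nrow(with_ties)   # 3
```

Group B has two rows with x equal to 3, which is why the second call returns one row more than the first.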

You could also use:

setkeyv(dt,c("a","x"))
dt[dt[unique(dt[,a]), mult="first"]]

This could be more efficient if you have very large groups.
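Which variant is faster depends on the data, so it may be worth timing both on something that resembles yours. The following is a rough sketch under assumed sizes: the table `big`, the group count, the row count and the seed are all made up for illustration and are not taken from the question.

```r
library(data.table)

set.seed(42)             # arbitrary seed, only for reproducibility
n_groups <- 100L         # assumption: few groups ...
n_rows   <- 100000L      # ... with many rows each

big <- data.table(
  a = sample(sprintf("g%03d", seq_len(n_groups)), n_rows, replace = TRUE),
  x = sample.int(50L, n_rows, replace = TRUE)
)

# Variant 1: subset .SD inside every group
t_sd <- system.time(
  res_sd <- big[, .SD[x == min(x)], by = a]
)

# Variant 2: key by (a, x), take the first row per group, then join back
# to pick up all ties; the cost of setting the key is included in the timing
keyed <- copy(big)       # copy so that `big` keeps its original row order
t_key <- system.time({
  setkeyv(keyed, c("a", "x"))
  res_key <- keyed[keyed[unique(keyed[, a]), mult = "first"]]
})

# Both variants should select the same rows (compared after ordering by group)
same_rows <- identical(res_sd[order(a), a], res_key[order(a), a]) &&
  identical(res_sd[order(a), x], res_key[order(a), x])

# Elapsed seconds for each variant; the numbers will vary by machine
c(sd = t_sd[["elapsed"]], keyed = t_key[["elapsed"]])
```

Changing `n_groups` and `n_rows` (many small groups versus few large ones) is the interesting experiment here, since a single run at one size says little about the general case.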

edited Aug 9, 2013 at 8:29

answered Aug 9, 2013 at 8:16

Roland


6 Comments

jenswirf Over a year ago

He also wants line 4!

Roland Over a year ago

Thanks. I added a solution for that.

beginneR Over a year ago

Thanks, this is nice. But as you said, it's probably not the most efficient way. Maybe it's possible to get rid of the "=="?

Roland Over a year ago

See the last solution. What is most efficient depends on the number and size of groups.

Roland Over a year ago

Well, create representative dummy data, so we can test solutions.


statquant · Accepted Answer · 2013-08-09 13:27:47Z

1

Here you go

library(data.table)
dt <- data.table(a=c("A","A","B","B","B"), x=c(1,2,3,3,4))
dt[dt[, list(IDX=.I[x==min(x)]), by=a]$IDX]
   a x
1: A 1
2: B 3
3: B 3

That should be quicker if you want ties (which, as I understood it, you do).
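The one-liner can be read in two steps. This sketch splits it up on the same example data; the intermediate name `idx` is only illustrative.

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B", "B", "B"), x = c(1, 2, 3, 3, 4))

# Step 1: inside a grouped call, .I holds the row numbers of the current
# group within the whole table, so .I[x == min(x)] gives the global
# positions of the rows that attain the group minimum
idx <- dt[, list(IDX = .I[x == min(x)]), by = a]
idx$IDX   # 1 3 4

# Step 2: use those positions as an ordinary row subset of dt
dt[idx$IDX]
```

The positions 1, 3 and 4 are exactly the rows asked for in the question.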

edited Aug 9, 2013 at 13:27

answered Aug 9, 2013 at 13:21

statquant
