I have the following (simplified) dataset:

df <- data.frame(a=c("A","A","B","B","B"),x=c(1,2,3,3,4))
df
  a x
1 A 1
2 A 2
3 B 3
4 B 3
5 B 4

Since I'm working with large datasets, I use the data.table package.

Is there a way to get the rows of df where x is minimal within each group of a? In this case, I want to select rows 1, 3 and 4.

Something like

df[,min(x),by=a]

But that doesn't give me the rows I want; it just shows me the minimum values of x grouped by a.

Any suggestions?

2 Answers

library(data.table)
dt <- data.table(a=c("A","A","B","B","B"), x=c(1,2,3,3,4))

These return only one row per group (the first minimum, so ties are dropped):

dt[, .SD[which.min(x)], by=a]

Or alternatively:

setkeyv(dt, c("a","x"))
dt[unique(dt[,a]), mult="first"]

Since you want to keep all ties:

dt[,.SD[x==min(x)], by=a]

You could also use:

setkeyv(dt,c("a","x"))
dt[dt[unique(dt[,a]), mult="first"]]

This could be more efficient if you have very big groups.
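For comparison, the same tie-preserving result can be written as a join against the per-group minima without setting a key. This is only a minimal sketch, assuming a data.table version that supports the `on=` argument (1.9.6 or later); the names `mins` and `res` are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B", "B", "B"), x = c(1, 2, 3, 3, 4))

# One row per group holding the group minimum of x
mins <- dt[, .(x = min(x)), by = a]

# Join dt to the minima on both columns; every row of dt that
# ties for its group's minimum matches and is returned
res <- dt[mins, on = .(a, x)]
res
```

Because the join matches on both a and x, tied rows (here the two B rows with x equal to 3) are all returned.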


6 Comments

He also wants line 4!
Thanks. I added a solution for that.
thanks, this is nice. But as you said, it's probably not the most efficient way, maybe it's possible to get rid of the "=="?
See the last solution. What is most efficient depends on the number and size of groups.
Well, create representative dummy data, so we can test solutions.

Here you go

R) dt <- data.table(a=c("A","A","B","B","B"),x=c(1,2,3,3,4))
R) dt[dt[,list(IDX=.I[x==min(x)]),by=a]$IDX]
   a x
1: A 1
2: B 3
3: B 3

That should be quicker than subsetting .SD if you want ties (as I understood you do), because the row numbers are collected with .I and dt is subset only once.
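Since the comments on the other answer asked for representative dummy data, here is one possible way to compare the .SD and .I approaches. It is a sketch under assumptions: the table size, the number of groups and the value range are made up, and the names `big`, `r_sd`, `r_i`, `t_sd` and `t_i` are hypothetical. Timings will depend on the number and size of groups, so adjust the data to resemble your own.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1e5     # hypothetical size; adjust to resemble the real data
big <- data.table(a = sample(letters, n, replace = TRUE),
                  x = sample(1:100, n, replace = TRUE))

# Approach 1: subset .SD inside each group
t_sd <- system.time(r_sd <- big[, .SD[x == min(x)], by = a])

# Approach 2: collect row numbers with .I, then subset big once
t_i <- system.time(r_i <- big[big[, .I[x == min(x)], by = a]$V1])

# Sort both results the same way so they can be compared directly
setorder(r_sd, a, x)
setorder(r_i, a, x)
same_rows <- identical(r_sd$a, r_i$a) && identical(r_sd$x, r_i$x)

print(same_rows)  # whether both approaches selected the same rows
print(t_sd)
print(t_i)
```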
