Sample data (I'm not sure how to use the code block system on SO yet):
# Build the sample data frame with named columns: three ids, three years each
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992),
  value = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)
That generates a simple data frame.
id year value
1 1990 1
1 1991 2
1 1992 3
2 1990 3
2 1991 2
2 1992 1
3 1990 2
3 1991 1
3 1992 3
I was sorting through the R subsetting questions and couldn't figure out the second step in a ddply call (from the plyr package) applied to this data.
Logic: Within each ID subgroup, find the row with the highest value (which is 3 in this sample); if that value occurs more than once, take the earliest time point.
I'm confused about what syntax to use here. From searching SO, I think ddply is the best choice, but I can't figure out how to use it. Ideally, my output should contain each UNIQUE ID once (only one row is selected per ID), with the entire row taken with it. The following isn't working in R for me, but it's the best "logic" I could come up with:
ddply( (ddply(df,id)), year, which.min(value) )
E.g., the desired output:
id year value
1 1992 3
2 1990 3
3 1992 3
If 3 is not available for an ID, the next highest value (2, then 1) should be taken. Any ideas?
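To make the intended logic concrete, here is a minimal base-R sketch of it, not a definitive answer. It assumes the rule is "highest value per id, ties broken by the earliest year"; `pick_row` and `result` are hypothetical names introduced only for this illustration.

```r
# Sample data from the question
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992),
  value = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)

# Hypothetical helper: given the rows of a single id, keep the rows holding
# that id's maximum value, then return the one with the earliest year.
# (Assumption: ties on value are broken by the smallest year.)
pick_row <- function(d) {
  top <- d[d$value == max(d$value), , drop = FALSE]
  top[which.min(top$year), , drop = FALSE]
}

# Split the data frame by id, apply the helper to each piece,
# and bind the selected rows back into one data frame.
result <- do.call(rbind, lapply(split(df, df$id), pick_row))
rownames(result) <- NULL
result
```

Because the maximum is computed per id, an id without a 3 falls back to its own highest value automatically. With plyr loaded, the same helper should also work as `ddply(df, .(id), pick_row)`, since ddply accepts a function that takes each group's data frame and returns a data frame.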