
I have a dataframe that looks like:

id<-c(1,1,1,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
type<-c("A","B","A","B","B")
df<-data.frame(id,date,type)
df$date<-as.Date(as.character(df$date), format = "%d-%m-%y")

What I want is to add a new column that contains the earliest date for each combination of id and type. My first attempt works fine, but it aggregates and merges on the id only.

d = aggregate(df$date, by=list(df$id), min)
df2 = merge(df, d, by.x="id", by.y="Group.1")

What I want though is to also filter by type and get this result:

data.frame(df2, desired=c("2007-11-30","2007-11-01", "2007-11-30","2007-12-17","2007-12-17"))

I've tried a lot of possibilities. I really think it can be done with lists, but I'm at a loss as to how...

d = aggregate(df$date, by=list(df$id, df$type), min)

# And merge the result of aggregate with the original data frame
df2 = merge(df,d,by.x=list("id","type"),by.y=list("Group.1","Group.2"))

For this simple example I could just split the types into their own data frames, build the new column in each, and then combine the two results. In reality there are many types, plus a third column that has to be filtered the same way, so that would not be practical...

Thank you!

  • You have a typo mismatch between date1 and date when making df Commented Jan 10, 2017 at 2:58
  • @thelatemail You're right. I went in a circle to make that date column... Commented Jan 10, 2017 at 3:40

2 Answers


We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), order the rows by 'date', and then, grouped by 'id' (or by 'id' and 'type'), assign (:=) the first element of 'date' to a new column. Because the rows are ordered by 'date', the first element in each group is the earliest date.

library(data.table)
setDT(df)[order(date), earliestdateid := date[1], by = id
    ][order(date), earliestdateidtype := date[1], by = .(id, type)]
df
#    id       date type earliestdateid earliestdateidtype
#1:  1 2008-01-23    A     2007-11-01         2007-11-30
#2:  1 2007-11-01    B     2007-11-01         2007-11-01
#3:  1 2007-11-30    A     2007-11-01         2007-11-30
#4:  3 2007-12-17    B     2007-12-17         2007-12-17
#5:  3 2008-12-12    B     2007-12-17         2007-12-17

A similar approach with dplyr is

library(dplyr)
df %>%
   group_by(id) %>%
   arrange(date) %>%
   mutate(earliestdateid = first(date)) %>%
   group_by(type, .add = TRUE) %>%   # `.add` replaced `add` in dplyr 1.0.0
   mutate(earliestdateidtype = first(date))

NOTE: This avoids doing it in two steps, i.e. computing a summarised output and then joining it back to the original data.
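For comparison, here is a minimal sketch of the two-step approach the note refers to, in base R. It assumes the sample data from the question with the date column named consistently. The column name earliestdateidtype is borrowed from the data.table example above, and df2 and d are just working names. The key difference from the attempt in the question is that merge() takes a character vector for by, not a list.

```r
# Sample data from the question (date column named consistently)
df <- data.frame(
  id   = c(1, 1, 1, 3, 3),
  date = as.Date(c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08"),
                 format = "%d-%m-%y"),
  type = c("A", "B", "A", "B", "B")
)

# Step 1: summarise, one row per id/type combination with its earliest date
d <- aggregate(date ~ id + type, data = df, FUN = min)
names(d)[names(d) == "date"] <- "earliestdateidtype"

# Step 2: join the summary back onto the original rows.
# `by` is a character vector of the shared key columns.
df2 <- merge(df, d, by = c("id", "type"))
df2
```

This gives the same per-group minimum as the one-step versions, but note that merge() sorts the result by the key columns, so the row order can differ from the original data frame.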


2 Comments

Wow, this is why I like R. A complicated bunch of operations taken care of in 1 line. And I thought 2 lines was great, hah. If I run into something similar but on a numerical column instead of a date, would I just change order(date) to mean(numbers) or something to that effect for the data.table way?
@Soran If you just want the mean(numbers), then order is not needed, i.e. setDT(df)[, Mean := mean(numbers), .(id, type)]

You could get the two minimums by different groups using ave instead:

df$minid <- with(df, ave(date, id, FUN=min, drop=TRUE) )
df$minidtype <- with(df, ave(date, list(id,type), FUN=min, drop=TRUE) )
df

#  id       date type      minid  minidtype
#1  1 2008-01-23    A 2007-11-01 2007-11-30
#2  1 2007-11-01    B 2007-11-01 2007-11-01
#3  1 2007-11-30    A 2007-11-01 2007-11-30
#4  3 2007-12-17    B 2007-12-17 2007-12-17
#5  3 2008-12-12    B 2007-12-17 2007-12-17

If you were tricky you could do it all in one call too:

df[c("minid", "minidtype")] <- lapply(list("id", c("id","type")),
                                  FUN=function(x) ave(df$date, df[x], FUN=min, drop=TRUE) )
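The question mentions a third column that has to be handled the same way. As a rough sketch, the one-call version above could be extended by adding another grouping set to the list. The site column, its values, and the minidtypesite name are all hypothetical and only there for illustration.

```r
# Sample data from the question plus a hypothetical third grouping column
df <- data.frame(
  id   = c(1, 1, 1, 3, 3),
  date = as.Date(c("2008-01-23", "2007-11-01", "2007-11-30",
                   "2007-12-17", "2008-12-12")),
  type = c("A", "B", "A", "B", "B"),
  site = c("x", "x", "y", "x", "x")   # hypothetical, not in the original data
)

# Named list: new column name -> columns to group by (names are illustrative)
groupings <- list(
  minid         = "id",
  minidtype     = c("id", "type"),
  minidtypesite = c("id", "type", "site")
)

# One ave() call per grouping set; drop=TRUE is passed on to interaction()
# so that unused combinations of the grouping columns are dropped
df[names(groupings)] <- lapply(groupings, function(cols) {
  ave(df$date, df[cols], FUN = min, drop = TRUE)
})
df
```

Using a named list keeps the new column names next to their grouping columns, so adding another level only means adding one more entry.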
