R aggregate based on multiple columns and then merge into dataframe?

Question

I have a dataframe that looks like:

id<-c(1,1,1,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
type<-c("A","B","A","B","B")
df<-data.frame(id,date,type)
df$date<-as.Date(as.character(df$date), format = "%d-%m-%y")

What I want is to add a new column that contains the earliest date for each ID for each type. This first attempt works fine and does the aggregate and merging based on only the ID.

d = aggregate(df$date, by=list(df$id), min)
df2 = merge(df, d, by.x="id", by.y="Group.1")

What I want though is to also filter by type and get this result:

data.frame(df2, desired=c("2007-11-30","2007-11-01", "2007-11-30","2007-12-17","2007-12-17"))

I've tried a lot of possibilities. I really think it can be done with lists, but I'm at a loss as to how...

d = aggregate(df$date, by=list(df$id, df$type), min)

# And merge the result of aggregate with the original data frame
df2 = merge(df,d,by.x=list("id","type"),by.y=list("Group.1","Group.2"))

For this simple example I could just separate the types into their own df, build the new column, and then combine the resulting 2 dfs, but in reality there are many types and a 3rd column that also has to be filtered similarly, so that would not be practical...

Thank you!

You have a typo mismatch between date1 and date when making df
– thelatemail, Commented Jan 10, 2017 at 2:58
@thelatemail You're right. I went in a circle to make that date column...
– Soran, Commented Jan 10, 2017 at 3:40

akrun · Accepted Answer · 2017-01-10 03:11:10Z

2

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), order by 'date', and then, grouped by 'id' (or by 'id' and 'type'), assign (:=) the first element of 'date' to a new column holding the earliest date.

library(data.table)
setDT(df)[order(date), earliestdateid := date[1], by = id
    ][order(date), earliestdateidtype := date[1], by = .(id, type)]
df
#    id       date type earliestdateid earliestdateidtype
#1:  1 2008-01-23    A     2007-11-01         2007-11-30
#2:  1 2007-11-01    B     2007-11-01         2007-11-01
#3:  1 2007-11-30    A     2007-11-01         2007-11-30
#4:  3 2007-12-17    B     2007-12-17         2007-12-17
#5:  3 2008-12-12    B     2007-12-17         2007-12-17

A similar approach with dplyr is

library(dplyr)
df %>%
   group_by(id) %>%
   arrange(date) %>%
   mutate(earliestdateid = first(date)) %>%
   group_by(type, .add = TRUE) %>%   # dplyr >= 1.0.0; older versions use add = TRUE
   mutate(earliestdateidtype = first(date))

NOTE: This avoids the two-step approach, i.e. computing a summarised output and then joining it back.

edited Jan 10, 2017 at 3:11

answered Jan 10, 2017 at 2:59

akrun


2 Comments

Soran Over a year ago

Wow, this is why I like R. A complicated bunch of operations taken care of in 1 line. And I thought 2 lines was great, hah. If I run into something similar but on a numerical column instead of a date, would I just change order(date) to mean(numbers) or something to that effect for the data.table way?

akrun Over a year ago

@Soran If you just want the mean(numbers), then order is not needed, i.e. setDT(df)[, Mean := mean(numbers), .(id, type)]
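
To make the comment above concrete, here is a minimal sketch of the grouped mean with data.table. The column name numbers comes from the comment; the values in it, and the object name dt, are made up purely for illustration.

```r
library(data.table)

# Hypothetical data: `numbers` stands in for any numeric column.
dt <- data.table(
  id      = c(1, 1, 1, 3, 3),
  type    = c("A", "B", "A", "B", "B"),
  numbers = c(10, 20, 30, 40, 50)
)

# Add the per-(id, type) mean by reference. mean() does not depend on
# row order, so no order() is needed in the first argument.
dt[, Mean := mean(numbers), by = .(id, type)]
```

Each row ends up carrying the mean of its own (id, type) group, in the same way that earliestdateidtype carries the group's first date.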

thelatemail · Accepted Answer · 2017-01-10 03:42:58Z

2

You could get the two minimums by different groups using ave instead:

df$minid <- with(df, ave(date, id, FUN=min, drop=TRUE) )
df$minidtype <- with(df, ave(date, list(id,type), FUN=min, drop=TRUE) )
df

#  id       date type      minid  minidtype
#1  1 2008-01-23    A 2007-11-01 2007-11-30
#2  1 2007-11-01    B 2007-11-01 2007-11-01
#3  1 2007-11-30    A 2007-11-01 2007-11-30
#4  3 2007-12-17    B 2007-12-17 2007-12-17
#5  3 2008-12-12    B 2007-12-17 2007-12-17

If you were tricky you could do it all in one call too:

df[c("minid", "minidtype")] <- lapply(list("id", c("id","type")),
                                  FUN=function(x) ave(df$date, df[x], FUN=min, drop=TRUE) )
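
For comparison, the aggregate-then-merge route from the question can also be made to work. This is only a sketch, not part of the original answer, and it assumes the failure in the question's merge call came from passing lists to by.x and by.y where merge expects character vectors. The column name earliest is an illustrative choice.

```r
# Rebuild the question's example data, with the date column named consistently.
df <- data.frame(
  id   = c(1, 1, 1, 3, 3),
  date = as.Date(c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08"),
                 format = "%d-%m-%y"),
  type = c("A", "B", "A", "B", "B")
)

# Earliest date per (id, type). Naming the grouping list gives the result
# columns `id` and `type` instead of Group.1 and Group.2.
d <- aggregate(df$date, by = list(id = df$id, type = df$type), FUN = min)
names(d)[names(d) == "x"] <- "earliest"   # `earliest` is an illustrative name

# merge() takes character vectors of key names, not lists.
df2 <- merge(df, d, by = c("id", "type"))
```

Note that merge sorts the result by the key columns, so the rows of df2 come back in a different order than in df.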

edited Jan 10, 2017 at 3:42

answered Jan 10, 2017 at 3:04

thelatemail

