0

Novice R user here. I'm trying to compare the dates for each id and determine which entry is earlier and which is later. The input data would look something like this:

id    date
101   18-Sep-12
101   21-Aug-12
102   25-Mar-13
102   15-Apr-13

And the output would look something like this:

id    date         Category
101   18-Sep-12    Late
101   21-Aug-12    Early
102   25-Mar-13    Early
102   15-Apr-13    Late

-Justin

  • Are there always two entries for each id? Commented Oct 21, 2013 at 16:15
  • Yes, just two for this example. Commented Oct 21, 2013 at 16:16

3 Answers

2

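If your data frame is df and every id has exactly two rows (note that this reorders the rows):

# Parse strings such as "18-Sep-12" into Date objects
df$date <- as.Date(df$date, format = "%d-%b-%y")
# Sort by id, then by date, so the earlier row of each pair comes first
df <- df[order(df$id, df$date), ]
# c("Early", "Late") is recycled down the sorted pairs
df$Category <- c("Early", "Late")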

2

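You can use plyr here:

library(plyr)
# %b is locale-dependent, so remember the current LC_TIME setting and switch to English.
# Sys.setlocale() returns the *new* setting, so the old one is read with Sys.getlocale().
# Locale names are platform-specific; "ENGLISH" may need adjusting on your system.
loc <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "ENGLISH")
dat$date <- as.Date(dat$date, format = "%d-%b-%y")
# Within each id, the row holding the minimum date is the early one
ddply(dat, .(id), transform, cat = ifelse(date == min(date), "EARLY", "LATE"))
##    id       date   cat
## 1 101 2012-09-18  LATE
## 2 101 2012-08-21 EARLY
## 3 102 2013-03-25 EARLY
## 4 102 2013-04-15  LATE
# Restore the original locale
Sys.setlocale("LC_TIME", loc)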
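The save-and-restore handling of the locale can also be wrapped in a small function, so the session locale is put back even if parsing fails. This is only a sketch: parse_english_dates is a hypothetical helper name, not something provided by base R or plyr, and it assumes the "C" locale (which uses English month abbreviations) is acceptable for parsing.

```r
# Hypothetical helper (not part of base R or plyr): parse dates whose month
# abbreviations are English, whatever the session locale is.
parse_english_dates <- function(x, format = "%d-%b-%y") {
  old <- Sys.getlocale("LC_TIME")                     # remember the current setting
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # restore it on exit, even on error
  Sys.setlocale("LC_TIME", "C")                       # "C" uses English month names
  as.Date(x, format = format)
}

dates <- parse_english_dates(c("18-Sep-12", "21-Aug-12"))
```

With this in place, the conversion step would become dat$date <- parse_english_dates(dat$date), with no locale calls left in the main script.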

2 Comments

OK, this looks great. What is the relevance of setting the locale?
@user2900006 I have to set the locale because the %b format is locale-dependent. I have a French locale; I think there is no need for it in your case. I just included it for others who have a non-English locale like me...

0

I would probably look into using the "data.table" package.

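The general approach I would use is to rank the dates within each id to create your "category" column. The thing that's nice here is that you are not limited to comparing two dates. (With exactly two rows per id, order gives the same numbers as rank, but order returns sort positions rather than ranks, so the two can differ for larger groups.)

library(data.table)

DT <- data.table(df)
# Rank the dates within each id: 1 = earliest
DT[, category := rank(date, ties.method = "first"), by = id]
DT
#     id       date category
# 1: 101 2012-09-18        2
# 2: 101 2012-08-21        1
# 3: 102 2013-03-25        1
# 4: 102 2013-04-15        2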

If you want text labels, you can use factor:

DT[, category := factor(category, labels = c("Early", "Late"))]
DT
#     id       date category
# 1: 101 2012-09-18     Late
# 2: 101 2012-08-21    Early
# 3: 102 2013-03-25    Early
# 4: 102 2013-04-15     Late
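To illustrate the point about not being limited to two dates, here is a minimal base R sketch that uses rank() within each id on a group of three. The data frame df3, the position column and the "Middle" label are all assumptions made for illustration; they do not come from the question.

```r
# Illustrative data: id 101 has three dates instead of two
df3 <- data.frame(
  id   = c(101L, 101L, 101L, 102L, 102L),
  date = as.Date(c("2012-10-02", "2012-08-21", "2012-09-18",
                   "2013-03-25", "2013-04-15"))
)

# ave() applies rank() within each id. The dates are converted to numeric
# because ave() writes the results back into a copy of its first argument.
# For id 101 this gives 3, 1, 2, whereas order() would return 2, 3, 1.
df3$position <- ave(as.numeric(df3$date), df3$id, FUN = rank)

# Label the earliest and latest date per id; anything in between is "Middle"
n_in_group <- ave(df3$position, df3$id, FUN = length)
df3$Category <- ifelse(df3$position == 1, "Early",
                ifelse(df3$position == n_in_group, "Late", "Middle"))
```

The same idea carries over to data.table by putting rank(date) in the by = id call.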

For convenience, this is the "df" that I started with:

df <- structure(list(id = c(101L, 101L, 102L, 102L), 
    date = structure(c(15601, 15573, 15789, 15810), class = "Date")), 
    .Names = c("id", "date"), row.names = c(NA, -4L), class = "data.frame")
