I have duplicated dates that I want to remove based on the value of another variable. If one of the dmean values for a duplicated date is NA, I want to drop that row. If both dmean values for a date are NA, I would like to keep either one of the rows. Sample data is given below. I have tried:

subset(df1, !duplicated(date)) 

but this removes duplicates regardless of the value of dmean, always keeping the first row for each date. For example, for date 2010-12-23 I would like to keep the dmean value 28.38250 instead of the NA.

structure(list(date = c("2010-12-22", "2010-12-22", "2010-12-23", 
"2010-12-23", "2010-12-24", "2010-12-24", "2010-12-25", "2010-12-25", 
"2010-12-26", "2010-12-26", "2010-12-27", "2010-12-27", "2010-12-28", 
"2010-12-28"), dmean = c(NA, NA, NA, 28.3825, 35.54625, NA, 75.27625, 
NA, NA, 75.225, NA, 41.75, NA, 37.98375)), .Names = c("date", 
"dmean"), class = "data.frame", row.names = c(NA, -14L))

2 Answers

Here is a solution with plyr:

library(plyr)
ddply(df, .(date), summarize,
      dmean=ifelse(all(is.na(dmean)), NA, max(dmean,na.rm=TRUE)))

Which gives:

        date    dmean
1 2010-12-22       NA
2 2010-12-23 28.38250
3 2010-12-24 35.54625
4 2010-12-25 75.27625
5 2010-12-26 75.22500
6 2010-12-27 41.75000
7 2010-12-28 37.98375

Note that it is easy to change the function call if you want the mean, the min, or any other statistic of your dmean values instead of the max.
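As a rough illustration of that point, here is a minimal base R sketch (it does not use plyr, so it is not the call from this answer). The helpers `na_safe` and `summarise_by_date` are hypothetical names introduced only for this example; the data is the sample from the question.

```r
# Sample data from the question.
df <- data.frame(
  date = rep(c("2010-12-22", "2010-12-23", "2010-12-24", "2010-12-25",
               "2010-12-26", "2010-12-27", "2010-12-28"), each = 2),
  dmean = c(NA, NA, NA, 28.3825, 35.54625, NA, 75.27625,
            NA, NA, 75.225, NA, 41.75, NA, 37.98375),
  stringsAsFactors = FALSE
)

# Hypothetical helper: wrap a statistic so that an all-NA group gives NA
# instead of -Inf (max), Inf (min) or NaN (mean).
na_safe <- function(stat) {
  function(x) if (all(is.na(x))) NA_real_ else stat(x, na.rm = TRUE)
}

# Hypothetical helper: one row per date, with dmean summarised by `stat`.
summarise_by_date <- function(df, stat) {
  res <- tapply(df$dmean, df$date, na_safe(stat))
  data.frame(date = names(res), dmean = as.vector(res),
             stringsAsFactors = FALSE)
}

by_max  <- summarise_by_date(df, max)   # same statistic as the ddply call
by_mean <- summarise_by_date(df, mean)  # swap in mean, min, median, ...
```

On the sample data every date has at most one non-NA value, so `by_max` and `by_mean` agree; they would differ only for dates with several non-NA values.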

You can do the same with data.table, too:

library(data.table)
dt <- data.table(df)
dt[,list(dmean=ifelse(all(is.na(dmean)), NA_real_, max(dmean,na.rm=TRUE))),by=date]

It will work if you order the data frame by date and dmean first:

df1_sorted <- df1[order(df1$date, df1$dmean), ]

After the reordering, the NAs in dmean come after the numeric values for each date, because order() puts NA last by default (na.last = TRUE).
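A small sketch of that sorting behaviour, using a few dmean values from the question in a standalone vector (the variable names are only for illustration):

```r
dmean <- c(NA, 28.3825, NA, 35.54625)

# order() uses na.last = TRUE by default, so NA positions come last.
ord <- order(dmean)
sorted <- dmean[ord]

# ord    is 2 4 1 3
# sorted is 28.38250 35.54625 NA NA
```

Because ties keep their original relative order, a date whose dmean values are all NA simply keeps its first row after the `!duplicated(date)` step.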

Now, you can exclude the rows with duplicated dates:

subset(df1_sorted, !duplicated(date))

The result:

         date    dmean
1  2010-12-22       NA
4  2010-12-23 28.38250
5  2010-12-24 35.54625
7  2010-12-25 75.27625
10 2010-12-26 75.22500
12 2010-12-27 41.75000
14 2010-12-28 37.98375

3 Comments

Beware that if none of the dmean values for a date is NA, one of them will still be dropped by this solution. Can that happen, @Meso?
@Backlin You are right. I assumed that each date has at most one non-NA dmean value. This is the case in the example.
@Backlin, in my data one of the two dmean values for a date is always NA. But there are also cases in which both values for a date are NA.
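For data where a date can carry more than one non-NA dmean value, the caveat raised in these comments could be handled along the following lines. This is only a sketch in base R, not part of either answer; `drop_na_duplicates` is a hypothetical name. It keeps every non-NA row and, for dates whose values are all NA, keeps the first row.

```r
# Hypothetical helper: drop NA rows unless the whole date is NA,
# in which case keep exactly one row for that date.
drop_na_duplicates <- function(df) {
  dates <- as.character(df$date)
  # TRUE per date if at least one dmean value is present.
  has_value <- tapply(!is.na(df$dmean), dates, any)
  # Expand back to one flag per row.
  all_na <- !unname(has_value[dates])
  keep <- !is.na(df$dmean) | (all_na & !duplicated(dates))
  df[keep, ]
}

# Sample data from the question.
df1 <- data.frame(
  date = rep(c("2010-12-22", "2010-12-23", "2010-12-24", "2010-12-25",
               "2010-12-26", "2010-12-27", "2010-12-28"), each = 2),
  dmean = c(NA, NA, NA, 28.3825, 35.54625, NA, 75.27625,
            NA, NA, 75.225, NA, 41.75, NA, 37.98375),
  stringsAsFactors = FALSE
)

res <- drop_na_duplicates(df1)
```

On the sample data this keeps rows 1, 4, 5, 7, 10, 12 and 14, the same rows as the sort-then-deduplicate approach, and it preserves the original row order.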
