
I have a data table with several patients ("Spell") and several temperature ("Temp") measurements per patient, each numbered by "Episode". I also have the date and time at which each temperature was taken.

Spell Episode         Date    Temp
 1       3       2-1-17 21:00   40
 1       2       2-1-17 20:00   36
 1       1       1-1-17 10:00   37
 2       3       2-1-17 15:00   36
 2       2       2-1-17 10:00   37
 2       1       1-1-17 8:00    36
 3       1       3-1-17 10:00   40
 4       3       4-1-17 15:00   36
 4       2       3-1-17 12:00   40
 4       1       3-1-17 10:00   39
 5       7       3-1-17 17:30   36
 5       6       2-1-17 17:00   36
 5       5       2-1-17 16:00   37
 5       1       1-1-17 9:00    36
 5       4       1-1-17 14:00   39
 5       3       1-1-17 13:00   40
 5       2       1-1-17 11:00   39

I am interested in keeping, for each patient, all the measurements taken within the 24 hours before the last one. I have grouped the observations by Spell and sorted them by date in reverse order, but I am unsure how to do the in-group comparison against a common reference (in this case, the first row of each group). The result should be:

Spell Episode         Date    Temp
 1       3       2-1-17 21:00   40
 1       2       2-1-17 20:00   36
 2       3       2-1-17 15:00   36
 2       2       2-1-17 10:00   37
 3       1       3-1-17 10:00   40
 4       3       4-1-17 15:00   36
 5       7       3-1-17 17:30   36

I would appreciate any ideas that point me in the right direction.

Edit: Date is in d-m-yy H:M format. Here's the dput output of the data:

structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L, 
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00", 
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00", 
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00", 
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00", 
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L, 
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode", 
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA, 
-17L))
  • Reproducible example would be great for this one. Commented Jul 25, 2017 at 11:55
  • And what's the format of the date? Commented Jul 25, 2017 at 12:01
  • Thanks, date format is d-m-yy, and I edited to add the dput outcome Commented Jul 25, 2017 at 12:28
  • Your expected result shows an additional row (Spell 5, Episode 6) which is outside of the 24 hrs window. Is this intended? Commented Jul 25, 2017 at 17:13
  • @UweBlock, not at all - that's a mistake, editing it now. Thanks for pointing it out. Commented Jul 25, 2017 at 21:52

4 Answers

6
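library(dplyr)

df %>% 
  # parse d-m-yy H:M strings to seconds since the epoch (%y = two-digit year)
  mutate(Date2 = as.numeric(strptime(Date, "%d-%m-%y %H:%M", tz = "GMT"))) %>% 
  group_by(Spell) %>% 
  # keep rows within 24 hours (in seconds) of each Spell's latest measurement
  filter(Date2 >= (max(Date2) - 60*60*24)) %>%
  ungroup() %>%
  select(-Date2)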
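
The format string matters here: the sample dates carry a two-digit year, which `%y` matches, whereas `%Y` expects the full year including the century. A minimal base R sketch, using one value from the question and an assumed GMT time zone, for checking how a string parses before filtering on it:

```r
# One timestamp from the question, in d-m-yy H:M form
x <- "2-1-17 21:00"

# %y reads the two-digit year "17" as 2017; tz = "GMT" is an assumption
# made here to keep the result independent of the local time zone
parsed <- as.POSIXct(x, format = "%d-%m-%y %H:%M", tz = "GMT")
parsed_chr <- format(parsed, "%Y-%m-%d %H:%M")

# 24 hours expressed in seconds, as used in the filter
window_secs <- 60 * 60 * 24
cutoff_chr <- format(parsed - window_secs, "%Y-%m-%d %H:%M")
```

Printing `parsed_chr` and `cutoff_chr` shows the parsed timestamp and the start of its 24-hour window.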

5

A solution using only data.table:

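library(data.table)
# convert Date column to POSIXct (%y matches the two-digit year)
DT[, Date := as.POSIXct(Date, format = '%d-%m-%y %H:%M', tz = 'GMT')]
# per Spell, keep rows taken at most 24 hours before the latest measurement
filteredDT <- DT[, .SD[as.numeric(difftime(max(Date), Date, units = 'hours')) <= 24], by = Spell]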

> filteredDT
   Spell Episode                Date Temp
1:     1       3 2017-01-02 21:00:00   40
2:     1       2 2017-01-02 20:00:00   36
3:     2       3 2017-01-02 15:00:00   36
4:     2       2 2017-01-02 10:00:00   37
5:     3       1 2017-01-03 10:00:00   40
6:     4       3 2017-01-04 15:00:00   36
7:     5       7 2017-01-03 17:30:00   36
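
On larger tables, subsetting `.SD` per group can be slower than necessary. A commonly used alternative is to collect the qualifying row numbers with `.I` and subset once. This is only a sketch, assuming `Date` is already POSIXct; the small table `toy` and the name `keep_rows` are made up for illustration:

```r
library(data.table)

# Hypothetical miniature of the question's data, Date already POSIXct
toy <- data.table(
  Spell = c(1L, 1L, 1L, 3L),
  Episode = c(3L, 2L, 1L, 1L),
  Date = as.POSIXct(c("2017-01-02 21:00", "2017-01-02 20:00",
                      "2017-01-01 10:00", "2017-01-03 10:00"), tz = "GMT")
)

# .I holds row numbers in the full table; keep those within 24 hours
# (86400 seconds) of the latest Date in each Spell
keep_rows <- toy[, .I[Date >= max(Date) - 24 * 3600], by = Spell]$V1

# Subset the original table once, using the collected row numbers
filtered_toy <- toy[keep_rows]
```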


2
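# parse the d-m-yy H:M strings as POSIXct
mydata$Date <- as.POSIXct(mydata$Date, format = '%d-%m-%y %H:%M', tz = 'GMT')
# sort by Spell, latest first, so the per-group index below lines up with the rows
mydata <- mydata[with(mydata, order(Spell, -as.numeric(Date))), ]
# per Spell, flag rows within one day of the latest measurement
index <- with(mydata, tapply(Date, Spell, function(x) x >= max(x) - as.difftime(1, units = "days")))
mydata[unlist(index), ]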

    Spell Episode                Date Temp
1:      1       3 2017-01-02 21:00:00   40
2:      1       2 2017-01-02 20:00:00   36
4:      2       3 2017-01-02 15:00:00   36
5:      2       2 2017-01-02 10:00:00   37
7:      3       1 2017-01-03 10:00:00   40
8:      4       3 2017-01-04 15:00:00   36
11:     5       7 2017-01-03 17:30:00   36
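
The `tapply()` approach relies on the rows being sorted by `Spell`, so that the unlisted index lines up with the data. A base R sketch that avoids this dependency uses `ave()`, which returns the per-group maximum aligned to the original row order. The data frame `toy` and its values are illustrative only:

```r
# Hypothetical data, deliberately not sorted by Spell
toy <- data.frame(
  Spell = c(2L, 1L, 2L, 1L),
  Date = as.POSIXct(c("2017-01-02 15:00", "2017-01-02 21:00",
                      "2017-01-01 08:00", "2017-01-02 20:00"), tz = "GMT")
)

# Work in seconds since the epoch so ave() operates on plain numbers
secs <- as.numeric(toy$Date)

# Latest measurement per Spell, repeated for every row of that Spell
latest <- ave(secs, toy$Spell, FUN = max)

# Keep rows no more than 24 hours (86400 seconds) before the latest one
kept <- toy[secs >= latest - 86400, ]
```

Because `latest` has one entry per row, the logical filter can be applied without reordering the data first.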

2 Comments

Why don't you use the data provided by the OP? Instead, you are supplying your own data where the Date column is already converted to class POSIXct?
The data sample was added by OP in an edit after I answered the question. I will edit my answer accordingly.
1

The solution below uses two functions from the lubridate package by Garrett Grolemund and Hadley Wickham. The package is very handy for dealing with dates and times, so I wonder why none of the other answers use it.

Furthermore, data.table is used because the OP has provided sample data of data.table class.

library(data.table)   # if not already loaded
# coerce Date to POSIXct (updates DT by reference)
DT[, Date := lubridate::dmy_hm(Date)][
  # for each Spell, pick measurements within the last 24 hours
  , .SD[Date > max(Date) - lubridate::dhours(24L)], by = Spell][
    # order by Spell, latest first, just for convenience
    order(Spell, -Date)]
   Spell Episode                Date Temp
1:     1       3 2017-01-02 21:00:00   40
2:     1       2 2017-01-02 20:00:00   36
3:     2       3 2017-01-02 15:00:00   36
4:     2       2 2017-01-02 10:00:00   37
5:     3       1 2017-01-03 10:00:00   40
6:     4       3 2017-01-04 15:00:00   36
7:     5       7 2017-01-03 17:30:00   36

Please note that the expected result originally given by the OP showed an additional row (Spell 5, Episode 6), which is outside the 24-hour window; the OP has since corrected this.
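
One detail worth checking is the boundary: the code above uses a strict `>`, whereas the other answers use `>=` (or `<= 24`), so a measurement taken exactly 24 hours before the last one would be treated differently. A base R sketch with made-up timestamps illustrates the difference:

```r
# Made-up timestamps: the last one, and one exactly 24 hours earlier
dates <- as.POSIXct(c("2017-01-02 10:00", "2017-01-01 10:00"), tz = "GMT")
cutoff <- max(dates) - 24 * 3600

strict <- dates > cutoff      # excludes the boundary measurement
inclusive <- dates >= cutoff  # includes it
```

None of the rows in the OP's sample falls exactly on the boundary, so both comparisons give the same result there.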

Data

As provided by the OP

DT <- structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L, 
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00", 
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00", 
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00", 
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00", 
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L, 
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode", 
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA, -17L))
