I would like to loop through the following data.frame and group by sequential entries, as determined by the value in X2. So in the following data.frame, we can see four groups: 1-3, 5-6, 9-13, and 16. We could have any combination of group sizes and number of groups.
X1 X2 X3 X4
1 1_21/08/2014 22:56CONTENT_ACCESS.preparing 1 21/08/2014 22:56 CONTENT_ACCESS.preparing
2 2_21/08/2014 22:57CONTENT_ACCESS.preparing 2 21/08/2014 22:57 CONTENT_ACCESS.preparing
3 3_21/08/2014 22:58CONTENT_ACCESS.preparing 3 21/08/2014 22:58 CONTENT_ACCESS.preparing
4 5_21/08/2014 23:07CONTENT_ACCESS.preparing 5 21/08/2014 23:07 CONTENT_ACCESS.preparing
5 6_21/08/2014 23:08CONTENT_ACCESS.preparing 6 21/08/2014 23:08 CONTENT_ACCESS.preparing
6 9_21/08/2014 23:29CONTENT_ACCESS.preparing 9 21/08/2014 23:29 CONTENT_ACCESS.preparing
7 10_21/08/2014 23:30CONTENT_ACCESS.preparing 10 21/08/2014 23:30 CONTENT_ACCESS.preparing
8 11_21/08/2014 23:31CONTENT_ACCESS.preparing 11 21/08/2014 23:31 CONTENT_ACCESS.preparing
9 12_21/08/2014 23:33CONTENT_ACCESS.preparing 12 21/08/2014 23:33 CONTENT_ACCESS.preparing
10 13_21/08/2014 23:34CONTENT_ACCESS.preparing 13 21/08/2014 23:34 CONTENT_ACCESS.preparing
11 16_21/08/2014 23:40CONTENT_ACCESS.preparing 16 21/08/2014 23:40 CONTENT_ACCESS.preparing
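To make the grouping rule concrete, here is a minimal sketch of how I understand it: a new group starts wherever X2 does not increase by exactly 1. The names `x2`, `group_id`, and `runs` are only illustrative and are not part of my code.

```r
# X2 values from the example, converted from character to integer
x2 <- as.integer(c("1", "2", "3", "5", "6", "9", "10", "11", "12", "13", "16"))

# A new group starts wherever the step from the previous value is not exactly 1;
# the leading TRUE opens the first group
group_id <- cumsum(c(TRUE, diff(x2) != 1))

# Split the values by group id to inspect the runs
runs <- split(x2, group_id)
print(runs)
```

This assumes X2 is already sorted in increasing order, as it is in the example.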
I would like to capture the timestamps in X3 so they can describe the time range (i.e. the first and last timestamp of each group) and produce this output. start_ts is the first timestamp and stop_ts is the last in each group:
student_id session_id start_ts stop_ts week micro_process
1 4 16 21/08/2014 22:56 21/08/2014 22:58 4 TASK
2 4 16 21/08/2014 23:07 21/08/2014 23:08 4 TASK
3 4 16 21/08/2014 23:29 21/08/2014 23:34 4 TASK
4 4 16 21/08/2014 23:40 21/08/2014 23:40 4 TASK
I haven't attempted the loop yet, and I would like to see how to do this without traditional looping. My code currently captures only the range of the whole data.frame (the first and last timestamps overall) rather than one range per group:
student_id session_id start_ts stop_ts week micro_process
1 4 16 21/08/2014 22:56 21/08/2014 23:40 4 TASK
The other variables (student ID etc.) hold dummy values in my example and are not strictly relevant, but I would like to leave them in for completeness.
Code (which can be run directly):
library(stringr)
options(stringsAsFactors = FALSE)
eventised_session <- data.frame(student_id=integer(),
session_id=integer(),
start_ts=character(),
stop_ts=character(),
week=integer(),
micro_process=character())
string_match.df <- structure(list(X1 = c("1_21/08/2014 22:56CONTENT_ACCESS.preparing",
"2_21/08/2014 22:57CONTENT_ACCESS.preparing", "3_21/08/2014 22:58CONTENT_ACCESS.preparing",
"5_21/08/2014 23:07CONTENT_ACCESS.preparing", "6_21/08/2014 23:08CONTENT_ACCESS.preparing",
"9_21/08/2014 23:29CONTENT_ACCESS.preparing", "10_21/08/2014 23:30CONTENT_ACCESS.preparing",
"11_21/08/2014 23:31CONTENT_ACCESS.preparing", "12_21/08/2014 23:33CONTENT_ACCESS.preparing",
"13_21/08/2014 23:34CONTENT_ACCESS.preparing", "16_21/08/2014 23:40CONTENT_ACCESS.preparing"
), X2 = c("1", "2", "3", "5", "6", "9", "10", "11", "12", "13",
"16"), X3 = c("21/08/2014 22:56", "21/08/2014 22:57", "21/08/2014 22:58",
"21/08/2014 23:07", "21/08/2014 23:08", "21/08/2014 23:29", "21/08/2014 23:30",
"21/08/2014 23:31", "21/08/2014 23:33", "21/08/2014 23:34", "21/08/2014 23:40"
), X4 = c("CONTENT_ACCESS.preparing", "CONTENT_ACCESS.preparing",
"CONTENT_ACCESS.preparing", "CONTENT_ACCESS.preparing", "CONTENT_ACCESS.preparing",
"CONTENT_ACCESS.preparing", "CONTENT_ACCESS.preparing", "CONTENT_ACCESS.preparing",
"CONTENT_ACCESS.preparing", "CONTENT_ACCESS.preparing", "CONTENT_ACCESS.preparing"
)), .Names = c("X1", "X2", "X3", "X4"), row.names = c(NA, -11L
), class = "data.frame")
r_student_id <- 4
r_session_id <- 16
r_week <- 4
r_mic_proc <- "TASK"
string_match.df
#Get the first and last timestamp in matched sequence
r_start_ts <- string_match.df[1, ncol(string_match.df)-1]
r_stop_ts <- string_match.df[nrow(string_match.df), ncol(string_match.df)-1]
# Use list() rather than c() so the numeric values are not coerced to character
eventised_session[nrow(eventised_session)+1,] <- list(r_student_id, r_session_id, r_start_ts, r_stop_ts, r_week, r_mic_proc)
eventised_session
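For reference, this is roughly the shape of loop-free result I am after. It is only a sketch under my own assumptions: `events` is a stand-in for the X2 and X3 columns of string_match.df, and `group_id`, `ts_by_group`, and `eventised` are hypothetical names. It uses base R only (`cumsum`, `diff`, `split`, `vapply`) and may not be the most idiomatic approach.

```r
# Stand-in for the X2 and X3 columns of string_match.df
events <- data.frame(
  X2 = c("1", "2", "3", "5", "6", "9", "10", "11", "12", "13", "16"),
  X3 = c("21/08/2014 22:56", "21/08/2014 22:57", "21/08/2014 22:58",
         "21/08/2014 23:07", "21/08/2014 23:08", "21/08/2014 23:29",
         "21/08/2014 23:30", "21/08/2014 23:31", "21/08/2014 23:33",
         "21/08/2014 23:34", "21/08/2014 23:40"),
  stringsAsFactors = FALSE
)

# Group id increases by one wherever X2 does not follow on from the previous row
group_id <- cumsum(c(TRUE, diff(as.integer(events$X2)) != 1))

# Timestamps split into one character vector per group
ts_by_group <- split(events$X3, group_id)

# One row per group: first and last timestamp, constants recycled across rows
eventised <- data.frame(
  student_id    = 4L,
  session_id    = 16L,
  start_ts      = unname(vapply(ts_by_group, function(ts) ts[1], character(1))),
  stop_ts       = unname(vapply(ts_by_group, function(ts) ts[length(ts)], character(1))),
  week          = 4L,
  micro_process = "TASK",
  stringsAsFactors = FALSE
)
print(eventised)
```

The sketch relies on the rows already being in timestamp order within each group, so "first" and "last" are simply the first and last elements.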
I would appreciate your expertise on this one. I have only ever used traditional loops.