What I need:
I have a huge data frame with the following columns (and some more, but these are not important). Here's an example:
  user_id video_id group_id   x   y
1       1        0        0  39 108
2       1        0        0  39 108
3       1       10        0 135 180
4       2        0        0  20 123
User, video and group IDs are factors, of course. For example, there are 20 videos, but each of them has several "observations" for each user and group.
I'd like to transform this data frame into the following format, with one row per video_id and one pair of x.N, y.N columns for each user N:
video_id x.1 y.1 x.2 y.2 …
       0  39 108  20 123
So, for video 0, the x and y values from user 1 are in columns x.1 and y.1, respectively. For user 2, their values are in columns x.2, y.2, and so on.
What I've tried:
I made myself a list of data frames, one per user, each containing that user's unique video_id, x and y observations:
library(plyr)  # provides dlply()
summaryList <- dlply(allData, .(user_id), function(x) unique(x[c("video_id", "x", "y")]))
This is what it looks like:
List of 15
$ 1 :'data.frame': 20 obs. of 3 variables:
..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 1 11 8 5 12 9 20 13 7 10 ...
..$ x : int [1:20] 39 135 86 122 28 167 203 433 549 490 ...
..$ y : int [1:20] 108 180 164 103 187 128 185 355 360 368 ...
$ 2 :'data.frame': 20 obs. of 3 variables:
..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 2 14 15 4 20 6 19 3 13 18 ...
..$ x : int [1:20] 128 688 435 218 528 362 299 134 83 417 ...
..$ y : int [1:20] 165 117 135 179 96 328 332 563 623 476 ...
Where I'm stuck:
What's left to do is:
1. Merge the data frames in summaryList with each other, based on video_id. I can't find a nice way to access the actual data frames in the list, which are summaryList[1]$`1`, summaryList[2]$`2`, et cetera. @James found a partial solution:
    Reduce(function(x, y) merge(x, y, by = "video_id"), summaryList)
2. Name the x and y columns after the user ID instead of keeping them as-is. Right now my summaryList doesn't contain any info about the user ID, and the output of Reduce has duplicate column names like x.x y.x x.y y.y x.x y.x and so on.
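A minimal sketch of both steps, assuming the columns in each list element are ordered video_id, x, y. It builds allData from the four sample rows above and uses base R's split() as a stand-in for the dlply() call so it runs without plyr; firstUser, renamed and wide are illustrative names. The idea is to rename x/y after the list name (the user ID) inside each data frame before merging, so merge() has no clashing columns to add suffixes to.

```r
# Sample rows from the question (row 2 is an exact duplicate of row 1)
allData <- data.frame(
  user_id  = factor(c(1, 1, 1, 2)),
  video_id = factor(c(0, 0, 10, 0)),
  group_id = factor(c(0, 0, 0, 0)),
  x = c(39L, 39L, 135L, 20L),
  y = c(108L, 108L, 180L, 123L)
)

# Base-R stand-in for the dlply() call: one data frame per user,
# with the list named by user ID ("1", "2")
summaryList <- lapply(split(allData[c("video_id", "x", "y")], allData$user_id), unique)

# Double brackets return the data frame itself; single brackets return a sub-list
firstUser <- summaryList[["1"]]

# Rename x/y to x.<id>/y.<id>, taking the user ID from the list names.
# Assumes each data frame has its columns in the order video_id, x, y
renamed <- Map(function(df, id) {
  names(df) <- c("video_id", paste(c("x", "y"), id, sep = "."))
  df
}, summaryList, names(summaryList))

# No clashing column names remain, so merge() adds no .x/.y suffixes
wide <- Reduce(function(a, b) merge(a, b, by = "video_id"), renamed)
print(wide)
```

merge() does an inner join by default, so only videos present for every user survive; passing all = TRUE to merge() keeps the rest, with NA for the missing users.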
How do I go about doing this? Or is there any easier way to get to the result than what I'm currently doing?
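Base R's reshape() may be the easier route: it goes from the long data straight to the wide format without building a list first. A sketch on the sample rows, assuming each (user_id, video_id) pair has a single x/y value once exact duplicate rows are dropped; long and wide are illustrative names.

```r
# Sample rows from the question (row 2 is an exact duplicate of row 1)
allData <- data.frame(
  user_id  = factor(c(1, 1, 1, 2)),
  video_id = factor(c(0, 0, 10, 0)),
  group_id = factor(c(0, 0, 0, 0)),
  x = c(39L, 39L, 135L, 20L),
  y = c(108L, 108L, 180L, 123L)
)

# One row per (user, video) pair; group_id is not needed in the output
long <- unique(allData[c("user_id", "video_id", "x", "y")])

# Long to wide: one row per video_id, one x.<user>/y.<user> pair per user_id
wide <- reshape(long, idvar = "video_id", timevar = "user_id",
                direction = "wide", sep = ".")
print(wide)
```

reshape() builds the new names by pasting each value column (x, y) to each user_id with sep, so the columns come out as x.1, y.1, x.2, y.2. Video/user combinations missing from the data become NA, and if a pair still had several rows, reshape() would warn and keep the first one.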
Reduce(function(x, y) merge(x, y, by = "video_id"), summaryList) might do the trick.
The resulting column names are video_id x.x y.x x.y y.y x.x y.x, etc., so they are not named after the user IDs.
Rename them afterwards with names(reducedData)[-1] <- do.call(function(...) paste(..., sep = "."), expand.grid(letters[24:25], names(summaryList))).
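To see why that renaming line matches the merged columns, here is the name-building part on its own, with a hypothetical userIds vector standing in for names(summaryList). expand.grid() varies its first argument fastest, so pasting its rows gives x.1, y.1, x.2, y.2, ..., in the same user order in which Reduce() merged the data frames.

```r
# Hypothetical stand-in for names(summaryList)
userIds <- c("1", "2", "3")

# letters[24:25] is c("x", "y"); expand.grid() varies its first column fastest,
# so the rows are (x, 1), (y, 1), (x, 2), (y, 2), (x, 3), (y, 3)
combos <- expand.grid(letters[24:25], userIds)

# do.call() hands both columns to paste(), which joins each row with "."
newNames <- do.call(function(...) paste(..., sep = "."), combos)
print(newNames)
# [1] "x.1" "y.1" "x.2" "y.2" "x.3" "y.3"
```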