I'm having difficulty using data.table operations to manipulate my data correctly. The goal is to expand each group into one row per year, based on the values of two date columns (first_detected and last_detected). I've altered the data here in order to protect it, but it gets the idea across.
head(my_data_table, 6)
    team_name      play_name first_detected last_detected   PlayID
1:  Baltimore    Play Action           2016          2017 41955-58
2: Washington Four Verticals           2018          2020 54525-52
3:     Dallas        O1 Trap           2019          2019 44795-17
4:     Dallas    Play Action           2020          2020 41955-58
5:     Dallas     Power Zone           2020          2020 54782-29
6:     Dallas  Bubble Screen           2018          2018 52923-70
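For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the six rows above as a data.table. The column types are an assumption on my part (years as integers, everything else as character); the real table may differ.

```r
library(data.table)

# Reconstruction of the six sample rows shown above.
# Assumed types: integer years, character for all other columns.
my_data_table <- data.table(
  team_name      = c("Baltimore", "Washington", "Dallas", "Dallas", "Dallas", "Dallas"),
  play_name      = c("Play Action", "Four Verticals", "O1 Trap",
                     "Play Action", "Power Zone", "Bubble Screen"),
  first_detected = c(2016L, 2018L, 2019L, 2020L, 2020L, 2018L),
  last_detected  = c(2017L, 2020L, 2019L, 2020L, 2020L, 2018L),
  PlayID         = c("41955-58", "54525-52", "44795-17",
                     "41955-58", "54782-29", "52923-70")
)
```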
The goal is to turn it into this:
    team_name      play_name year   PlayID
1:  Baltimore    Play Action 2016 41955-58
2:  Baltimore    Play Action 2017 41955-58
3: Washington Four Verticals 2018 54525-52
4: Washington Four Verticals 2019 54525-52
5: Washington Four Verticals 2020 54525-52
6:     Dallas        O1 Trap 2019 44795-17
...
n:     Dallas  Bubble Screen 2018 52923-70
The code I am trying to use for this purpose is the following:
my_data_table[,.(PlayID, year = seq(first_detected,last_detected,by=1)), by = .(team_name, play_name)]
When I run this code, I get:
Error in seq.default(first_detected_ever, last_detected_ever, by = 1) :
'from' must be of length 1
Two other attempts also failed:
my_data_table[,.(PlayID, year = seq(min(first_detected),max(last_detected),by=1)), by = .(team_name, play_name)]
my_data_table[,.(PlayID, year = list(seq(min(first_detected),max(last_detected),by=1))), by = .(team_name, play_name)]
Both result in output that looks like this, followed by a warning:
by year PlayID
1: Baltimore Washington Dallas Play Action 2011, 2012, 2013, 2014, 2015, 2016 ... 41955-58
...
In as.data.table.list(jval, .named = NULL) :
Item 3 has 2 rows but longest item has 38530489; recycled with remainder.
I haven't found any clear answers on why this is happening. It seems that first_detected and last_detected are somehow being interpreted as the entire range of each column's values, even though I am passing by = .(team_name, play_name). I have verified that this grouping always yields exactly one distinct row, so each group should contain only one value of first_detected and one of last_detected. I've done something similar before, but in that case I wasn't using a by = .(x, y, z, ...) grouping and instead applied the operation to each row. Could anyone help me understand why I am unable to get the desired output with this data.table method?
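For reference, here is a minimal sketch of what a one-row-per-group check and the per-row variant mentioned above could look like. It runs on a small toy table rather than my real data, and the names toy, dup_groups, row_id and expanded are hypothetical, chosen only for illustration. The first two toy rows deliberately share a (team_name, play_name) key so the check has something to find.

```r
library(data.table)

# Toy table (hypothetical): the first two rows share the same
# (team_name, play_name) key, so that group contains two rows.
toy <- data.table(
  team_name      = c("Dallas", "Dallas", "Baltimore"),
  play_name      = c("Play Action", "Play Action", "Play Action"),
  first_detected = c(2019L, 2020L, 2016L),
  last_detected  = c(2019L, 2020L, 2017L),
  PlayID         = c("41955-58", "41955-58", "41955-58")
)

# Groups with more than one row; an empty result would mean every
# (team_name, play_name) combination is unique.
dup_groups <- toy[, .N, by = .(team_name, play_name)][N > 1]

# Per-row expansion: grouping by a row index means seq() only ever
# receives a single first_detected and a single last_detected value.
toy[, row_id := .I]
expanded <- toy[, .(team_name, play_name,
                    year = seq(first_detected, last_detected, by = 1L),
                    PlayID),
                by = row_id][, row_id := NULL]
```

Grouping by a row index sidesteps the question of whether the by columns are truly unique, at the cost of an extra temporary column. Length-one values in j (team_name, play_name, PlayID) are recycled to the length of year within each group.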
dput(head(my_data_table))