Adding missing panel dates by group as rows using data.table

Question

I'm having difficulty using data.table operations to manipulate my data correctly. The goal is to create, for each group, a number of rows determined by the values of two date columns. I've altered the data here to protect it, but it gets the idea across:

head(my_data_table, 6)

     team_name         play_name       first_detected  last_detected PlayID 
1:   Baltimore         Play Action     2016            2017          41955-58
2:   Washington        Four Verticals  2018            2020          54525-52
3:   Dallas            O1 Trap         2019            2019          44795-17
4:   Dallas            Play Action     2020            2020          41955-58
5:   Dallas            Power Zone      2020            2020          54782-29
6:   Dallas            Bubble Screen   2018            2018          52923-70

The goal is to turn it into this

     team_name            play_name      year      PlayID
1:   Baltimore         Play Action       2016       41955-58 
2:   Baltimore         Play Action       2017       41955-58 
3:   Washington      Four Verticals      2018       54525-52
4:   Washington      Four Verticals      2019       54525-52
5:   Washington      Four Verticals      2020       54525-52 
6:   Dallas               O1 Trap        2019       44795-17 
...  
n:   Dallas           Bubble Screen      2018       52923-70

The code I tried for this is:

my_data_table[,.(PlayID, year = seq(first_detected,last_detected,by=1)), by = .(team_name, play_name)]

When I run this code, I get:

Error in seq.default(first_detected_ever, last_detected_ever, by = 1) : 
  'from' must be of length 1

Two other attempts also failed

my_data_table[,.(PlayID, year = seq(min(first_detected),max(last_detected),by=1)), by = .(team_name, play_name)]
my_data_table[,.(PlayID, year = list(seq(min(first_detected),max(last_detected),by=1))), by = .(team_name, play_name)]

which both result in something that looks like

    by                                                      year                                    PlayID
1:   Baltimore Washington Dallas Play Action       2011, 2012, 2013, 2014, 2015, 2016 ...       41955-58 
...
In as.data.table.list(jval, .named = NULL) :
  Item 3 has 2 rows but longest item has 38530489; recycled with remainder.

I haven't found a clear explanation of why this happens. It seems that first_detected and last_detected are being interpreted as the entire range of each column's values, even though I pass by = .(team_name, play_name). I have verified that each combination of those two columns corresponds to exactly one row, so within each group there should be only one value of first_detected and one of last_detected. I've done something similar before, but in that case I wasn't grouping with by = .(x, y, z, ...); I applied the operation to each row instead. Could anyone help me understand why this data.table approach doesn't produce the desired output?

Please provide a reproducible example of your data. Hint: use dput(head(my_data_table))
– s_baldur, Commented Jan 5, 2022 at 16:15
@sindri_baldur Question has been edited, you should now be able to copy+paste into R
– econometrica_33, Commented Jan 5, 2022 at 16:43

econometrica_33 · Accepted Answer · 2022-01-05 16:31:34Z

Despite struggling with this for hours, I managed to answer my own question only a short while later.

The code

my_data_table[,.(PlayID, year = first_detected:last_detected), by = .(team_name, play_name)]

produces the desired result: for each group, it creates one row per year from first_detected to last_detected inclusive, provided both columns are integers.
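For anyone who wants to try this, here is a minimal sketch, assuming the sample rows from the question's head() output re-typed by hand (the real data is not available). It also includes a quick check for groups with more than one row: inside a grouped j, first_detected is a vector of all values in the group, and seq() stops with "'from' must be of length 1" whenever that vector is longer than one element. The names dup_groups and expanded are made up for illustration.

```r
library(data.table)

# Sample data re-typed from the question's head() output (illustrative only)
my_data_table <- data.table(
  team_name      = c("Baltimore", "Washington", "Dallas", "Dallas", "Dallas", "Dallas"),
  play_name      = c("Play Action", "Four Verticals", "O1 Trap",
                     "Play Action", "Power Zone", "Bubble Screen"),
  first_detected = c(2016L, 2018L, 2019L, 2020L, 2020L, 2018L),
  last_detected  = c(2017L, 2020L, 2019L, 2020L, 2020L, 2018L),
  PlayID         = c("41955-58", "54525-52", "44795-17",
                     "41955-58", "54782-29", "52923-70")
)

# Groups with more than one row; an empty result means every
# (team_name, play_name) pair really is a single row
dup_groups <- my_data_table[, .N, by = .(team_name, play_name)][N > 1]

# One output row per year in each group's range; the length-1 PlayID
# is recycled to match the length of year
expanded <- my_data_table[, .(PlayID, year = first_detected:last_detected),
                          by = .(team_name, play_name)]
print(expanded)
```

If dup_groups is not empty on the full data, that would be worth investigating before relying on the `:` version, since as far as I know `:` only uses the first element of a longer vector (with a warning) rather than stopping.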


Comment from Frank (over a year ago):

To avoid having to retype all the non-grouping columns (only PlayID here), you could join, something like

mDT = my_data_table[, .(year = first_detected:last_detected), by = .(team_name, play_name)]
my_data_table[mDT, on = .(team_name, play_name)]
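The join above relies on (team_name, play_name) identifying exactly one row. A possible variant, sketched here under the assumption that this might not always hold, groups on a row id instead so that each group is a single row by construction. The row_id column, the names dt, years and expanded, and the duplicated Dallas row are all hypothetical and not part of the original data.

```r
library(data.table)

# Small illustrative table; the second Dallas row is a made-up duplicate
# of the (team_name, play_name) key to show the row-wise behaviour
dt <- data.table(
  team_name      = c("Baltimore", "Dallas", "Dallas"),
  play_name      = c("Play Action", "O1 Trap", "O1 Trap"),
  first_detected = c(2016L, 2019L, 2021L),
  last_detected  = c(2017L, 2019L, 2022L),
  PlayID         = c("41955-58", "44795-17", "44795-17")
)

# Hypothetical helper column: one id per row (added to dt by reference)
dt[, row_id := .I]

# Each group is exactly one row, so seq() always gets length-1 arguments
years <- dt[, .(year = seq(first_detected, last_detected, by = 1L)), by = row_id]

# Join the remaining columns back by row id, then drop the helper
# and range columns
expanded <- dt[years, on = "row_id"][
  , c("row_id", "first_detected", "last_detected") := NULL]
print(expanded)
```

The trade-off is an extra helper column and join, in exchange for not depending on the grouping columns being unique.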
