I'm having difficulty using data.table operations to manipulate my data correctly. The goal is to create, for each group, a number of rows determined by the values of two date columns. I have altered the data here to protect it, but it gets the idea across.

head(my_data_table, 6)

     team_name         play_name       first_detected  last_detected PlayID 
1:   Baltimore         Play Action     2016            2017          41955-58
2:   Washington        Four Verticals  2018            2020          54525-52
3:   Dallas            O1 Trap         2019            2019          44795-17
4:   Dallas            Play Action     2020            2020          41955-58
5:   Dallas            Power Zone      2020            2020          54782-29
6:   Dallas            Bubble Screen   2018            2018          52923-70

The goal is to turn it into this

     team_name            play_name      year      PlayID
1:   Baltimore         Play Action       2016       41955-58 
2:   Baltimore         Play Action       2017       41955-58 
3:   Washington      Four Verticals      2018       54525-52
4:   Washington      Four Verticals      2019       54525-52
5:   Washington      Four Verticals      2020       54525-52 
6:   Dallas               O1 Trap        2019       44795-17 
...  
n:   Dallas           Bubble Screen      2018       52923-70   

The code I tried for this is the following:

my_data_table[,.(PlayID, year = seq(first_detected,last_detected,by=1)), by = .(team_name, play_name)]

When I run this code, I get:

Error in seq.default(first_detected_ever, last_detected_ever, by = 1) : 
  'from' must be of length 1

Two other attempts also failed:

my_data_table[,.(PlayID, year = seq(min(first_detected),max(last_detected),by=1)), by = .(team_name, play_name)]
my_data_table[,.(PlayID, year = list(seq(min(first_detected),max(last_detected),by=1))), by = .(team_name, play_name)]

which both result in something that looks like

    by                                                      year                                    PlayID
1:   Baltimore Washington Dallas Play Action       2011, 2012, 2013, 2014, 2015, 2016 ...       41955-58 
...
In as.data.table.list(jval, .named = NULL) :
  Item 3 has 2 rows but longest item has 38530489; recycled with remainder.

I haven't found any clear explanation of why this is happening. It seems that first_detected and last_detected are being interpreted as the entire range of each column's values, even though I pass by = .(team_name, play_name), which I have verified always yields exactly one row per group. Within each "by" group there should therefore be only one value of first_detected and one of last_detected. I've done something similar before, but in that case I wasn't using a "by = .(x,y,z,...)" grouping and instead applied the operation to each row. Could anyone help me understand why I can't get the desired output with this data.table method?
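One way to investigate: seq.default() raises "'from' must be of length 1" when its first argument has more than one element, and inside a grouped j expression a column holds every value for that group. That would suggest at least one (team_name, play_name) group has more than one row. The sketch below uses hypothetical toy data (the object names dt, dup_groups and per_row are made up for illustration) to count rows per group and to show a row-wise expansion that does not rely on the groups being unique.

```r
library(data.table)

# Hypothetical toy data: the Dallas / Play Action group deliberately appears twice
dt <- data.table(
  team_name      = c("Baltimore", "Dallas", "Dallas"),
  play_name      = c("Play Action", "Play Action", "Play Action"),
  first_detected = c(2016L, 2019L, 2020L),
  last_detected  = c(2017L, 2019L, 2020L)
)

# Count rows per group and keep only the groups with more than one row
dup_groups <- dt[, .N, by = .(team_name, play_name)][N > 1]
print(dup_groups)

# Row-wise expansion: grouping by a row index guarantees that seq()
# always receives length-1 'from' and 'to' arguments
per_row <- dt[
  , .(team_name, play_name, year = seq(first_detected, last_detected, by = 1L)),
  by = .(row_id = seq_len(nrow(dt)))
]
print(per_row)
```

If dup_groups comes back empty on the real data, the cause lies elsewhere and this check only rules one possibility out.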

  • Please provide a reproducible example of your data. Hint: use dput(head(my_data_table)). Commented Jan 5, 2022 at 16:15
  • @sindri_baldur Question has been edited, you should now be able to copy+paste into R Commented Jan 5, 2022 at 16:43

1 Answer

Despite struggling with this for hours, I managed to solve my own question only a short while later.

The code

my_data_table[,.(PlayID, year = first_detected:last_detected), by = .(team_name, play_name)]

produces the desired result: for each group, it creates one row per year from first_detected through last_detected inclusive, as long as first_detected and last_detected are integers.
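For anyone who wants to try this, here is a minimal reproducible sketch. The six rows are copied from the head() output in the question; storing the year columns as integers and PlayID as character is an assumption, and the name expanded is made up for illustration.

```r
library(data.table)

# Sample rows from the question; the column types are assumed
my_data_table <- data.table(
  team_name      = c("Baltimore", "Washington", "Dallas", "Dallas", "Dallas", "Dallas"),
  play_name      = c("Play Action", "Four Verticals", "O1 Trap",
                     "Play Action", "Power Zone", "Bubble Screen"),
  first_detected = c(2016L, 2018L, 2019L, 2020L, 2020L, 2018L),
  last_detected  = c(2017L, 2020L, 2019L, 2020L, 2020L, 2018L),
  PlayID         = c("41955-58", "54525-52", "44795-17",
                     "41955-58", "54782-29", "52923-70")
)

# One row per year in each group's range; the length-1 PlayID is
# recycled to match the length of the year vector
expanded <- my_data_table[
  , .(PlayID, year = first_detected:last_detected),
  by = .(team_name, play_name)
]
print(expanded)
```

The columns come out as team_name, play_name, PlayID, year; setcolorder(expanded, c("team_name", "play_name", "year", "PlayID")) would match the layout shown in the question.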

1 Comment

To avoid having to retype all the non-grouping columns (only PlayID here), you could join, something like mDT = my_data_table[,.(year = first_detected:last_detected), by = .(team_name, play_name)]; my_data_table[mDT, on=.(team_name, play_name)]
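The join suggested in this comment could look roughly like the following sketch. It rebuilds the sample rows from the question (column types assumed) and uses joined as a made-up name for the result. The join brings along every column of my_data_table, including first_detected and last_detected, so those are dropped afterwards.

```r
library(data.table)

# Sample rows from the question; the column types are assumed
my_data_table <- data.table(
  team_name      = c("Baltimore", "Washington", "Dallas", "Dallas", "Dallas", "Dallas"),
  play_name      = c("Play Action", "Four Verticals", "O1 Trap",
                     "Play Action", "Power Zone", "Bubble Screen"),
  first_detected = c(2016L, 2018L, 2019L, 2020L, 2020L, 2018L),
  last_detected  = c(2017L, 2020L, 2019L, 2020L, 2020L, 2018L),
  PlayID         = c("41955-58", "54525-52", "44795-17",
                     "41955-58", "54782-29", "52923-70")
)

# Step 1: expand only the year range per group
mDT <- my_data_table[
  , .(year = first_detected:last_detected),
  by = .(team_name, play_name)
]

# Step 2: join back to pick up all remaining columns without naming them
joined <- my_data_table[mDT, on = .(team_name, play_name)]

# Optional: drop the range columns that are no longer needed
joined[, c("first_detected", "last_detected") := NULL]
print(joined)
```

This only gives one output row per year when each (team_name, play_name) pair is unique in my_data_table; with duplicated pairs the join would multiply rows.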
