
I have a data frame of county executives and the year they were inaugurated.

I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004.

I would like to expand the df so that it lists who the county executive was in each year from 2000 to 2004.

For instance, I would like this df

df <- data.frame(year= c(2000, 2001, 2003, 2000, 2002, 2004),
                  executive.name= c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson"),
                 party= c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed"),
                  district= rep(c(1001, 1002), each=3))

to look like this

df.neat <- data.frame(year= c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004),
                  executive.name= c("Johnson", "Smith", "Smith", "Alleghany", "Alleghany", "Roberts", "Roberts", "Clarke", "Clarke", "Tollson"),
                  party= c("PartyRed", "PartyYellow", "PartyYellow", "PartyGreen", "PartyGreen", "PartyYellow", "PartyYellow", "PartyOrange", "PartyOrange", "PartyRed"),
                  district= rep(c(1001, 1002), each=5))

1 Answer

df |>
  tidyr::complete(district, year) |>
  dplyr::group_by(district) |>
  tidyr::fill(executive.name, party) |>
  dplyr::ungroup()

Result

# A tibble: 10 × 4
   district  year executive.name party      
      <dbl> <dbl> <chr>          <chr>      
 1     1001  2000 Johnson        PartyRed   
 2     1001  2001 Smith          PartyYellow
 3     1001  2002 Smith          PartyYellow
 4     1001  2003 Alleghany      PartyGreen 
 5     1001  2004 Alleghany      PartyGreen 
 6     1002  2000 Roberts        PartyYellow
 7     1002  2001 Roberts        PartyYellow
 8     1002  2002 Clarke         PartyOrange
 9     1002  2003 Clarke         PartyOrange
10     1002  2004 Tollson        PartyRed

2 Comments

This is very helpful and worked for the most part. The problem is that some of the counties in my full df were created during the time period. complete() treats the years before a county existed as implicit NAs, and fill() then drags the last row of the previous district down into the new district. Is there any way of running this code individually for each group? I re-asked the question with the new parameters here: stackoverflow.com/questions/78756985/… Thank you!
Aha, I should have anticipated that. I would add group_by(district) before the fill and ungroup() after.
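
For the case raised in the comments, here is a minimal sketch of one way to handle a county created partway through the period. The data frame df2, district 1003, and the names "Nguyen" and "Park" are made up for illustration. It also assumes that years before a county existed should simply be absent from the panel, which may not be what the asker wants. Grouping before fill() keeps values from crossing district boundaries, so the new county's early years stay NA and can be dropped afterwards.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: district 1003 does not exist until 2002.
df2 <- data.frame(
  year = c(2000, 2001, 2003, 2002, 2004),
  executive.name = c("Johnson", "Smith", "Alleghany", "Nguyen", "Park"),
  party = c("PartyRed", "PartyYellow", "PartyGreen", "PartyOrange", "PartyRed"),
  district = c(1001, 1001, 1001, 1003, 1003)
)

panel <- df2 |>
  # Build every district-year pair for the full study window.
  complete(district, year = seq(2000, 2004, by = 1)) |>
  # Fill within each district only, so nothing leaks across districts.
  group_by(district) |>
  fill(executive.name, party) |>
  ungroup() |>
  # Assumption: rows still NA are years before the county was created.
  filter(!is.na(executive.name))

print(panel)
```

If the pre-creation years should stay in the panel as explicit NAs instead, leave out the final filter() step.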
