I have a dataframe that looks like this:

df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"), `2019` = c(1,2,3,4,5,6),`2020` = c(2,3,4,5,6,7), `2021` = c(3,4,5,6,7,8))

  ID Type X2019 X2020 X2021
1  1    A     1     2     3
2  2    A     2     3     4
3  3    B     3     4     5
4  4    B     4     5     6
5  5    C     5     6     7
6  6    C     6     7     8

Now, I'm looking for code that does the following:
1. Creates a new data.frame for every row in df.
2. Names each new data.frame with a combination of "Type" and "ID" (A_1, A_2, ..., C_6).

The resulting new dataframes should look like this (example for A_1, A_2 and C_6):

  Year Values
1 2019      1
2 2020      2
3 2021      3

  Year Values
1 2019      2
2 2020      3
3 2021      4

  Year Values
1 2019      6
2 2020      7
3 2021      8

Two things complicate the code:
1. The code should keep working in the coming years without any changes. Next year, the data.frame df will no longer contain the years 2019-2021, but rather 2020-2022.
2. As the data.frame df is only a minimal reproducible example, I need some kind of loop. In the "real" data, I have many more rows and therefore many more data frames to create.

Unfortunately, I can't give you any code, as I have absolutely no idea how to approach this. While researching, I found the following code that may help address the first problem with the changing years:

year <- as.numeric(format(Sys.Date(), "%Y"))

Further, I read about lists, and that it may help to work with a list in a for loop and then transform the list back into a data frame. Sorry for my limited approach; I hope someone can give me a hint or even the solution to my problem. If you need any further information, please let me know. Thanks in advance!

A kind of similar question to mine: Populating a data frame in R in a loop

2 Answers

Try this:

library(stringr)
library(dplyr)
library(tidyr)
library(magrittr)

df %>%
  gather(Year, Values, 3:5) %>%
  mutate(Year = str_sub(Year, 2)) %>%
  select(ID, Year, Values) %>%
  group_split(ID) # split(.$ID) 

# [[1]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     1 2019       1
# 2     1 2020       2
# 3     1 2021       3
# 
# [[2]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     2 2019       2
# 2     2 2020       3
# 3     2 2021       4
# 
# [[3]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     3 2019       3
# 2     3 2020       4
# 3     3 2021       5
# 
# [[4]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     4 2019       4
# 2     4 2020       5
# 3     4 2021       6
# 
# [[5]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     5 2019       5
# 2     5 2020       6
# 3     5 2021       7
# 
# [[6]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     6 2019       6
# 2     6 2020       7
# 3     6 2021       8


Data

df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"), `2019` = c(1,2,3,4,5,6),`2020` = c(2,3,4,5,6,7), `2021` = c(3,4,5,6,7,8))
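
The question also asks for each piece to be named after "Type" and "ID" (A_1, ..., C_6), which the pipeline above does not do. A minimal base R sketch of that step follows. It assumes that every column other than ID and Type is a year column, so the year range is picked up from the data rather than hard-coded; the names year_cols and result are only illustrative.

```r
# Same example data as in the question; data.frame() turns `2019` into X2019
df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"),
                 `2019` = c(1,2,3,4,5,6), `2020` = c(2,3,4,5,6,7),
                 `2021` = c(3,4,5,6,7,8))

# Assumption: every column except ID and Type holds the values for one year
year_cols <- setdiff(names(df), c("ID", "Type"))

# Build one small data frame per row of df
result <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(Year   = as.integer(sub("^X", "", year_cols)),  # drop leading X
             Values = unlist(df[i, year_cols], use.names = FALSE))
})

# Name each list element as Type_ID, e.g. "A_1" or "C_6"
names(result) <- paste(df$Type, df$ID, sep = "_")

result$A_1
```

Keeping the data frames together in one named list is usually easier to loop over than separate objects. If separate objects in the workspace are really needed, list2env(result, envir = .GlobalEnv) should create them from the list names.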

4 Comments

select is from dplyr. If you add library(dplyr), this works.
select is from dplyr, but dplyr is part of the tidyverse as well, so the suggestion from @deepseefan should work just fine. Unfortunately, I receive the following error: Error in group_split(., ID) : could not find function "group_split". I double-checked that both packages are loaded.
@TheEconomist, if group_split is troubling you, replace it with split(.$ID) and it should work.
After updating R as well as the dplyr package, I can confirm that both of your variants work perfectly fine. Thank you for your time.

library(magrittr)
library(tidyr)
library(dplyr)
library(stringr)

names(df) <- str_replace(names(df), "^X", "") # remove only the leading X from the year column names

df %>%
  gather(Year, Values, 3:5) %>%
  select(ID, Year, Values) %>%
  group_split(ID)
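
As a possible variation on the code above, here is a sketch that avoids the hard-coded column positions 3:5 and also names the pieces A_1, ..., C_6 as the question asks. It assumes a tidyr version that provides pivot_longer() (1.0.0 or later) and that all columns other than ID and Type are year columns; the names long, Name and result are only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"),
                 `2019` = c(1,2,3,4,5,6), `2020` = c(2,3,4,5,6,7),
                 `2021` = c(3,4,5,6,7,8))

long <- df %>%
  # Reshape every column except ID and Type; strip the leading X from names
  pivot_longer(cols = -c(ID, Type), names_to = "Year", values_to = "Values",
               names_prefix = "X") %>%
  mutate(Year = as.integer(Year),
         Name = paste(Type, ID, sep = "_"))  # e.g. "A_1"

# split() returns a list whose element names are the values of Name
result <- split(long[, c("Year", "Values")], long$Name)

result$C_6
```

Note that split() orders the list elements alphabetically by name, so with more than nine IDs the order of the list may differ from the row order of df, although each name still points to the right rows.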

2 Comments

Thank you very much for your answer. You were really observant about the X in the year names. Unfortunately, I receive the following error: Error in group_split(., ID) : could not find function "group_split". I already double-checked that both packages are really loaded.
If it's not in your version of dplyr, you might not have the latest. You might need to update your packages and even R, if you don't have 3.6.1.
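
One way to check this before updating anything is sketched below. It only uses base R functions to report the installed versions and to test whether the installed dplyr exports group_split at all; the variable names are illustrative.

```r
# Report the running R version
r_version <- getRversion()
print(r_version)

# Only inspect dplyr if it is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_version <- packageVersion("dplyr")
  print(dplyr_version)

  # TRUE if this dplyr installation exports a function called group_split
  has_group_split <- "group_split" %in% getNamespaceExports("dplyr")
  print(has_group_split)
}
```

If has_group_split is FALSE, updating dplyr or falling back to split(.$ID) are the two options already mentioned in this thread.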
