3

Context

I want to change df(wide format) to df_expected (long format) but failed.

I know I need to use pivot_longer, but there are multiple columns I need to transform.

> df
# A tibble: 2 × 5
     id dis1_event dis1_event_time dis2_event di2_event_time
  <dbl>      <dbl> <chr>                <dbl> <chr>         
1     1          1 2022-01-01               0 2022-12-31    
2     2          0 2018-01-01               1 2018-03-01    
> df_expected
# A tibble: 4 × 4
     id disease event time      
  <dbl> <chr>   <dbl> <chr>     
1     1 dis1        1 2022-01-01
2     1 dis2        0 2022-12-31
3     2 dis1        0 2018-01-01
4     2 dis2        1 2018-03-01

Question

How can I use pivot_longer with multiple columns in R?

Reproducible code

library(tidyverse)
df = tribble(
  ~id, ~dis1_event, ~dis1_event_time, ~dis2_event, ~di2_event_time,
  1,  1, '2022-01-01', 0, '2022-12-31',
  2, 0, '2018-01-01', 1, '2018-03-01',
)


df_expected = 
  tribble(
    ~id, ~disease, ~event, ~time,
    1, 'dis1', 1, '2022-01-01',
    1, 'dis2', 0, '2022-12-31',
    2, 'dis1', 0, '2018-01-01',
    2, 'dis2', 1, '2018-03-01',
  )

# my failed solution 
df %>% 
  pivot_longer(cols = ends_with('event'), names_to = 'disease', values_to = 'event') %>%
  pivot_longer(cols = ends_with('time'), names_to = 'disease', values_to = 'time')
1
  • pivot_longer(df, -id, names_pattern = "(dis.)_(.+)", names_to=c( "disease", ".value")) Commented Nov 15, 2022 at 3:28

1 Answer 1

3

Here is one potential solution:

library(tidyverse)

df = tribble(
  ~id, ~dis1_event, ~dis1_event_time, ~dis2_event, ~dis2_event_time,
  1,  1, '2022-01-01', 0, '2022-12-31',
  2, 0, '2018-01-01', 1, '2018-03-01',
)

df_expected = 
  tribble(
    ~id, ~disease, ~event, ~time,
    1, 'dis1', 1, '2022-01-01',
    1, 'dis2', 0, '2022-12-31',
    2, 'dis1', 0, '2018-01-01',
    2, 'dis2', 1, '2018-03-01',
  )

df
#> # A tibble: 2 × 5
#>      id dis1_event dis1_event_time dis2_event dis2_event_time
#>   <dbl>      <dbl> <chr>                <dbl> <chr>          
#> 1     1          1 2022-01-01               0 2022-12-31     
#> 2     2          0 2018-01-01               1 2018-03-01

df %>%
  pivot_longer(-id,
               names_pattern = "([a-z]+\\d)_([_a-z]+)",
               names_to = c("disease", ".value"))
#> # A tibble: 4 × 4
#>      id disease event event_time
#>   <dbl> <chr>   <dbl> <chr>     
#> 1     1 dis1        1 2022-01-01
#> 2     1 dis2        0 2022-12-31
#> 3     2 dis1        0 2018-01-01
#> 4     2 dis2        1 2018-03-01

df_expected
#> # A tibble: 4 × 4
#>      id disease event time      
#>   <dbl> <chr>   <dbl> <chr>     
#> 1     1 dis1        1 2022-01-01
#> 2     1 dis2        0 2022-12-31
#> 3     2 dis1        0 2018-01-01
#> 4     2 dis2        1 2018-03-01

Created on 2022-11-15 by the reprex package (v2.0.1)

The "time" column is named "event_time", but you could fix that with ... %>% rename("time" = "event_time")

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.