
I have looked at several posts, but none addresses this exact situation. The linked examples have more consistent variable names (mine mix single and multiple underscores) and do not add a visit-count column.

My dataset currently looks like this:

idno sex age_1 date_visit_1 height_in_cm_1 age_2 date_visit_2 height_in_cm_2 age_3 date_visit_3 height_in_cm_3
20   M   10    10/1/2010    100            11    10/1/2011    110            12    10/2/2012    115
21   F   11    10/2/2010    90             12    11/3/2011    100            13    12/5/2012    105
22   M   12    11/3/2010    100            13    12/4/2011    105            14    12/5/2012    110

I want:

idno sex age date_visit height_in_cm visit_no
20   M    10 10/1/2010 100           1
20   M    11 10/1/2011 110           2
20   M    12 10/2/2012 115           3
21   F    11 10/2/2010  90           1
21   F    12 11/3/2011 100           2
21   F    13 12/5/2012 105           3  
22   M    12 11/3/2010 100           1
22   M    13 12/4/2011 105           2
22   M    14 12/5/2012 110           3

I have not been able to make it work: I either get two datasets stacked on top of each other, or the column names are wrong. The names_pattern and names_sep arguments have not helped, because the variable names are similar in format but not identical.


2 Answers


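One option is to temporarily convert your numeric variables to character, so that they can share a single value column with the visit dates while pivoting, and then convert them back once the data has been widened again.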

library(tidyr)
library(dplyr)

df |>
  mutate(across(c(starts_with('age'), starts_with('height')), as.character)) |>
  pivot_longer(c(-idno, -sex),
               names_to = c('stat', 'visit_no'),
               names_pattern = '^([a-zA-Z0-9]+_?[a-zA-Z0-9]+_?[a-zA-Z0-9]+)_([0-9]+)$',
               values_to = 'value') |>
  pivot_wider(names_from = stat, values_from = value) |>
  mutate(across(c(visit_no, age, height_in_cm), as.numeric),
         # %Y parses the four-digit year; %y would read only "20" and return 2020 for every row
         date_visit = as.Date(date_visit, format = '%m/%d/%Y'))
#> # A tibble: 9 × 6
#>    idno sex   visit_no   age date_visit height_in_cm
#>   <int> <chr>    <dbl> <dbl> <date>            <dbl>
#> 1    20 M            1    10 2010-10-01          100
#> 2    20 M            2    11 2011-10-01          110
#> 3    20 M            3    12 2012-10-02          115
#> 4    21 F            1    11 2010-10-02           90
#> 5    21 F            2    12 2011-11-03          100
#> 6    21 F            3    13 2012-12-05          105
#> 7    22 M            1    12 2010-11-03          100
#> 8    22 M            2    13 2011-12-04          105
#> 9    22 M            3    14 2012-12-05          110

Created on 2024-04-18 with reprex v2.1.0
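
As a possible alternative to the character round trip, here is a minimal sketch using tidyr's ".value" sentinel in names_to, which keeps each measurement in its own column and so avoids mixing types. It assumes the data is read as in the question (the read.csv() call and the name long are only for illustration), and it relies on the greedy (.+) group to split each name at its last underscore, however many underscores come before it.

```r
library(dplyr)
library(tidyr)

# Illustrative copy of the question's data, so the example is self-contained.
df <- read.csv(text = "idno,sex,age_1,date_visit_1,height_in_cm_1,age_2,date_visit_2,height_in_cm_2,age_3,date_visit_3,height_in_cm_3
20,M,10,10/1/2010,100,11,10/1/2011,110,12,10/2/2012,115
21,F,11,10/2/2010,90,12,11/3/2011,100,13,12/5/2012,105
22,M,12,11/3/2010,100,13,12/4/2011,105,14,12/5/2012,110")

long <- df |>
  pivot_longer(
    cols = -c(idno, sex),
    # ".value" sends the first regex group to the output column names;
    # the second group (the trailing number) becomes visit_no.
    names_to = c(".value", "visit_no"),
    names_pattern = "^(.+)_([0-9]+)$",
    names_transform = list(visit_no = as.integer)
  ) |>
  # Four-digit years need %Y, not %y.
  mutate(date_visit = as.Date(date_visit, format = "%m/%d/%Y"))

long
```

Because each stem (age, date_visit, height_in_cm) gets its own column, the numeric columns keep their original types and no pivot_wider() step is needed.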

library(dplyr)
library(tidyr)

data <- read.csv(text = "
idno,sex,age_1,date_visit_1,height_in_cm_1,age_2,date_visit_2,height_in_cm_2,age_3,date_visit_3,height_in_cm_3
20,M,10,10/1/2010,100,11,10/1/2011,110,12,10/2/2012,115
21,F,11,10/2/2010,90,12,11/3/2011,100,13,12/5/2012,105
22,M,12,11/3/2010,100,13,12/4/2011,105,14,12/5/2012,110
")

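data %>%
  # convert every column to character so the values can share one column
  mutate(across(everything(), as.character)) %>%
  pivot_longer(cols = c(-idno, -sex)) %>%
  mutate(
    # the visit number is whatever follows the last underscore
    visit_no = sub(".*_", "", name),
    # drop the trailing _<number>; [0-9]+ also handles visits 10 and above
    name = sub("_[0-9]+$", "", name)
  ) %>%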
  pivot_wider(
    names_from = name,
    values_from = value
  )

Output:

  idno  sex   visit_no age   date_visit height_in_cm
  <chr> <chr> <chr>    <chr> <chr>      <chr>       
1 20    M     1        10    10/1/2010  100         
2 20    M     2        11    10/1/2011  110         
3 20    M     3        12    10/2/2012  115         
4 21    F     1        11    10/2/2010  90          
5 21    F     2        12    11/3/2011  100         
6 21    F     3        13    12/5/2012  105         
7 22    M     1        12    11/3/2010  100         
8 22    M     2        13    12/4/2011  105         
9 22    M     3        14    12/5/2012  110         

You can use select() to put the visit_no column at the end if you want.
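
A minimal sketch of that last step, assuming the all-character result shown above: dplyr's relocate() moves visit_no to the end without listing every other column, and the columns can be converted back to useful types in the same pipeline. The two-row long_chr tibble is a hypothetical stand-in for the real output, and tidy is just an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for the all-character output of the pipeline above.
long_chr <- tibble(
  idno = c("20", "20"),
  sex = c("M", "M"),
  visit_no = c("1", "2"),
  age = c("10", "11"),
  date_visit = c("10/1/2010", "10/1/2011"),
  height_in_cm = c("100", "110")
)

tidy <- long_chr |>
  mutate(
    # restore integer columns and parse the dates (four-digit year, so %Y)
    across(c(idno, visit_no, age, height_in_cm), as.integer),
    date_visit = as.Date(date_visit, format = "%m/%d/%Y")
  ) |>
  # move visit_no after the last column
  relocate(visit_no, .after = last_col())

tidy
```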
