0

I've read through a number of similar posts and tutorials but am really struggling to understand the solution to my issue. I have a dataset that is wide, and when I make it longer - I want to collapse two sets of data (both duration and results).

For each participant (id), there is a category, and then a series of blood test results. Each test has both duration (in days) and a result (numeric value).

Here's how it looks now:

id category duration_1 results_1 duration_2 results_2 duration_3 results_3
01 diabetic 58 32 65 56 76 87
02 prediabetic 54 32 65 25 76 35
03 unknown 46 65 65 56 21 67

How I'd like it to be is:

id category duration results
01 diabetic 58 32
01 diabetic 65 56
01 diabetic 76 87
02 prediabetic 54 32
02 prediabetic 65 25
02 prediabetic 76 35
03 unknown 46 65
03 unknown 65 25
03 unknown 21 67

I can get pivot longer to work for "results" - but I can't get it to pivot on both "results" and "duration".

Any assistance would be greatly appreciated. I'm still fairly new to R. Thanks!

3

1 Answer 1

3

One way is to separate the column names into two columns while you pivot (hence the names_sep below). Then, you can just drop the column number.

library(tidyverse)

df %>%
  tidyr::pivot_longer(!c(id, category),
                      names_to = c(".value", "num"),
                      names_sep = "_") %>%
  dplyr::select(-num)

Output

# A tibble: 9 × 4
  id    category    duration results
  <chr> <chr>          <dbl>   <dbl>
1 01    diabetic          32      23
2 01    diabetic          87      67
3 01    diabetic          98      78
4 02    prediabetic       43      45
5 02    prediabetic       34      65
6 02    prediabetic       12      12
7 03    unknown           32      54
8 03    unknown           75      45
9 03    unknown           43      34

Data

df <-
  structure(
    list(
      id = c("01", "02", "03"),
      category = c("diabetic", "prediabetic", "unknown"),
      duration_1 = c(32, 43, 32),
      results_1 = c(23, 45, 54),
      duration_2 = c(87, 34, 75),
      results_2 = c(67, 65, 45),
      duration_3 = c(98, 12, 43),
      results_3 = c(78, 12, 34)
    ),
    class = "data.frame",
    row.names = c(NA,-3L)
  )
Sign up to request clarification or add additional context in comments.

5 Comments

why use dplyr:: etc? Youve already loaded tidyverse
also note pivot_longer(df, -c(id, category), names_to = ".value",names_pattern = '([^_]+)')
@Onyambu As far as dplyr::, I just prefer (in general) to always be explicit. Great, thanks for that suggestion on the names pattern. That's a really clever way to handle it!
Its okay, though you do not need it once you have loaded the libraries. Thats the whole purpose of loading the library. eg you are not using magritrr::`%>%`
Thanks that worked well. Much appreciated!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.