pivot_longer into multiple columns

Question

I am trying to use pivot_longer, but I am not sure how to use names_sep or names_pattern to get the output shown below.

dat <- tribble(
     ~group,  ~BP,  ~HS,  ~BB, ~lowerBP, ~upperBP, ~lowerHS, ~upperHS, ~lowerBB, ~upperBB,
        "1", 0.51, 0.15, 0.05,     0.16,     0.18,      0.5,     0.52,     0.14,     0.16,
      "2.1", 0.67, 0.09, 0.06,     0.09,     0.11,     0.66,     0.68,     0.08,      0.1,
      "2.2", 0.36, 0.13, 0.07,     0.12,     0.15,     0.34,     0.38,     0.12,     0.14,
      "2.3", 0.09, 0.17, 0.09,     0.13,     0.16,     0.08,     0.11,     0.15,     0.18,
      "2.4", 0.68, 0.12, 0.07,     0.12,     0.14,     0.66,     0.69,     0.11,     0.13,
        "3", 0.53, 0.15, 0.06,     0.14,     0.16,     0.52,     0.53,     0.15,     0.16)

Desired output (First row from wide data)

group names values lower upper
    1    BP   0.51  0.16  0.18
    1    HS   0.15  0.50  0.52
    1    BB   0.05  0.14  0.16

Can you give an example of what the desired output looks like, as well as a reproducible data example using dput?
– Fnguyen, Commented Apr 22, 2020 at 14:14
Hi, thank you for the comment. I'm not familiar with dput, but I tried to make the desired output clearer.
– Droc, Commented Apr 22, 2020 at 14:57
Never mind dput, I hadn't seen tribble before but it works the same.
– Fnguyen, Commented Apr 22, 2020 at 14:59

Dave2e · Accepted Answer · 2020-04-22 16:29:26Z

39

Here is a solution that follows a method similar to @Fnguyen's, but uses the newer pivot_longer and pivot_wider functions:

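library(dplyr)
library(tidyr)

# Split each column name into an optional prefix ("limit") and a two-letter suffix ("name")
longer <- pivot_longer(dat, cols = -1,
                       names_pattern = "(.*)(..)$",
                       names_to = c("limit", "name")) %>%
  # Columns without a prefix (BP, HS, BB) hold the plain values
  mutate(limit = ifelse(limit == "", "value", limit))

# Spread the limit labels back out into the columns value, lower and upper
answer <- pivot_wider(longer, id_cols = c(group, name),
                      names_from = limit, values_from = value,
                      names_repair = "check_unique")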

Most of the selecting, separating, mutating and renaming is taking place within the pivot function calls.

Update:
The regular expression "(.*)(..)$" means:
( ) ( ) Look for two parts (capture groups),
(.*) the first part can have zero or more characters,
(..) the second part must have exactly 2 characters, anchored to the end of the string by "$".
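
To see how the pattern splits the column names, here is a minimal base R sketch that applies the same regular expression with sub(). The vector cols is only an illustrative subset of the names in dat, and tidyr's internal matching may differ in implementation details.

```r
# Illustrative subset of the column names in `dat`
cols <- c("BP", "lowerBP", "upperHS")

# Keep only the first capture group: everything before the last two characters
limit <- sub("(.*)(..)$", "\\1", cols)

# Keep only the second capture group: the last two characters
name <- sub("(.*)(..)$", "\\2", cols)

data.frame(cols, limit, name)
```

The first group is empty for "BP", which is why the mutate() step above replaces "" with "value".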

edited Apr 22, 2020 at 16:29

answered Apr 22, 2020 at 16:05

Dave2e


3 Comments

Droc Over a year ago

Cool, I'm more accustomed to this syntax. However, I'm not good with the "(.*)(..)$" symbols. Do you know if they are explained somewhere?

CubicInfinity Oct 7 at 21:03

The syntax is regular expressions. They work the same in R as in most other languages, but you need to either escape backslashes or use a raw string, r"(regular expression goes here)" (available since R 4.0.0). In this example, the parentheses denote capture groups. My favorite resource is regex101.com

Faustin Gashakamba Over a year ago

This works like magic. Your answer here saved me a whole afternoon of head scratching and hair pulling
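
The point about backslashes raised in the comments above can be sketched as follows. This is a small base R illustration, assuming R 4.0.0 or later for the raw string; the pattern \d+ and the column name "upper95" are made up for the example and do not come from the question's data.

```r
# The same regular expression (one or more digits) written two ways
escaped <- "\\d+"     # ordinary string: the backslash must be doubled
raw     <- r"(\d+)"   # raw string (R >= 4.0.0): the backslash is written as-is

# Both spellings produce the identical pattern
identical(escaped, raw)

# Extract the digits from a hypothetical column name
digits <- regmatches("upper95", regexpr(raw, "upper95"))
digits
```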

Ewen Gallic · Accepted Answer · 2023-06-08 16:28:08Z

10

I'd like to add an alternative tidyverse solution drawing on the answer provided by @Dave2e.

Like Dave2e's solution, it is a two-step procedure (first rename, then reshape). Instead of reshaping the data twice, I add the prefix "values" to the columns named "BP", "HS", and "BB" using rename_with. This is necessary to get the column names right when using the .value sentinel in the names_to argument of pivot_longer, because .value turns the matched part of each column name (here "values", "lower" or "upper") into the name of an output column.

library(dplyr)
library(tidyr)

dat %>% 
  rename_with(~sub("^(BP|HS|BB)$", "values\\1", .)) %>%     # add prefix values
  pivot_longer(cols= -1,
               names_pattern = "(.*)(BP|HS|BB)$",
               names_to = c(".value", "names"))

edited Jun 8, 2023 at 16:28

Ewen Gallic


answered Mar 3, 2022 at 15:44

maraab


desval · Accepted Answer · 2020-04-22 16:21:22Z

8

A data.table version (I am not yet sure how to retain the original names so that you don't need to substitute them afterwards; see https://github.com/Rdatatable/data.table/issues/2551):

library(data.table)
df <- data.table(dat)
v <- c("BP","HS","BB")
setnames(df, v, paste0("x",v) )

g <- melt(df, id.vars = "group",
     measure.vars = patterns(values = "x" ,
                             lower = "lower",
                             upper = "upper"),
     variable.name = "names")

g[names==1, names := "BP" ]
g[names==2, names := "HS" ]
g[names==3, names := "BB" ]

    group names values lower upper
 1:     1    BP   0.51  0.16  0.18
 2:   2.1    BP   0.67  0.09  0.11
 3:   2.2    BP   0.36  0.12  0.15
 4:   2.3    BP   0.09  0.13  0.16
 5:   2.4    BP   0.68  0.12  0.14
 6:     3    BP   0.53  0.14  0.16
 7:     1    HS   0.15  0.50  0.52
 8:   2.1    HS   0.09  0.66  0.68
 9:   2.2    HS   0.13  0.34  0.38
10:   2.3    HS   0.17  0.08  0.11
11:   2.4    HS   0.12  0.66  0.69
12:     3    HS   0.15  0.52  0.53
13:     1    BB   0.05  0.14  0.16
14:   2.1    BB   0.06  0.08  0.10
15:   2.2    BB   0.07  0.12  0.14
16:   2.3    BB   0.09  0.15  0.18
17:   2.4    BB   0.07  0.11  0.13
18:     3    BB   0.06  0.15  0.16
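
The three relabelling lines could also be collapsed into a single lookup. Below is a minimal base R sketch of the idea, assuming the melted names column is a factor whose levels "1", "2", "3" follow the order of v. The vector variable is a hypothetical stand-in for that column. Applied to g, this would presumably read g[, names := v[as.integer(names)]], which is not tested here.

```r
v <- c("BP", "HS", "BB")

# Hypothetical stand-in for the `names` column returned by melt():
# a factor with levels "1", "2", "3"
variable <- factor(c("1", "1", "2", "3"))

# Use the integer codes of the factor to index into the original names
labels <- v[as.integer(variable)]
labels
```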

edited Apr 22, 2020 at 16:21

answered Apr 22, 2020 at 16:03

desval


1 Comment

jmutua Over a year ago

What about having the names in multiple columns and values, lower, and upper in a single column?

Fnguyen · Accepted Answer · 2020-04-22 15:24:45Z

5

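Based on your example data, this solution using dplyr, tidyr and tibble works for me:

library(dplyr)
library(tidyr)   # gather(), separate(), spread()
library(tibble)  # rowid_to_column()

dat %>%
  gather(key, values, -group) %>%
  # strip the lower/upper prefix to recover BP, HS or BB
  mutate(names = gsub("lower", "", gsub("upper", "", key))) %>%
  # split at capital letters; key1 is the prefix (empty for BP, HS, BB)
  # the original version also passed perl = T here, which separate() does not accept (see comments)
  separate(key, into = c("key1", "key2"), sep = "[[:upper:]]") %>%
  mutate(key1 = case_when(key1 == "" ~ "values", TRUE ~ key1)) %>%
  select(group, names, key1, values) %>%
  rowid_to_column() %>%
  spread(key1, values) %>%
  select(-rowid) %>%
  # collapse the partially filled rows per group and name into one
  group_by(group, names) %>%
  summarise_all(mean, na.rm = TRUE)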

answered Apr 22, 2020 at 15:24

Fnguyen


4 Comments

Droc Over a year ago

That is some serious code. Somehow this does not work for me: "Error: 1 components of ... were not used. We detected these problematic arguments: * perl"

Fnguyen Over a year ago

@Droc have you tried removing the , perl = T argument in the separate statement?

Fnguyen Over a year ago

@Droc also, as a fun way to better understand what I did and how to improve or repeat it, go through the pipeline line by line, adding a head() after each step to see what each operation does.

moodymudskipper Over a year ago

separate doesn't have a perl argument, though, and never did. It might have been silently ignored in the past; see github.com/tidyverse/tidyr/issues/789
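
Fnguyen's suggestion in the comments to inspect each step with head() might look like the sketch below. It assumes dplyr and tidyr are installed; toy is a hypothetical two-row excerpt of dat, and step1 and step2 are illustrative names for the intermediate results.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row excerpt of `dat`, limited to the BP columns
toy <- tibble(group   = c("1", "2.1"),
              BP      = c(0.51, 0.67),
              lowerBP = c(0.16, 0.09))

# Step 1: stack all measurement columns into key/values pairs
step1 <- toy %>%
  gather(key, values, -group)
head(step1)

# Step 2: strip the lower/upper prefix to recover the measure name
step2 <- step1 %>%
  mutate(names = gsub("lower", "", gsub("upper", "", key)))
head(step2)
```

Storing each stage in its own object makes it easy to see which operation introduces an unexpected column or value.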
