I have a data frame that looks like this:

  ID             Email  Name   Company TripIdentifier      Date1  Campsite1 NumberOfAnimals1      Date2  Campsite2 NumberOfAnimals2
1  1 [email protected] Alice Company A         Trip 1 2022-01-01 Campsite A                5 2022-01-02 Campsite C                5
2  2 [email protected]   Bob Company B         Trip 2 2022-01-02 Campsite B                5 2022-01-03 Campsite D                5

I am trying to create an output table that combines a set of columns that is duplicated many times in my dataset (Date1, Campsite1, NumberOfAnimals1). They are always in the same order. I would like my resulting table to look like this:

  ID             Email  Name   Company TripIdentifier       Date   Campsite NumberOfAnimals
1  1 [email protected] Alice Company A         Trip 1 2022-01-01 Campsite A               5
2  1 [email protected] Alice Company A         Trip 1 2022-01-02 Campsite C               5
3  2 [email protected]   Bob Company B         Trip 2 2022-01-02 Campsite B               5
4  2 [email protected]   Bob Company B         Trip 2 2022-01-03 Campsite D               5

So far, I have been trying to use pivot_longer() with the names_pattern argument:

library(tidyr)  # provides pivot_longer() and re-exports %>%

# Define the test data frame
Test <- data.frame(
  ID = c(1, 2),
  Email = c("[email protected]", "[email protected]"),
  Name = c("Alice", "Bob"),
  Company = c("Company A", "Company B"),
  TripIdentifier = c("Trip 1", "Trip 2"),
  Date1 = as.Date(c("2022-01-01", "2022-01-02")),
  Campsite1 = c("A", "B"),
  NumberOfAnimals1 = c(5, 5),
  Date2 = as.Date(c("2022-01-02", "2022-01-03")),
  Campsite2 = c("C", "D"),
  NumberOfAnimals2 = c(5, 5),
  stringsAsFactors = FALSE
)

# Pivot the numbered columns into long format
reshaped <- Test %>%
  pivot_longer(
    cols = starts_with("Date"),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )

reshaped

However, this produces:

# A tibble: 4 × 11
     ID Email             Name  Company  TripIdentifier Campsite1 NumberOfAnimals1 Campsite2 NumberOfAnimals2 trip  Date      
  <dbl> <chr>             <chr> <chr>    <chr>          <chr>                <dbl> <chr>                <dbl> <chr> <date>    
1     1 [email protected] Alice Company… Trip 1         A                        5 C                        5 1     2022-01-01
2     1 [email protected] Alice Company… Trip 1         A                        5 C                        5 2     2022-01-02
3     2 [email protected] Bob   Company… Trip 2         B                        5 D                        5 1     2022-01-02
4     2 [email protected] Bob   Company… Trip 2         B                        5 D                        5 2     2022-01-03

The resulting table only combines the "Date" columns, but not the others in the pattern. I am new to the tidyverse and am getting a bit confused about all the ways to use pivot_longer(). Any ideas on how to accomplish this would be helpful. Thanks in advance!

1 Answer

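Your cols argument selects only the Date columns, so only those are pivoted and the Campsite and NumberOfAnimals columns are carried along unchanged. To achieve your desired result, you also have to include the NumberOfAnimals and Campsite columns when pivoting.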

library(tidyr)

Test %>%
  pivot_longer(
    cols = c(
      starts_with("Date"),
      starts_with("NumberOfAnimals"),
      starts_with("Campsite")
    ),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )
#> # A tibble: 4 × 9
#>      ID Email      Name  Company TripIdentifier trip  Date       NumberOfAnimals
#>   <dbl> <chr>      <chr> <chr>   <chr>          <chr> <date>               <dbl>
#> 1     1 user1@exa… Alice Compan… Trip 1         1     2022-01-01               5
#> 2     1 user1@exa… Alice Compan… Trip 1         2     2022-01-02               5
#> 3     2 user2@exa… Bob   Compan… Trip 2         1     2022-01-02               5
#> 4     2 user2@exa… Bob   Compan… Trip 2         2     2022-01-03               5
#> # ℹ 1 more variable: Campsite <chr>
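
To see why every numbered column has to be selected, it can help to look at how the pattern splits each name. The following is only a base R sketch that approximates the matching step (it is not tidyr's internal code); the names selected, pattern, parts and split_names are made up for illustration. The first capture group plays the role of ".value" (the output column), the second ends up in "trip".

```r
# Column names selected by `cols` in the call above
selected <- c("Date1", "Campsite1", "NumberOfAnimals1",
              "Date2", "Campsite2", "NumberOfAnimals2")

# The same regular expression that is passed to names_pattern
pattern <- "(.*)(\\d+)$"

# regexec() returns the full match followed by one entry per capture group
parts <- regmatches(selected, regexec(pattern, selected, perl = TRUE))

split_names <- data.frame(
  column       = selected,
  value_column = vapply(parts, `[`, character(1), 2),  # role of ".value"
  trip         = vapply(parts, `[`, character(1), 3),  # role of "trip"
  stringsAsFactors = FALSE
)

split_names
```

A column that is not selected by cols is never run through the pattern at all, which is why Campsite1, NumberOfAnimals1 and so on stayed wide in the question's output.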

Or, to simplify, you could use matches() to select every column ending in a digit (thanks to @Onyambu for the reminder):

Test %>%
  pivot_longer(
    cols = matches("\\d+$"),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )
#> # A tibble: 4 × 9
#>      ID Email             Name  Company TripIdentifier trip  Date       Campsite
#>   <dbl> <chr>             <chr> <chr>   <chr>          <chr> <date>     <chr>   
#> 1     1 [email protected] Alice Compan… Trip 1         1     2022-01-01 A       
#> 2     1 [email protected] Alice Compan… Trip 1         2     2022-01-02 C       
#> 3     2 [email protected] Bob   Compan… Trip 2         1     2022-01-02 B       
#> 4     2 [email protected] Bob   Compan… Trip 2         2     2022-01-03 D       
#> # ℹ 1 more variable: NumberOfAnimals <dbl>
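
One caveat if the real data has ten or more column sets: the leading (.*) is greedy, so a name such as Date10 would be split into "Date1" and "0" rather than "Date" and "10". A possible workaround, sketched here on a small hypothetical data frame (wide and its Date10/Campsite10 columns are not from the question), is to use \\D+ for the name part so it cannot swallow digits. The names_transform argument is optional and only converts the suffix to an integer.

```r
library(tidyr)

# Hypothetical wide data with a two-digit suffix (assumption, not the OP's data)
wide <- data.frame(
  ID = 1,
  Date1 = as.Date("2022-01-01"),
  Campsite1 = "A",
  Date10 = as.Date("2022-01-10"),
  Campsite10 = "J",
  stringsAsFactors = FALSE
)

long <- pivot_longer(
  wide,
  cols = matches("\\d+$"),
  names_to = c(".value", "trip"),
  # \\D+ matches non-digits only, so the whole numeric suffix lands in "trip"
  names_pattern = "(\\D+)(\\d+)$",
  # store the suffix as an integer instead of a character string
  names_transform = list(trip = as.integer)
)

long
```

This assumes the base names themselves contain no digits; if they do, a lazy pattern such as "(.*?)(\\d+)$" may be a better fit.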

2 Comments

You could use cols = matches("\\d$")
Thx @Onyambu. Of course. Added your suggestion as an edit. Kept the code "simple" as the OP claimed being a beginner. But as the OP uses patterns anyway it should be fine using a regex pattern to match the columns. (:
