R: duplicate values row-wise based on group then select rows that match a criteria - unusual wrangling problem

Question

I'm new here and fairly new to R.

I have a data set that needs cleaning. The ID column identifies unique subjects, and visit_index is the visit number (three visits per subject).

What I need is:

to copy measurement and adherence to every row of the corresponding subject, grouped by ID
to keep only the "exercise" intervention row and the row whose visit_index is one less than the exercise visit_index.

So we end up with two rows per subject, with the measurements duplicated. The actual dataset is larger, with over 100 variables, so to duplicate values I'd like to pass in a list of variables or one or more ranges of columns.

I got as far as this step but could not progress further.

# Store the exercise visit_index in a new column (NA on all other rows)
df2 <- df2 %>% 
  group_by(ID) %>%
  mutate(
    visit_exercise =
      ifelse(intervention == "exercise", visit_index, NA)
  )

Input data and desired output:

# Example data:
df2 <- read.table(text=
"visit_index    ID  intervention    adherence   measurement
0   01JV    baseline    66.1    24.5
1   01JV    exercise    NA  NA
2   01JV    detrain NA  NA
0   02AM    baseline    52.0    21.3
1   02AM    detrain NA  NA
2   02AM    exercise    NA  NA
0   03JW    baseline    83.7    23.6
1   03JW    detrain NA  NA
2   03JW    exercise    NA  NA
", header=TRUE) 


# desired output:
df3 <- read.table(text=
                    "visit_index    ID  intervention    adherence   measurement
0   01JV    baseline    66.1    24.5
1   01JV    exercise    66.1    24.5
1   02AM    detrain 52.0    21.3
2   02AM    exercise    52.0    21.3
1   03JW    detrain 83.7    23.6
2   03JW    exercise    83.7    23.6
", header=TRUE)

Sinh Nguyen · Accepted Answer · 2021-03-26 01:23:25Z

An attempt with dplyr and tidyr:

library(dplyr)
library(tidyr)
df2 %>%
  # arrange() sorts NA last, so the recorded values come before the NA rows
  arrange(adherence, measurement) %>%
  group_by(ID) %>%
  fill(adherence, measurement, .direction = "down") %>%
  filter(visit_index == visit_index[intervention == "exercise"] |
      visit_index == visit_index[intervention == "exercise"] - 1) %>%
  ungroup()


#> # A tibble: 6 x 5
#>   visit_index ID    intervention adherence measurement
#>         <int> <chr> <chr>            <dbl>       <dbl>
#> 1           0 01JV  baseline          66.1        24.5
#> 2           1 01JV  exercise          66.1        24.5
#> 3           1 02AM  detrain           52          21.3
#> 4           2 02AM  exercise          52          21.3
#> 5           1 03JW  detrain           83.7        23.6
#> 6           2 03JW  exercise          83.7        23.6
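
The question also asks for a way to pass a list or a range of columns. The following is only a minimal sketch of one way to do that, assuming tidyr >= 1.0.0 (where `fill()` accepts tidyselect helpers and `.direction = "downup"`). The name `cols_to_copy` is hypothetical, and `df2` is rebuilt so the block runs by itself. Filling in both directions makes the `arrange()` step unnecessary, on the assumption that each subject has one recorded value per column and exactly one "exercise" row.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df2 <- read.table(text = "visit_index ID intervention adherence measurement
0 01JV baseline 66.1 24.5
1 01JV exercise NA NA
2 01JV detrain NA NA
0 02AM baseline 52.0 21.3
1 02AM detrain NA NA
2 02AM exercise NA NA
0 03JW baseline 83.7 23.6
1 03JW detrain NA NA
2 03JW exercise NA NA
", header = TRUE)

# Hypothetical name: the columns whose values should be copied within each ID
cols_to_copy <- c("adherence", "measurement")

res_fill <- df2 %>%
  group_by(ID) %>%
  # fill NAs downwards, then upwards, within each subject
  fill(all_of(cols_to_copy), .direction = "downup") %>%
  # keep the exercise visit and the visit just before it
  filter(visit_index %in% (visit_index[intervention == "exercise"] - 0:1)) %>%
  ungroup()

res_fill
```

A contiguous range can be passed instead of a character vector, for example `fill(adherence:measurement, .direction = "downup")`.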

edited Mar 26, 2021 at 1:23

answered Mar 25, 2021 at 13:07

3 Comments

Andrius Over a year ago

Thanks Sinh, very cool solution. In fact, the data to be duplicated will not always be in the first row of the group. Adding a sort / arrange step beforehand should take care of it.

Sinh Nguyen Over a year ago

Updated with arrange function ;)

Andrius Over a year ago

Thanks Sinh. It works on my reprex, but I could not get it to work on the full dataset. I did find another solution, though.

Andrius · Accepted Answer · 2021-04-07 23:00:30Z

Just sharing what worked on the full dataset as well as on the reprex above, in case someone has a similar wrangling problem (kudos to mhakanda).

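    library(dplyr)

    df2 %>%
      group_by(ID) %>%
      # one non-NA value per subject, so sum(na.rm = TRUE) copies it to every row
      mutate(adherence = sum(adherence, na.rm = TRUE),
             measurement = sum(measurement, na.rm = TRUE)) %>%
      # dd is (exercise visit_index - 1) on the exercise row and -1 elsewhere
      mutate(dd = as.numeric(intervention == "exercise") * visit_index - 1) %>%
      # keep the exercise row and the visit just before it
      filter(intervention == "exercise" | visit_index == max(dd)) %>%
      mutate(dd = NULL) %>%
      ungroup()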

    #> # A tibble: 6 x 5
    #>   visit_index ID    intervention adherence measurement
    #>         <int> <chr> <chr>            <dbl>       <dbl>
    #> 1           0 01JV  baseline          66.1        24.5
    #> 2           1 01JV  exercise          66.1        24.5
    #> 3           1 02AM  detrain           52          21.3
    #> 4           2 02AM  exercise          52          21.3
    #> 5           1 03JW  detrain           83.7        23.6
    #> 6           2 03JW  exercise          83.7        23.6
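
For the 100+ column case mentioned in the question, the per-column `sum()` calls could be generalised with `across()`. This is a hedged sketch rather than the method used above: it assumes dplyr >= 1.0.0, uses the hypothetical name `cols_to_copy`, and swaps `sum()` for "first non-missing value", because `sum(na.rm = TRUE)` returns 0 when a subject has no recorded value, adds the values together when there are several, and only works for numeric columns.

```r
library(dplyr)

# Example data from the question
df2 <- read.table(text = "visit_index ID intervention adherence measurement
0 01JV baseline 66.1 24.5
1 01JV exercise NA NA
2 01JV detrain NA NA
0 02AM baseline 52.0 21.3
1 02AM detrain NA NA
2 02AM exercise NA NA
0 03JW baseline 83.7 23.6
1 03JW detrain NA NA
2 03JW exercise NA NA
", header = TRUE)

# Hypothetical name: the columns whose values should be copied within each ID
cols_to_copy <- c("adherence", "measurement")

res_across <- df2 %>%
  group_by(ID) %>%
  # replace each column with its first non-missing value in the group
  # (stays NA if the subject has no recorded value at all)
  mutate(across(all_of(cols_to_copy), ~ .x[!is.na(.x)][1])) %>%
  # keep the exercise visit and the visit just before it
  # (assumes exactly one "exercise" row per subject)
  filter(visit_index %in% (visit_index[intervention == "exercise"] - 0:1)) %>%
  ungroup()

# Sanity check: each subject should contribute exactly two rows
count(res_across, ID)
```

Replacing `all_of(cols_to_copy)` with a range such as `adherence:measurement` selects contiguous columns without listing them.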
