New here and fairly fresh to R.

I have a data set that needs cleaning. The ID column identifies unique subjects, and visit_index is the visit number (3 visits per subject in total).

What I need is:

  1. to copy measurement and adherence to every row of the corresponding subject, by ID
  2. to keep only the "exercise" intervention row and the row whose visit_index is one less than the exercise visit_index.

So we end up with two rows per subject, with measurements duplicated. The actual dataset is larger, with over 100 variables, so to duplicate values I'd like to pass in a list of variables or one or more ranges of columns.

I got as far as this step but could not progress further.

# This grabs the value of visit_index for exercise, adds into new column
df2 <- df2 %>% 
  group_by(ID) %>%
  mutate(
    visit_exercise =
      ifelse(intervention == "exercise", visit_index, NA)
  )

Input data and desired output:

# Example data:
df2 <- read.table(text=
"visit_index    ID  intervention    adherence   measurement
0   01JV    baseline    66.1    24.5
1   01JV    exercise    NA  NA
2   01JV    detrain NA  NA
0   02AM    baseline    52.0    21.3
1   02AM    detrain NA  NA
2   02AM    exercise    NA  NA
0   03JW    baseline    83.7    23.6
1   03JW    detrain NA  NA
2   03JW    exercise    NA  NA
", header=TRUE) 


# desired output:
df3 <- read.table(text=
                    "visit_index    ID  intervention    adherence   measurement
0   01JV    baseline    66.1    24.5
1   01JV    exercise    66.1    24.5
1   02AM    detrain 52.0    21.3
2   02AM    exercise    52.0    21.3
1   03JW    detrain 83.7    23.6
2   03JW    exercise    83.7    23.6
", header=TRUE) 

2 Answers

A try with dplyr & tidyr

library(dplyr)
library(tidyr)
df2 %>%
  # with arrange function NA will always be at bottom of the data
  arrange(adherence, measurement) %>%
  group_by(ID) %>%
  fill(adherence, measurement, .direction = "down") %>%
  filter(visit_index == visit_index[intervention == "exercise"] |
      visit_index == visit_index[intervention == "exercise"] - 1) %>%
  ungroup()


#> # A tibble: 6 x 5
#>   visit_index ID    intervention adherence measurement
#>         <int> <chr> <chr>            <dbl>       <dbl>
#> 1           0 01JV  baseline          66.1        24.5
#> 2           1 01JV  exercise          66.1        24.5
#> 3           1 02AM  detrain           52          21.3
#> 4           2 02AM  exercise          52          21.3
#> 5           1 03JW  detrain           83.7        23.6
#> 6           2 03JW  exercise          83.7        23.6
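
The question also asks about passing in a list of variables, since the real data has over 100 columns. A minimal sketch of one way to do that with the same approach, assuming each ID has exactly one "exercise" row; `vars_to_copy` is a hypothetical name introduced here for illustration. Using `.direction = "downup"` fills in both directions within each ID, so the prior `arrange()` is not needed in this variant.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df2 <- read.table(text = "visit_index ID intervention adherence measurement
0 01JV baseline 66.1 24.5
1 01JV exercise NA NA
2 01JV detrain NA NA
0 02AM baseline 52.0 21.3
1 02AM detrain NA NA
2 02AM exercise NA NA
0 03JW baseline 83.7 23.6
1 03JW detrain NA NA
2 03JW exercise NA NA
", header = TRUE)

# Hypothetical helper vector: names of the columns to copy within each ID
vars_to_copy <- c("adherence", "measurement")

res <- df2 %>%
  group_by(ID) %>%
  # Fill missing values down and then up within each ID
  fill(all_of(vars_to_copy), .direction = "downup") %>%
  # Keep the exercise visit and the visit immediately before it
  # (assumes exactly one exercise row per ID)
  filter(visit_index %in% (visit_index[intervention == "exercise"] - 0:1)) %>%
  ungroup()

res
```

Because `fill()` accepts tidyselect syntax, a column range such as `fill(adherence:measurement, .direction = "downup")` should work in place of `all_of(vars_to_copy)` when the columns are adjacent.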

3 Comments

Thanks Sinh, very cool solution. In fact, the data to be duplicated will not always be in the first row of the group. Adding a sort / arrange step beforehand should take care of it.
Updated with arrange function ;)
Thanks Sinh. It works on my RepEx, however I could not work it out on the full dataset. I did find another solution though.

Just sharing what worked on the full dataset as well as on the reprex above, in case someone has a similar wrangling problem. (kudos to mhakanda)

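    df2 %>%
      group_by(ID) %>%
      # Copy the value to every row of the subject; sum() only gives the right
      # answer when each ID has exactly one non-missing value per column
      mutate(adherence = sum(adherence,na.rm = TRUE), measurement = sum(measurement,na.rm = TRUE)) %>%
      # dd is visit_index - 1 on the exercise row and -1 on all other rows
      mutate(dd = as.numeric(intervention=="exercise")*visit_index-1) %>%
      # Keep the exercise row and the visit immediately before it
      filter(intervention == "exercise" | (visit_index==max(dd))) %>%
      mutate(dd=NULL) %>%
      ungroup()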

    #> # A tibble: 6 x 5
    #>   visit_index ID    intervention adherence measurement
    #>         <int> <chr> <chr>            <dbl>       <dbl>
    #> 1           0 01JV  baseline          66.1        24.5
    #> 2           1 01JV  exercise          66.1        24.5
    #> 3           1 02AM  detrain           52          21.3
    #> 4           2 02AM  exercise          52          21.3
    #> 5           1 03JW  detrain           83.7        23.6
    #> 6           2 03JW  exercise          83.7        23.6
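
For a wider dataset, the same idea can be sketched with `across()` so the columns are passed in as a vector rather than typed out one by one. This is only a sketch: `vars_to_copy` is a hypothetical name, and the code assumes dplyr 1.0.0 or later and one "exercise" row per ID. It takes the first non-missing value instead of `sum()`, so an ID with no recorded value stays `NA` rather than becoming 0.

```r
library(dplyr)

# Example data from the question
df2 <- read.table(text = "visit_index ID intervention adherence measurement
0 01JV baseline 66.1 24.5
1 01JV exercise NA NA
2 01JV detrain NA NA
0 02AM baseline 52.0 21.3
1 02AM detrain NA NA
2 02AM exercise NA NA
0 03JW baseline 83.7 23.6
1 03JW detrain NA NA
2 03JW exercise NA NA
", header = TRUE)

# Hypothetical helper vector: names of the columns to copy within each ID
vars_to_copy <- c("adherence", "measurement")

res <- df2 %>%
  group_by(ID) %>%
  # Replace each listed column with its first non-missing value in the group
  mutate(across(all_of(vars_to_copy), ~ .x[!is.na(.x)][1])) %>%
  # Keep the exercise row and the visit immediately before it
  filter(intervention == "exercise" |
           visit_index == visit_index[intervention == "exercise"][1] - 1) %>%
  ungroup()

res
```

A range of adjacent columns can be selected with `across(adherence:measurement, ...)` instead of `all_of(vars_to_copy)`.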
