Creating a new variable from values in existing rows and columns

Question

I would like to create a new variable called change_index, defined as (outcome1 at time 3 - outcome1 at time 1) / outcome1 at time 1.

How do I go about doing this? I tried doing the following

outcome1t0 <- data %>%
filter(time == "1") %>%
select(outcome1)

outcome1t12 <- data %>%
filter(time == "3") %>%
select(outcome1)

data$newvariable <- (outcome1t0 - outcome1t12) / outcome1t0

but I get the following error

Error in `$<-.data.frame`(`*tmp*`, bicind, value = list(bicep = c(13.3591525423729,  : 
replacement has 20 rows, data has 60

I realize this happens because the filtered data frame is smaller, since it contains fewer rows than the original. Should I just create a new data frame with the change index? How do I go about doing this?

I have to calculate this change index for many variables in columns (many outcomes). Is there a way to automate this process?

Thanks for reading.

   subject treatment time outcome1 outcome2
1       1         a    1       80       15
2       1         a    2       75       14
3       1         a    3       74       12
4       2         b    1       90       16
5       2         b    2       81       15
6       2         b    3       76       15
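
To make the examples in this thread reproducible, the printed sample above can be rebuilt as a data frame. The sketch below assumes `time` and `subject` are stored as numbers, which the printout does not confirm, and the name `outcome1_t1` is illustrative only. It also shows why the assignment fails: filtering keeps one row per subject, so the result has fewer rows than `data`.

```r
library(dplyr)

# Sample data from the question, rebuilt so the examples can be run.
# Column types are an assumption; the post only shows the printed table.
data <- data.frame(
  subject   = c(1, 1, 1, 2, 2, 2),
  treatment = c("a", "a", "a", "b", "b", "b"),
  time      = c(1, 2, 3, 1, 2, 3),
  outcome1  = c(80, 75, 74, 90, 81, 76),
  outcome2  = c(15, 14, 12, 16, 15, 15)
)

# Filtering keeps one row per subject, so the result is shorter than `data`
# and cannot be assigned back as a column of the same length.
outcome1_t1 <- data %>%
  filter(time == 1) %>%
  select(outcome1)

nrow(outcome1_t1)  # one row per subject
nrow(data)         # all rows
```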

EDIT 1

I tried the following, as suggested in the answer below, with the names changed to match my data:

ancestral1 %>%
  group_by(subject) %>%
  mutate(bicep0 = bicep[time == 0],
         bicep12 = bicep[time == 12],
         bicepind = (bicep12 - bicep0) / bicep12)

I get the following error

Error in mutate_impl(.data, dots) : 
Column `bicep0` must be length 1 (the group size), not 0

EDIT 2

I tried the new suggestion and still get the same kind of error:

ancestral1 %>%
  group_by(subject) %>%
  mutate(bicep0 = if(any(time == 5)) bicep[time == 5] else NA,
         bicep12 = bicep[time == 3],
         bicepind = (bicep0 - bicep12) / bicep0)

Error in mutate_impl(.data, dots) : 
Column `bicep12` must be length 1 (the group size), not 0
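
The "must be length 1 (the group size), not 0" message means that the subset `bicep[time == ...]` selected nothing for at least one subject. A comparison such as `time == 3` or `time == 5` selects nothing if those values never occur in the `time` column. A minimal diagnostic sketch follows; the `ancestral1` data frame below is a hypothetical stand-in (the real data is not shown in the post), built with the time points 0, 6, and 12 that the asker mentions in the comments.

```r
library(dplyr)

# Hypothetical stand-in for `ancestral1`; values are made up for illustration.
ancestral1 <- data.frame(
  subject = rep(1:2, each = 3),
  time    = rep(c(0, 6, 12), times = 2),
  bicep   = c(30, 31, 33, 28, 28.5, 29)
)

# Which time values actually occur, and how are they stored?
unique(ancestral1$time)
class(ancestral1$time)

# Per subject: is there a row at each time point of interest?
ancestral1 %>%
  group_by(subject) %>%
  summarise(has_t0 = any(time == 0), has_t12 = any(time == 12))

# A comparison against a value that never occurs selects nothing,
# which is what produces a length-0 column inside mutate().
length(ancestral1$bicep[ancestral1$time == 5])
```

If `class()` reports a factor or character column, or `unique()` shows labels rather than plain numbers, the comparison values in `mutate()` need to match those stored values exactly.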

The reason for the error is that, when you filter, the resulting objects have a different number of rows from the original data.
– akrun, Commented Sep 21, 2018 at 15:30
In the example you showed, both subjects have times 1 and 3. If that is not the case in your data, it will result in an error. You may have to change the example and also show the expected output in that case.
– akrun, Commented Sep 21, 2018 at 15:55
Thanks. In my data set, all subjects have outcomes only at times 0, 6, and 12 weeks. There are about 40 subjects. I am not sure what is going wrong.
– DiscoR, Commented Sep 21, 2018 at 16:03
Please check the code: data %>% group_by(subject) %>% mutate(outcome1t0 = if(any(time == 5)) outcome1[time == 5] else NA, outcome1t2 = outcome1[time == 3], newvariable = (outcome1t0 - outcome1t2) / outcome1t0)
– akrun, Commented Sep 21, 2018 at 16:04
Thanks, I tried that and got the same error. I have updated the main post.
– DiscoR, Commented Sep 21, 2018 at 16:12

akrun · Accepted Answer · 2018-09-21 15:52:51Z

Instead of filtering, we group by subject and create the new variables with mutate, so every row of the original data is kept:

data %>%
  group_by(subject) %>% 
  mutate(outcome1t0 = outcome1[time == 1],
       outcome1t2 = outcome1[time == 3], 
       newvariable = (outcome1t0 - outcome1t2) / outcome1t0) %>%
  select(-outcome1t0, -outcome1t2)
# A tibble: 6 x 6
# Groups:   subject [2]
#  subject treatment  time outcome1 outcome2 newvariable
#    <int> <chr>     <int>    <int>    <int>       <dbl>
#1       1 a             1       80       15       0.075
#2       1 a             2       75       14       0.075
#3       1 a             3       74       12       0.075
#4       2 b             1       90       16       0.156
#5       2 b             2       81       15       0.156
#6       2 b             3       76       15       0.156
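
The question also asks how to repeat this for many outcome columns. One possible approach, sketched below, uses `across()`, which requires dplyr 1.0.0 or later and so postdates this thread. The helper `change_index()` and the `_change` suffix are hypothetical names introduced for illustration. The helper follows the definition in the question, (time 3 - time 1) / time 1, so its sign is the opposite of the output shown above. Assigning the result back to `data` is what makes the new columns appear in the data frame in the environment.

```r
library(dplyr)

# Sample data from the question (column types assumed).
data <- data.frame(
  subject   = c(1, 1, 1, 2, 2, 2),
  treatment = c("a", "a", "a", "b", "b", "b"),
  time      = c(1, 2, 3, 1, 2, 3),
  outcome1  = c(80, 75, 74, 90, 81, 76),
  outcome2  = c(15, 14, 12, 16, 15, 15)
)

# Hypothetical helper: relative change between two time points for one subject.
change_index <- function(x, time, from = 1, to = 3) {
  start <- x[time == from]
  end   <- x[time == to]
  # Return NA unless the subject has exactly one row at each time point.
  if (length(start) != 1 || length(end) != 1) return(NA_real_)
  (end - start) / start
}

# Apply the helper to every column whose name starts with "outcome",
# then assign the result back so the new columns are saved in `data`.
data <- data %>%
  group_by(subject) %>%
  mutate(across(starts_with("outcome"),
                ~ change_index(.x, time),
                .names = "{.col}_change")) %>%
  ungroup()
```

Returning `NA` for subjects that lack one of the two time points avoids the length-0 error discussed in the question's edits, at the cost of hiding which subjects were incomplete.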

edited Sep 21, 2018 at 15:52

answered Sep 21, 2018 at 15:34

akrun


9 Comments

DiscoR Over a year ago

Thanks @akrun. I tried doing this but get an error. I have updated my main post to show the error.

akrun Over a year ago

@DiscoR If each 'subject' has only one row for each value of time, it should work.

akrun Over a year ago

@DiscoR Create an if/else condition for cases where a time value is missing, e.g. when there is no time 5. For example:

data %>% group_by(subject) %>% mutate(outcome1t0 = if(any(time == 5)) outcome1[time == 5] else NA, outcome1t2 = outcome1[time == 3], newvariable = (outcome1t0 - outcome1t2) / outcome1t0)

DiscoR Over a year ago

Okay, thanks a lot. I think it worked. However, I am not seeing the new variable in the data frame that I pull from the environment. How do I create this new column and save it to the data frame in the environment?

Frank Over a year ago

@DiscoR You should probably use first/last instead of hard-coding the times, e.g.:

DT %>% group_by(subject) %>% summarise(change = if (n() == 1) NA_real_ else (last(outcome1) - first(outcome1))/first(outcome1))

I guess there's some summarise_at or mutate_at if you try to tackle multiple outcomes this way
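
A possible way to extend the first/last idea to several outcomes is sketched below. It uses `across()` (dplyr 1.0.0 or later) in place of the `summarise_at` mentioned above; the result name `changes` and the `_change` suffix are illustrative only. Because `first()` and `last()` depend on row order, the sketch sorts by `time` within each subject first.

```r
library(dplyr)

# Sample data from the question (column types assumed).
data <- data.frame(
  subject   = c(1, 1, 1, 2, 2, 2),
  treatment = c("a", "a", "a", "b", "b", "b"),
  time      = c(1, 2, 3, 1, 2, 3),
  outcome1  = c(80, 75, 74, 90, 81, 76),
  outcome2  = c(15, 14, 12, 16, 15, 15)
)

# One row per subject: relative change from the earliest to the latest time.
changes <- data %>%
  arrange(subject, time) %>%   # first()/last() rely on rows being in time order
  group_by(subject) %>%
  summarise(across(starts_with("outcome"),
                   ~ if (n() == 1) NA_real_ else (last(.x) - first(.x)) / first(.x),
                   .names = "{.col}_change"))
```

The result has one row per subject. If the change values are needed alongside the original rows, something like `left_join(data, changes, by = "subject")` would attach them.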
