How to assign sequential values to a variable in R while defining sequence by number of values contained in a different variable

Question

I have patient data where a patient was given the same assessment at different time points. I want to number those assessments sequentially by date.

Here's my input:

A 12 x 3 data frame with columns pt_id, assess_date and assess_id.

Here's my desired output:

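A 12 x 5 data frame with columns pt_id, assess_date, assess_id, num_assess (the number of distinct assessment dates for the patient) and assess_num (1, 2, 3, ... in date order within each patient).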

Here's what I've tried:

data <- data %>% 
           group_by(pt_id) %>%
           mutate(num_assess <- n_distinct(assess_date))

data$assess_num <- NA

data <- data %>% 
           group_by(pt_id) %>% 
           for(i in 1:num_assess) {
              assess_num <- i
            }

I also tried using n_distinct to define the sequence without creating the assess_num variable, but that didn't work either.

Here's the error that I get:

Error in for (. in i) 1:num_assess : 4 arguments passed to 'for' which requires 3

Thoughts? TIA!
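For comparison, here is a minimal base-R sketch of the intended result. It assumes assess_date is already a Date and uses a small hypothetical data set in which the two patients were assessed on different dates. Two things break in the attempt above: inside mutate(), a new column is named with `=` rather than `<-`, and a for loop cannot be the right-hand side of %>% (the pipe inserts the data as an extra argument, hence "4 arguments passed to 'for'"). The sketch avoids the loop by computing both columns per patient with ave().

```r
# Hypothetical example data (not the asker's): patients are assessed on
# different dates, and rows are deliberately out of date order.
data <- data.frame(
  pt_id       = c(1234, 1234, 1234, 4567, 4567),
  assess_date = as.Date(c("2019-03-05", "2019-01-10", "2019-02-20",
                          "2019-04-01", "2019-01-15")),
  assess_id   = 64
)

# Put rows in date order within each patient (for readability only;
# the numbering below does not depend on row order)
data <- data[order(data$pt_id, data$assess_date), ]

day <- as.numeric(data$assess_date)  # days since 1970-01-01

# num_assess: number of distinct assessment dates for each patient
data$num_assess <- ave(day, data$pt_id, FUN = function(d) length(unique(d)))

# assess_num: position of each date among that patient's distinct dates
data$assess_num <- ave(day, data$pt_id,
                       FUN = function(d) match(d, sort(unique(d))))

data
```

In dplyr the same idea should be group_by(pt_id) followed by mutate(num_assess = n_distinct(assess_date), assess_num = dense_rank(assess_date)); with either version, two assessments on the same date share a number.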

Hey tws061105, thanks for posting what you have attempted. It is also a good habit to post a reproducible example. On that note, is assess_date a date or a string? If it is a date, you can extract the month with something like as.numeric(format(x, "%m")) (assuming you want it to be numeric).
– Andrew, Commented Mar 2, 2019 at 1:27
Hey Andrew, thanks for that suggestion! That definitely makes sense, and I'll keep it in mind for future posts!
– tws061105, Commented Mar 2, 2019 at 21:59

Andrew · Accepted Answer · 2019-03-02 02:41:06Z

1

Clever solution from @desc. If assess_date holds dates and you want a numeric assessment number, the script below works. It reuses desc's data.example (thank you), but reads the dates as d/m/y, which is why the format passed to as.Date is "%d/%m/%Y". Note that it numbers each assessment by its month.

data.example = structure(list(
  pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L,
            4567L, 4567L, 8900L, 8900L, 8900L, 8900L),
  assess_date = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
                          .Label = c("1/1/2019", "1/2/2019", "1/3/2019", "1/4/2019"),
                          class = "factor"),
  assess_id = c(64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)),
  class = "data.frame", row.names = c(NA, -12L))

# Parse the d/m/y labels into Dates, then use the month as the assessment number
data.example$assess_date <- as.Date(data.example$assess_date, format = "%d/%m/%Y")
data.example$assess_num <- as.numeric(format(data.example$assess_date, "%m"))
data.example

   pt_id assess_date assess_id assess_num
1   1234  2019-01-01        64          1
2   1234  2019-02-01        64          2
3   1234  2019-03-01        64          3
4   1234  2019-04-01        64          4
5   4567  2019-01-01        64          1
6   4567  2019-02-01        64          2
7   4567  2019-03-01        64          3
8   4567  2019-04-01        64          4
9   8900  2019-01-01        64          1
10  8900  2019-02-01        64          2
11  8900  2019-03-01        64          3
12  8900  2019-04-01        64          4
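As tws061105 points out in the comments below, the month number only equals the assessment order when a patient has at most one assessment per month and no month is skipped. A small base-R illustration with hypothetical dates, comparing the month number with the rank of each date:

```r
# Hypothetical dates for one patient: two visits in January, none in February
dates <- as.Date(c("2019-01-05", "2019-01-20", "2019-03-02"))

# Month-based numbering (the approach above): repeats 1, then jumps to 3
month_num <- as.numeric(format(dates, "%m"))

# Order-based numbering: rank of each date (the dates here are distinct)
seq_num <- rank(as.numeric(dates))

month_num  # 1 1 3
seq_num    # 1 2 3
```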

answered Mar 2, 2019 at 2:41

Andrew

2 Comments

tws061105 Over a year ago

Thanks @Andrew! This looks like it relies on the assessments occurring in different months, which isn't always the case for me (though I recognize that it is consistent with the example I provided). Thanks for weighing in, though, and for the critique re: posting reproducible examples.

Andrew Over a year ago

Sure thing, and I am glad you found a solution that works for your data! Also, it is customary to click the check mark next to the answer that solves your problem once your issue is resolved (i.e., to accept desc's answer). Thanks for following up too!

desc · Accepted Answer · 2019-03-07 21:48:31Z

1

Here is a simpler approach that uses your dates (stored as factors) and extracts the level index of each value:

library(dplyr)

data.example = structure(list(
  pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L,
            4567L, 4567L, 8900L, 8900L, 8900L, 8900L),
  assess_date = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
                          .Label = c("1/1/2019", "1/2/2019", "1/3/2019", "1/4/2019"),
                          class = "factor"),
  assess_id = c(64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)),
  class = "data.frame", row.names = c(NA, -12L))

# The factor's integer codes (1-4) become the assessment numbers
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(assess_date))

If they aren't factors (yet), then:

data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))

The output looks like:

# A tibble: 12 x 4
# Groups:   pt_id [3]
   pt_id assess_date assess_id assess_num
   <int> <fct>           <int>      <int>
 1  1234 1/1/2019           64          1
 2  1234 1/2/2019           64          2
 3  1234 1/3/2019           64          3
 4  1234 1/4/2019           64          4
 5  4567 1/1/2019           64          1
 6  4567 1/2/2019           64          2
 7  4567 1/3/2019           64          3
 8  4567 1/4/2019           64          4
 9  8900 1/1/2019           64          1
10  8900 1/2/2019           64          2
11  8900 1/3/2019           64          3
12  8900 1/4/2019           64          4
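One caveat, which matches the problem reported in the comments: a factor's integer codes index the levels of the whole column, not of each patient, so as.integer(assess_date) only yields 1, 2, 3, ... per patient when every patient has exactly the same set of dates. A small sketch with hypothetical data, assuming month/day/year strings as in the EDIT below; it parses the dates and then recomputes the codes inside each patient with ave():

```r
# Hypothetical data: the two patients were assessed on different days
assess_date <- factor(c("1/1/2019", "1/5/2019", "1/3/2019", "1/5/2019"))
pt_id <- c(1234, 1234, 4567, 4567)

# Codes refer to the global, alphabetically sorted levels:
# "1/1/2019" = 1, "1/3/2019" = 2, "1/5/2019" = 3
global_code <- as.integer(assess_date)

# Parse to Date (assumed month/day/year), then number dates within each patient
day <- as.numeric(as.Date(as.character(assess_date), format = "%m/%d/%Y"))
per_patient <- ave(day, pt_id, FUN = function(d) as.integer(factor(d)))

global_code  # 1 3 2 3
per_patient  # 1 2 1 2
```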

EDIT: Here is a more explicit set of potential solutions, depending on the class of the original assess_date column:

library(tidyr)
library(dplyr)

# data.example as tibble:
data.example = structure(list(pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L, 
  4567L, 4567L, 8900L, 8900L, 8900L, 8900L), assess_date = structure(c(1L, 
  2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/1/2019", 
  "1/2/2019", "1/3/2019", "1/4/2019"), class = "factor"), assess_id = c(64L, 
  64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)), row.names = c(NA, 
  -12L), class = c("tbl_df", "tbl", "data.frame"))

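# if assess_date is the string class
# (levels are sorted alphabetically, so this follows date order only if the strings sort chronologically):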
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))

# if assess_date is the factor class:
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(as.Date(assess_date,"%m/%d/%Y"))))

# if assess_date is the Date class:
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))
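A related pitfall in the string branch: as.factor() sorts character levels alphabetically, so a two-digit day or month can be ordered before a one-digit one. Converting to Date first, as the factor branch does, sorts the levels chronologically. A minimal check with two hypothetical dates:

```r
s <- c("1/2/2019", "1/10/2019")  # January 2 and January 10, read as m/d/Y

# Alphabetical levels: "1/10/2019" sorts before "1/2/2019"
alpha <- as.integer(as.factor(s))

# Chronological levels after parsing to Date
chrono <- as.integer(as.factor(as.Date(s, format = "%m/%d/%Y")))

alpha   # 2 1
chrono  # 1 2
```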

edited Mar 7, 2019 at 21:48

answered Mar 2, 2019 at 1:42

desc

4 Comments

desc Over a year ago

@tws061105, the L denotes that the value will be an integer. You can create the reproducible data by taking all or part of your example data and using dput (e.g. dput(mtcars)).
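To illustrate both points with a tiny hypothetical data frame: the L suffix makes a literal an integer rather than a double, and dput() prints R code that recreates an object, which is what makes it handy for reproducible examples (for a large data set, dput(head(data)) keeps the output short).

```r
is.integer(64L)  # TRUE
is.integer(64)   # FALSE: plain numerals are doubles

small_df <- data.frame(pt_id = c(1234L, 4567L), assess_id = c(64L, 64L))

# Prints a structure(list(...)) call; integer columns keep their L suffixes
dput(small_df)

# Round trip: evaluating the deparsed code rebuilds the same data frame
rebuilt <- eval(parse(text = deparse(small_df)))
identical(small_df, rebuilt)  # TRUE
```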

tws061105 Over a year ago

Actually, this doesn't quite work. It worked when I ran the proposed solution on the example, but not when I applied it to my actual data. I see that the proposed solution stores assess_date as a factor with codes structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), ...) labelled with the assessment dates. In my actual data, however, I can't easily convert the dates to integers, and converting the whole column to a factor doesn't work either, because patients can be assessed on different days (there's no single set of assessment dates).

tws061105 Over a year ago

Thanks for the help, and for your patience with my posting a simplified, less-than-ideal, non-reproducible version of my data :/

desc Over a year ago

@tws061105, check out the edits to see if they solve your issue. You shouldn't need to map Dates to integers yourself: if the assess_date column is a factor, or you convert it to one as described above, the factor's levels do that mapping for you.

tws061105 · Accepted Answer · 2019-05-15 19:13:55Z

0

Many thanks for the suggestions. Unfortunately, I couldn't get any of the suggested solutions to work, but I found exactly what I needed in the getanID function from the splitstackshape package:

library(splitstackshape)
getanID(data, "pt_id")

It worked like a charm!
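As far as I know, getanID() adds a running count (in a column named .id) within each pt_id, following the rows' existing order, so the data should be sorted by date within patient first; unlike a date rank, two rows on the same date get different numbers. A base-R sketch of the same running-count idea with hypothetical data, for readers without splitstackshape:

```r
# Hypothetical data, deliberately not sorted by date
data <- data.frame(
  pt_id       = c(4567, 1234, 1234, 4567, 1234),
  assess_date = as.Date(c("2019-02-01", "2019-03-01", "2019-01-01",
                          "2019-01-01", "2019-02-01"))
)

# Sort by patient, then by date, so the running count follows date order
data <- data[order(data$pt_id, data$assess_date), ]

# Running count 1, 2, 3, ... within each patient, in the current row order
data$assess_num <- ave(seq_len(nrow(data)), data$pt_id, FUN = seq_along)

data
```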

answered May 15, 2019 at 19:13

tws061105
