I have patient data where a patient was given the same assessment at different time points. I want to number those assessments sequentially by date.

Here's my input:

12 x 3 df with cols: pt_id, assess_date, assess_id

Here's my desired output:

12 x 5 df with cols: pt_id, assess_date, assess_id, num_assess, assess_num

Here's what I've tried:

data <- data %>% 
           group_by(pt_id) %>%
           mutate(num_assess <- n_distinct(assess_date))

data$assess_num <- NA

data <- data %>% 
           group_by(pt_id) %>% 
           for(i in 1:num_assess) {
              assess_num <- i
            }

I also tried using n_distinct to define the sequence without creating the assess_num variable, but that didn't work either

Here's the error that I get:

Error in for (. in i) 1:num_assess : 4 arguments passed to 'for' which requires 3

Thoughts? TIA!

  • Hey tws061105, thanks for posting what you have attempted. It is also a good habit to post a reproducible example. On that note, is assess_date a date or a string? If it is a date, you can extract the month with something like: as.numeric(format(x, "%m")) (assuming you want it to be numeric). Commented Mar 2, 2019 at 1:27
  • Hey Andrew - thanks for that suggestion! That definitely makes sense! I'll keep that in mind for future posts! Commented Mar 2, 2019 at 21:59

3 Answers

Clever solution from @desc. If assess_date is stored as a Date and you want a numeric assessment number, the script below works. It uses data.example from desc's answer (thank you), but reads the dates as day/month/year, which is why the format passed to as.Date is "%d/%m/%Y". The month number is then used as the assessment number.

> data.example = structure(list(
+   pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L,
+             4567L, 4567L, 8900L, 8900L, 8900L, 8900L),
+   assess_date = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
+                           .Label = c("1/1/2019", "1/2/2019", "1/3/2019", "1/4/2019"),
+                           class = "factor"),
+   assess_id = c(64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)),
+   class = "data.frame", row.names = c(NA, -12L))
> 
> data.example$assess_date <- as.Date(data.example$assess_date, format = "%d/%m/%Y")
> data.example$assess_num <- as.numeric(format(data.example$assess_date, "%m"))
> data.example
   pt_id assess_date assess_id assess_num
1   1234  2019-01-01        64          1
2   1234  2019-02-01        64          2
3   1234  2019-03-01        64          3
4   1234  2019-04-01        64          4
5   4567  2019-01-01        64          1
6   4567  2019-02-01        64          2
7   4567  2019-03-01        64          3
8   4567  2019-04-01        64          4
9   8900  2019-01-01        64          1
10  8900  2019-02-01        64          2
11  8900  2019-03-01        64          3
12  8900  2019-04-01        64          4

2 Comments

Thanks @Andrew! This looks like it is reliant upon the assessments occurring in different months, which isn't always the case for me (but which I recognize is consistent with the example that I provided). Thanks for weighing in, though, and thanks for the critique re: posting reproducible examples
Sure thing, and I am glad you found a solution that works for your data! Also, it is customary to hit the check-mark next to the answer which solves your problem if your issue is resolved (i.e., to accept desc's answer). Thanks for following up too!

Here is a simplified version that uses your dates (as factors) and simply extracts the integer level code of each date:

data.example = structure(list(pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L, 
                  4567L, 4567L, 8900L, 8900L, 8900L, 8900L), assess_date = structure(c(1L, 
                  2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/1/2019", 
                  "1/2/2019", "1/3/2019", "1/4/2019"), class = "factor"), assess_id = c(64L, 
                  64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)), class = "data.frame", row.names = c(NA, 
                  -12L))

library(dplyr)  # provides %>%, group_by() and mutate()

data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(assess_date))

If they aren't factors (yet), then:

data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))

The output looks like:

# A tibble: 12 x 4
# Groups:   pt_id [3]
   pt_id assess_date assess_id assess_num
   <int> <fct>           <int>      <int>
 1  1234 1/1/2019           64          1
 2  1234 1/2/2019           64          2
 3  1234 1/3/2019           64          3
 4  1234 1/4/2019           64          4
 5  4567 1/1/2019           64          1
 6  4567 1/2/2019           64          2
 7  4567 1/3/2019           64          3
 8  4567 1/4/2019           64          4
 9  8900 1/1/2019           64          1
10  8900 1/2/2019           64          2
11  8900 1/3/2019           64          3
12  8900 1/4/2019           64          4

EDIT: Here is a more explicit set of potential solutions, depending on the class of the original assess_date column:

library(tidyr)
library(dplyr)

# data.example as tibble:
data.example = structure(list(pt_id = c(1234L, 1234L, 1234L, 1234L, 4567L, 4567L, 
  4567L, 4567L, 8900L, 8900L, 8900L, 8900L), assess_date = structure(c(1L, 
  2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/1/2019", 
  "1/2/2019", "1/3/2019", "1/4/2019"), class = "factor"), assess_id = c(64L, 
  64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L)), row.names = c(NA, 
  -12L), class = c("tbl_df", "tbl", "data.frame"))

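# if assess_date is a character string: parse it to a Date first, because the
# levels of raw strings sort alphabetically ("1/10/2019" before "1/2/2019")
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(as.Date(assess_date, "%m/%d/%Y"))))

# if assess_date is a factor: also parse to a Date so the levels sort chronologically
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(as.Date(assess_date, "%m/%d/%Y"))))

# if assess_date is already a Date: the levels sort chronologically within each patient
data.example <- data.example %>% 
  group_by(pt_id) %>%
  mutate(assess_num = as.integer(as.factor(assess_date)))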
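
The comments below point out that patients can be assessed on different days, sometimes within the same month. One way to handle that, sketched here under the assumption that assess_date parses as month/day/year, is to rank the parsed dates within each patient using dplyr's dense_rank(). The visits data frame and its values are made up for illustration and are not the asker's data.

```r
library(dplyr)

# Hypothetical data: patients assessed on different days, some in the same month
visits <- data.frame(
  pt_id       = c(1234L, 1234L, 1234L, 4567L, 4567L),
  assess_date = c("1/20/2019", "1/5/2019", "3/2/2019", "2/14/2019", "2/1/2019"),
  assess_id   = 64L,
  stringsAsFactors = FALSE
)

numbered <- visits %>%
  # Assumed format: month/day/year
  mutate(assess_date = as.Date(assess_date, format = "%m/%d/%Y")) %>%
  group_by(pt_id) %>%
  mutate(
    num_assess = n_distinct(assess_date),  # distinct assessment dates per patient
    assess_num = dense_rank(assess_date)   # 1 = earliest date for that patient
  ) %>%
  ungroup() %>%
  arrange(pt_id, assess_date)

print(numbered)
```

With dense_rank(), two rows for the same patient on the same date share a number. If every row should get its own number instead, row_number(assess_date) is the usual alternative.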

4 Comments

@tws061105, the L denotes that the value will be an integer. You can create the reproducible data by taking all or part of your example data and using dput (e.g. dput(mtcars)).
Actually, this doesn't quite work. It worked when I just ran the proposed solution, but when I apply it to my actual data, it doesn't quite work. I see that this proposed solution establishes the "level" of the assessment date as assess_date = structure(c(1L,2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), and then labels those with the dates for those assessments, but in my actual data, I can't easily convert the dates to an integer, and converting the whole column to a factor doesn't quite work either, because patients can be assessed on different days (there's no one set of assessment dates)
Thanks for the help, and with your patience with me posting a simplified version of my data that is less-than-ideal and not reproducible :/
@tws061105, check out the edits to see if they solve your issue. You shouldn't need to map from Date to integer yourself: if the assess_date column is a factor, or you convert it to one as described above, that will map the dates for you.

Many thanks for the suggestions. Unfortunately, I couldn't get any of the suggested solutions to work, but I did find exactly what I needed in the getanID function from the splitstackshape package, according to the following code:

getanID(data, "pt_id") - worked like a charm!
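
For anyone without splitstackshape installed, here is a rough base R sketch of the same idea. As far as I can tell, getanID numbers rows in the order they appear within each id, so this sketch sorts by date first. The visits data frame is hypothetical, and assess_date is assumed to already be a Date.

```r
# Hypothetical data, deliberately out of order
visits <- data.frame(
  pt_id       = c(4567L, 1234L, 1234L, 4567L, 1234L),
  assess_date = as.Date(c("2019-02-14", "2019-01-20", "2019-01-05",
                          "2019-02-01", "2019-03-02"))
)

# Sort by patient, then by date, so row order within a patient is chronological
visits <- visits[order(visits$pt_id, visits$assess_date), ]

# Count 1, 2, 3, ... down the rows of each patient
visits$assess_num <- ave(seq_along(visits$pt_id), visits$pt_id, FUN = seq_along)

print(visits)
```

Unlike a rank on the date, this gives every row its own number, even when a patient has two rows on the same day.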
