2

I am new to dplyr, and I would be curious about a fast way to get from this data:

ID Age YearDied
100 2 2005
102 4 NA
103 1 NA
106 5 2002
108 1 NA
109 1 NA
110 4 NA
112 3 NA

To this data (counting every survivor per Age, meaning if an ID is 5 years old, it passed ages 1,2,3,4,5, whereas an ID of 2 just passes 2 ages (1,2) (does this make sense?):

Age SurvivorNumber
1 8
2 5
3 4
4 3
5 1

Is it also possible to combine the previous result with these ones (number of IDs in the category)?:

Age Current Number of IDs
1 3
2 1
3 1
4 2
5 1

This was my starting code for the last case:

groupedDf <- inputDf %>%
               count(Age)  %>%
               group_by(Age = case_when(Age == 1 ~ '1',
                                          TRUE ~ as.character(Age))) %>%
               group_by(Age = case_when(Age == 2 ~ '2',
                                          TRUE ~ as.character(Age))) %>%
               group_by(Age = case_when(Age == 3 ~ '3',
                                          TRUE ~ as.character(Age))) %>%
               group_by(Age = case_when(Age == 4 ~ '4',
                                          TRUE ~ as.character(Age))) %>%
               group_by(Age = case_when(Age == 5 ~ '5',
                                          TRUE ~ as.character(Age))) %>%
               summarise(n = sum(n))  %>%
               arrange(nchar(Age), Age)
1
  • Please provide reproducible input data and expected output based on that. Commented Apr 15 at 14:10

2 Answers 2

4

It is easier to get the count table first and then generate SurvivorNumber through "reverse cumulative sum":

df %>%
  group_by(Age) %>%
  reframe(`Current Number of IDs`=n()) %>%
  mutate(SurvivorNumber=rev(cumsum(rev(`Current Number of IDs`))))

# A tibble: 5 × 3
    Age `Current Number of IDs` SurvivorNumber
  <int>                   <int>          <int>
1     1                       3              8
2     2                       1              5
3     3                       1              4
4     4                       2              3
5     5                       1              1
Sign up to request clarification or add additional context in comments.

Comments

1

Base R

local({
  i = tapply(X$Age, X$Age, FUN=length)
  data.frame(age = names(i), count = i, survivor = rev(cumsum(rev(i))))
})
  age count survivor
1   1     3        8
2   2     1        5
3   3     1        4
4   4     2        3
5   5     1        1

Data

X = data.frame(
  ID = c(100, 102, 103, 106, 108, 109, 110, 112),
  Age = c(2, 4, 1, 5, 1, 1, 4, 3),
  YearDied = c(2005, NA, NA, 2002, NA, NA, NA, NA))

2 Comments

i <- tabulate(X$Age) may be more concise.
@Edward That would fill zeros at gaps and requires an additional matching step. Replaced seq() with names()!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.