I have a dataframe like this:

customer <- c('1530', '1530', '1530', '1531', '1531', '1532')
month <- c('2021-10-01', '2021-11-01', '2021-12-01', '2021-11-01', '2021-12-01', '2021-12-01')
month_number <- c(1, 2, 3, 1, 2, 1)
df <- data.frame(customer_id = customer, entry_month = month)
df
|   | customer_id | entry_month |
|---|-------------|-------------|
| 1 | 1530        | 2021-10-01  |
| 2 | 1530        | 2021-11-01  |
| 3 | 1530        | 2021-12-01  |
| 4 | 1531        | 2021-11-01  |
| 5 | 1531        | 2021-12-01  |
| 6 | 1532        | 2021-12-01  |

I need to create a column that indicates the number of the month since the customer joined. Here is my desired output:

new_df <- data.frame(customer_id = customer, entry_month = month, month_number = month_number)
new_df
|   | customer_id | entry_month | month_number |
|---|-------------|-------------|--------------|
| 1 | 1530        | 2021-10-01  | 1            |
| 2 | 1530        | 2021-11-01  | 2            |
| 3 | 1530        | 2021-12-01  | 3            |
| 4 | 1531        | 2021-11-01  | 1            |
| 5 | 1531        | 2021-12-01  | 2            |
| 6 | 1532        | 2021-12-01  | 1            |

3 Answers

2

You can convert entry_month to Date and then subtract each customer's first date using first():

library(dplyr)
df %>%
  group_by(customer_id) %>%
  mutate(
    entry_month = as.Date(entry_month),
    # days since the customer's first entry, converted to approximate months
    nmonth = round(as.numeric(entry_month - first(entry_month)) / 30) + 1
  )

# A tibble: 6 x 3
# Groups:   customer_id [3]
  customer_id entry_month nmonth
  <chr>       <date>       <dbl>
1 1530        2021-10-01       1
2 1530        2021-11-01       2
3 1530        2021-12-01       3
4 1531        2021-11-01       1
5 1531        2021-12-01       2
6 1532        2021-12-01       1

Note that this works only if entry_month is always the first day of a month. Otherwise you will have to specify what exactly one month means. For example, if the first entry is on 2021-10-20 and the second is on 2021-11-10, what would be the desired value of nmonth?
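If "one month" is taken to mean a calendar month regardless of the day, one possible convention is to compare year * 12 + month values instead of dividing day differences by 30. The following is only a sketch under that assumption; month_index is a hypothetical helper name, and the mid-month dates are made up to illustrate the case raised above.

```r
library(dplyr)

# Example data with entries that are not on the first day of the month (made up)
df <- data.frame(
  customer_id = c("1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = c("2021-10-20", "2021-11-10", "2021-12-01",
                  "2021-11-01", "2021-12-01", "2021-12-01")
)

# Hypothetical helper: running index of the calendar month (year * 12 + month)
month_index <- function(d) {
  d <- as.Date(d)
  as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))
}

result <- df %>%
  group_by(customer_id) %>%
  # calendar months elapsed since the customer's earliest entry, starting at 1
  mutate(nmonth = month_index(entry_month) - min(month_index(entry_month)) + 1L) %>%
  ungroup()

result
```

Under this convention 2021-10-20 and 2021-11-10 fall in months 1 and 2, even though they are only 21 days apart.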

1 Comment

Thanks my friend, worked perfectly. If the day is not the first of the month, the third month would also be equal to 2. I don't have enough reputation to upvote your answer, but thanks.
1

This takes the year-month part of the date and counts distinct values.

I extended the example to include a repeated month.

library(dplyr)

df %>%
  group_by(customer_id) %>%
  arrange(entry_month, .by_group = TRUE) %>%
  mutate(month_number = cumsum(
           !duplicated(strftime(entry_month, "%Y-%m")))) %>%
  ungroup()
# A tibble: 7 × 3
  customer_id entry_month month_number
  <chr>       <chr>              <int>
1 1530        2021-10-01             1
2 1530        2021-10-12             1
3 1530        2021-11-01             2
4 1530        2021-12-01             3
5 1531        2021-11-01             1
6 1531        2021-12-01             2
7 1532        2021-12-01             1

Data

df <- structure(list(customer_id = c("1530", "1530", "1530", "1530",
"1531", "1531", "1532"), entry_month = c("2021-10-01", "2021-10-12",
"2021-11-01", "2021-12-01", "2021-11-01", "2021-12-01", "2021-12-01"
)), row.names = c(NA, -7L), class = "data.frame")
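Counting distinct year-month values numbers only the months that actually appear, so a skipped month does not advance the counter. Whether that is wanted depends on the use case. The base R sketch below contrasts the two readings for one made-up customer with no November entry; the variable names are illustrative only.

```r
# One hypothetical customer: two entries in October, none in November, one in December
entry <- as.Date(c("2021-10-01", "2021-10-12", "2021-12-01"))
ym <- format(entry, "%Y-%m")

# Distinct-month count: position of each month among the months actually present
distinct_count <- cumsum(!duplicated(ym))

# Calendar-month distance from the first entry, plus one
idx <- as.integer(format(entry, "%Y")) * 12L + as.integer(format(entry, "%m"))
calendar_count <- idx - min(idx) + 1L

data.frame(entry, distinct_count, calendar_count)
# distinct_count is 1 1 2, calendar_count is 1 1 3
```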

0

Alternatively, you can use the data.table package:

library(data.table)

dt <- setDT(df) # Convert df to a data.table by reference

dt[, entry_month := as.IDate(entry_month)] # Transform the column to "IDate"

setorder(dt, customer_id, entry_month) # Sort by date within each customer

dt[, month_number := seq_len(.N), by = customer_id] # Number the rows within each customer

dt

Output:

   customer_id entry_month month_number
1:        1530  2021-10-01            1
2:        1530  2021-11-01            2
3:        1530  2021-12-01            3
4:        1531  2021-11-01            1
5:        1531  2021-12-01            2
6:        1532  2021-12-01            1
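The sequence above numbers rows, so it matches the month number only when each customer has exactly one row per month. If a month can repeat, a possible variant (a sketch, not part of the original answer) is to rank each row's year-month label among that customer's distinct labels. The helper column ym is an illustrative name, and the data reuses the extended example with a repeated October entry.

```r
library(data.table)

dt <- data.table(
  customer_id = c("1530", "1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = as.IDate(c("2021-10-01", "2021-10-12", "2021-11-01", "2021-12-01",
                           "2021-11-01", "2021-12-01", "2021-12-01"))
)

# Year-month label for each row, e.g. "2021-10"
dt[, ym := format(entry_month, "%Y-%m")]

# Within each customer, position of the label among that customer's sorted distinct labels
dt[, month_number := match(ym, sort(unique(ym))), by = customer_id]

dt[, ym := NULL] # Drop the helper column
dt
```

Because the labels are in YYYY-MM form, sorting them as text also sorts them chronologically, so the result does not depend on the row order of the input.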
