The dataset I'm working with is billing data by customer and month. In the end, I'd like a data frame with customer IDs as rows and months as columns, as in the original data set. However, the new data set should contain dummy variables indicating whether the customer was "gained" that month, i.e. they had never been billed before and that month was the first time they were billed.
Here's a reproducible example as well as the loop I have written now:
set.seed(24)
example.data <- data.frame(
  ID = sample(11:20),
  Jan = sample(0:5, 10, replace = TRUE),
  Feb = sample(0:5, 10, replace = TRUE),
  Mar = sample(0:5, 10, replace = TRUE),
  Apr = sample(0:5, 10, replace = TRUE)
)
gained.df.ex <- data.frame(example.data$ID)
## customers can't be gained in the first month
## there's no previous data to verify that this is the first time they've been billed, so all values are 0
gained.df.ex$Jan <- rep(0, length(example.data$ID))
## here's the loop that isn't working
for (i in 3:5) {
  new.month.dummy <- for (x in 1:length(gained.df.ex$example.data.ID)) {
    ifelse(example.data[x, i] == 0, new.month.dummy[x] <- 0, ifelse(sum(example.data[x, 2:(i - 1)]) == 0, new.month.dummy[x] <- 1, new.month.dummy[x] <- 0))
  }
}
I'm sure there's a way to do this with apply but I'm not sure how.
The expected output would look as follows:
> example.data
   Jan Feb Mar Apr
15   0   3   4   3
19   1   3   0   5
20   4   2   5   1
12   2   1   3   0
14   0   0   2   1
17   5   5   4   4
11   3   4   1   5
18   1   0   0   2
13   3   2   5   3
16   2   5   1   2
> gained.df.ex
   Jan Feb Mar Apr
15   0   1   0   0
19   0   0   0   0
20   0   0   0   0
12   0   0   0   0
14   0   0   1   0
17   0   0   0   0
11   0   0   0   0
18   0   0   0   0
13   0   0   0   0
16   0   0   0   0
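
For reference, here is a minimal sketch of one apply-based way to produce the table above without the nested loop. It is only a sketch under a few assumptions: the data is typed in from the printed example.data rather than regenerated with set.seed (sample() results can differ between R versions), a value greater than 0 counts as "billed", and flag_gained is a hypothetical helper name, not something from an existing package.

```r
# Values copied from the printed example.data in the question.
example.data <- data.frame(
  ID  = c(15, 19, 20, 12, 14, 17, 11, 18, 13, 16),
  Jan = c(0, 1, 4, 2, 0, 5, 3, 1, 3, 2),
  Feb = c(3, 3, 2, 1, 0, 5, 4, 0, 2, 5),
  Mar = c(4, 0, 5, 3, 2, 4, 1, 0, 5, 1),
  Apr = c(3, 5, 1, 0, 1, 4, 5, 2, 3, 2)
)

# Hypothetical helper: takes only the month columns (at least two) and
# returns a 0/1 matrix marking the first month each customer was billed.
flag_gained <- function(billing) {
  # TRUE where the customer was billed (assumption: any amount > 0).
  billed <- as.matrix(billing) > 0
  # Running count of billed months per customer; apply() returns the
  # per-row results as columns, so transpose back to customers x months.
  running <- t(apply(billed, 1, cumsum))
  # A customer is gained in the month where they are billed and the
  # running count reaches 1, i.e. the first billed month.
  gained <- (billed & running == 1) * 1L
  # No earlier history exists for the first month, so it is always 0.
  gained[, 1] <- 0L
  gained
}

gained.df.ex <- data.frame(
  ID = example.data$ID,
  flag_gained(example.data[, -1])
)
print(gained.df.ex)
```

The idea is that cumsum over each row counts how many months a customer has been billed so far, so the first billed month is the only one where the customer is billed and the count equals 1. Unlike the printed tables, this version keeps the IDs in an ID column rather than as row labels.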