The dataset I'm working with is billing data by customer and month. In the end, I'd like a data frame that has customer IDs as rows and months as columns, as in the original data set. However, I'd like this new data set to contain dummy variables indicating whether the customer was "gained" that month, i.e. they had never been billed before and that month was the first time they were billed.
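The rule above can be written as code for a single customer's row of monthly bills. This is a minimal R sketch, not the method used later on this page; gained_dummy is a hypothetical helper name, and it assumes any amount above 0 means the customer was billed that month.

```r
# Hypothetical helper: the "gained" dummy for one customer's monthly bills
gained_dummy <- function(bills) {
  dummy <- numeric(length(bills))   # start with all 0s
  first <- which(bills > 0)[1]      # first billed month (NA if never billed)
  # the first month can never count as "gained": there is no earlier data
  if (!is.na(first) && first > 1) dummy[first] <- 1
  dummy
}

gained_dummy(c(0, 3, 4, 3))  # 0 1 0 0 (first billed in Feb)
gained_dummy(c(1, 3, 0, 5))  # 0 0 0 0 (already billed in Jan)
gained_dummy(c(0, 0, 0, 0))  # 0 0 0 0 (never billed)
```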

Here's a reproducible example, along with the loop I've written so far:

set.seed(24)
example.data <- data.frame(
   ID = sample(11:20),
   Jan = sample(0:5, 10, replace = TRUE),
   Feb = sample(0:5, 10, replace = TRUE),
   Mar = sample(0:5, 10, replace = TRUE),
   Apr = sample(0:5, 10, replace = TRUE)
)
gained.df.ex <- data.frame(ID = example.data$ID)

## customers can't be gained in the first month
## there's no previous data to verify that this is the first time they've been billed, so all values are 0

gained.df.ex$Jan <- rep(0, length(example.data$ID))

## here's the loop (columns 3:5 of example.data are Feb, Mar and Apr)

for (i in 3:5) {
   new.month.dummy <- numeric(nrow(example.data))
   for (x in seq_len(nrow(example.data))) {
      ## 1 if billed this month and never billed in any earlier month
      if (example.data[x, i] > 0 && sum(example.data[x, 2:(i - 1)]) == 0) {
         new.month.dummy[x] <- 1
      }
   }
   gained.df.ex[[names(example.data)[i]]] <- new.month.dummy
}

I suspect there's a way to do this with apply, but I'm not sure how.

The expected output would look as follows:

> example.data
   Jan Feb Mar Apr
15   0   3   4   3
19   1   3   0   5
20   4   2   5   1
12   2   1   3   0
14   0   0   2   1
17   5   5   4   4
11   3   4   1   5
18   1   0   0   2
13   3   2   5   3
16   2   5   1   2

> gained.df.ex
   Jan Feb Mar Apr
15   0   1   0   0
19   0   0   0   0
20   0   0   0   0
12   0   0   0   0
14   0   0   1   0
17   0   0   0   0
11   0   0   0   0
18   0   0   0   0
13   0   0   0   0
16   0   0   0   0
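The set.seed(24) draws are not guaranteed to reproduce the printed table: R 3.6.0 changed the default algorithm behind sample() (sample.kind = "Rejection"), and the printed data has the IDs as row names rather than an ID column. A sketch that reads the printed data and the expected output straight from text; read.table() uses the first column as row names when the header has one field fewer than the data rows.

```r
# Printed example data: IDs become row names because the header is one field short
example.data <- read.table(header = TRUE, text = "
   Jan Feb Mar Apr
15   0   3   4   3
19   1   3   0   5
20   4   2   5   1
12   2   1   3   0
14   0   0   2   1
17   5   5   4   4
11   3   4   1   5
18   1   0   0   2
13   3   2   5   3
16   2   5   1   2
")

# Expected "gained" dummies, in the same layout
gained.expected <- read.table(header = TRUE, text = "
   Jan Feb Mar Apr
15   0   1   0   0
19   0   0   0   0
20   0   0   0   0
12   0   0   0   0
14   0   0   1   0
17   0   0   0   0
11   0   0   0   0
18   0   0   0   0
13   0   0   0   0
16   0   0   0   0
")
```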
  • Can you post the expected output based on the example? (commented Apr 27, 2016 at 20:53)
  • Do you only have one row per ID? (commented Apr 27, 2016 at 20:55)
  • The expected output has been added to the question. (commented Apr 27, 2016 at 21:02)
  • Yes, there's only one row per ID. (commented Apr 27, 2016 at 21:02)
  • Why do you have all 0s for the second row in the expected output? All the numbers look unique, and there is a gain of 5 after the 0. (commented Apr 27, 2016 at 21:03)

1 Answer

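We can try the following. It assumes example.data holds only the month columns, with the customer IDs as row names, as in the printed data; an ID column would have to be dropped first, because the IDs would make every running total non-zero.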

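gained.df.ex[names(example.data)] <- t(apply(example.data, 1, function(x) {
  i1 <- tail(which(cumsum(x) == 0), 1)  # last month with nothing billed so far
  x1 <- rep(0, length(x))               # start with all 0s
  # mark the month after i1; skip customers billed in Jan (no i1)
  # or never billed at all (i1 is the last month)
  if (length(i1) > 0 && i1 < length(x)) replace(x1, i1 + 1, 1) else x1
}))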
gained.df.ex[names(example.data)]
#   Jan Feb Mar Apr
#1    0   1   0   0
#2    0   0   0   0
#3    0   0   0   0
#4    0   0   0   0
#5    0   0   1   0
#6    0   0   0   0
#7    0   0   0   0
#8    0   0   0   0
#9    0   0   0   0
#10   0   0   0   0
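To see what the tail(which(cumsum(x) == 0), 1) step computes, here it is on customer 14's row (0 0 2 1). Because bills are never negative, the running total stays at 0 exactly up to the month before the first bill, so i1 + 1 is the first billed month. The helper name mark_gained is hypothetical; its i1 < length(x) guard covers a customer who was never billed, where i1 would be the last month and replace() would otherwise lengthen the vector.

```r
# Walk through the cumsum() trick on one customer's bills (customer 14)
x <- c(0, 0, 2, 1)

cumsum(x)                             # running total: 0 0 2 3
which(cumsum(x) == 0)                 # months with nothing billed so far: 1 2
i1 <- tail(which(cumsum(x) == 0), 1)  # last such month: 2
i1 + 1                                # first billed month: 3 (Mar)

# Hypothetical helper wrapping the same idea
mark_gained <- function(x) {
  i1 <- tail(which(cumsum(x) == 0), 1)
  x1 <- rep(0, length(x))
  if (length(i1) > 0 && i1 < length(x)) replace(x1, i1 + 1, 1) else x1
}

mark_gained(c(0, 0, 2, 1))  # 0 0 1 0
mark_gained(c(1, 0, 0, 2))  # 0 0 0 0 (billed in Jan)
mark_gained(c(0, 0, 0, 0))  # 0 0 0 0, same length as the input
```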

1 Comment

Still trying to make sense of the original question... but this is a nice method. I was going to say go with adply and lag and call it a day.
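The comment's idea of comparing each month with the previous one can also be written in base R, without plyr or dplyr. This is a swapped-in sketch, not the commenter's code: a customer is gained in month j if they have been billed at some point up to month j but not up to month j - 1. The variable names are hypothetical, and the data are the printed example.

```r
# Printed example data as a matrix: customer IDs as row names, months as columns
bills <- matrix(c(0, 3, 4, 3,
                  1, 3, 0, 5,
                  4, 2, 5, 1,
                  2, 1, 3, 0,
                  0, 0, 2, 1,
                  5, 5, 4, 4,
                  3, 4, 1, 5,
                  1, 0, 0, 2,
                  3, 2, 5, 3,
                  2, 5, 1, 2),
                ncol = 4, byrow = TRUE,
                dimnames = list(c("15", "19", "20", "12", "14",
                                  "17", "11", "18", "13", "16"),
                                c("Jan", "Feb", "Mar", "Apr")))

# ever_billed[i, j] is TRUE once customer i has had any bill up to month j
ever_billed <- bills > 0
for (j in seq_len(ncol(bills))[-1]) {
  ever_billed[, j] <- ever_billed[, j] | ever_billed[, j - 1]
}

# gained in month j: billed by month j but not by month j - 1 (the "lag");
# Jan stays 0 because there is no earlier month to compare against
gained <- matrix(0, nrow(bills), ncol(bills), dimnames = dimnames(bills))
gained[, -1] <- ever_billed[, -1] & !ever_billed[, -ncol(bills)]
gained
```

The loop runs over months only, so each step is vectorized across all customers.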
