Create new columns within loop or apply

Question

The dataset I'm working with is billing data by customer and month. In the end, I'd like a data frame with customer IDs for rows and months for column names, as in the original data set. However, this new data set should contain dummy variables indicating whether the customer was "gained" that month, i.e. they had never been billed before and that month was the first time they were billed.

Here's a reproducible example as well as the loop I have written now:

set.seed(24)
example.data <- data.frame(
   ID = sample(11:20),
   Jan = sample(0:5, 10, replace = TRUE),
   Feb = sample(0:5, 10, replace = TRUE),
   Mar = sample(0:5, 10, replace = TRUE),
   Apr = sample(0:5, 10, replace = TRUE)
)
gained.df.ex <- data.frame(example.data$ID)

## customers can't be gained in the first month
## there's no previous data to verify that this is the first time they've been billed, so all values are 0

gained.df.ex$Jan <- rep(0, length(example.data$ID))

## here's the loop that isn't working

for(i in 3:5){
   new.month.dummy <- for (x in 1:length(gained.df.ex$example.data.ID)){
      ifelse(example.data[x,i] == 0, new.month.dummy[x] <- 0, ifelse(sum(example.data[x,2:(i-1)]} == 0, new.month.dummy[x] <-1, new.month.dummy <- 0))
}

I'm sure there's a way to do this with apply but I'm not sure how.
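For reference, a minimal sketch of a loop that does run and creates one new column per month. It assumes billing amounts are never negative, so a row sum of 0 over the earlier months means the customer was not billed in any of them. The name `prior.total` is illustrative, and the result uses a column called `ID` rather than `example.data.ID`.

```r
set.seed(24)
example.data <- data.frame(
   ID = sample(11:20),
   Jan = sample(0:5, 10, replace = TRUE),
   Feb = sample(0:5, 10, replace = TRUE),
   Mar = sample(0:5, 10, replace = TRUE),
   Apr = sample(0:5, 10, replace = TRUE)
)

## customers can't be gained in the first month, so Jan is all 0
gained.df.ex <- data.frame(ID = example.data$ID, Jan = 0)

for (i in 3:5) {
   ## total billed in all months before column i (assumes amounts >= 0)
   prior.total <- rowSums(example.data[, 2:(i - 1), drop = FALSE])
   ## gained = billed this month and never billed before
   gained.df.ex[[names(example.data)[i]]] <-
      as.numeric(example.data[[i]] != 0 & prior.total == 0)
}
```

Each pass works on a whole column at once, so no inner loop over customers is needed.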

The expected output would look as follows:

> example.data
   Jan Feb Mar Apr
15   0   3   4   3
19   1   3   0   5
20   4   2   5   1
12   2   1   3   0
14   0   0   2   1
17   5   5   4   4
11   3   4   1   5
18   1   0   0   2
13   3   2   5   3
16   2   5   1   2

> gained.df.ex
   Jan Feb Mar Apr
15   0   1   0   0
19   0   0   0   0
20   0   0   0   0
12   0   0   0   0
14   0   0   1   0
17   0   0   0   0
11   0   0   0   0
18   0   0   0   0
13   0   0   0   0
16   0   0   0   0

Why do you have all 0s for the second row in the expected output? All the numbers look unique, and there is a gain of 5 after 0.
– akrun, Commented Apr 27, 2016 at 21:03

akrun · Accepted Answer · 2016-04-27 21:17:23Z

2

We can try

gained.df.ex[names(example.data)] <- t(apply(example.data, 1, function(x) {
   i1 <- tail(which(cumsum(x) == 0), 1)   # last month with nothing billed so far
   x1 <- rep(0, length(x))
   if (length(i1) > 0) replace(x1, i1 + 1, 1) else x1   # flag the following month
}))
gained.df.ex[names(example.data)]
#   Jan Feb Mar Apr
#1    0   1   0   0
#2    0   0   0   0
#3    0   0   0   0
#4    0   0   0   0
#5    0   0   1   0
#6    0   0   0   0
#7    0   0   0   0
#8    0   0   0   0
#9    0   0   0   0
#10   0   0   0   0
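The call above appears to assume that example.data holds only the month columns (IDs as row names), as in the printed expected output. With the ID column from the question's data.frame() call included, cumsum(x) never equals 0, so every flag comes out 0. Separately, for a customer with no billing in any month, i1 + 1 points past the last column and replace() lengthens the vector. A minimal sketch that covers both cases follows; `month.cols` and `flag_gained` are illustrative names, not part of the original code.

```r
set.seed(24)
example.data <- data.frame(
   ID = sample(11:20),
   Jan = sample(0:5, 10, replace = TRUE),
   Feb = sample(0:5, 10, replace = TRUE),
   Mar = sample(0:5, 10, replace = TRUE),
   Apr = sample(0:5, 10, replace = TRUE)
)

# Work on the month columns only, leaving ID out of the calculation
month.cols <- setdiff(names(example.data), "ID")

# For one customer's billing vector: 1 in the first billed month if at
# least one unbilled month precedes it, 0 everywhere else
flag_gained <- function(x) {
   out <- rep(0, length(x))
   first <- which(x != 0)[1]   # first non-zero bill, NA if never billed
   if (!is.na(first) && first > 1) out[first] <- 1
   out
}

gained.df.ex <- data.frame(ID = example.data$ID)
gained.df.ex[month.cols] <- t(apply(example.data[month.cols], 1, flag_gained))
```

For example, flag_gained(c(0, 0, 2, 1)) gives 0 0 1 0, and a vector of all zeros stays all zeros.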


1 Comment

Carl Boneri Over a year ago

Still trying to make sense of the original question...but this is a nice method. I was gonna say go with adply and lag and call it a day
