1

I am working with a very large data.table in R and am trying to create an index column that is based on a sequence of values in another column - or better yet the reappearance of a value in a column. Below is an example with example code:

temp = data.table(
  col1 = c("A","A","A","A","A","B","B","B", "B", "B", "B"),
  col2 = c(1,   0,  0,  1,  0,  1,  0,  1,   0,   0,   1)
)

This produces a dataset that looks like this:

col1  col2
A     1         
A     0         
A     0         
A     1         
A     0         
B     1         
B     0         
B     1         
B     0         
B     0 
B     1

What I need is to create an index column (preferably using data.table terminology) that looks like this:

col1  col2  col3
A     1     1       
A     0     1       
A     0     1       
A     1     2       
A     0     2       
B     1     3       
B     0     3       
B     1     4       
B     0     4       
B     0     4   
B     1     5

I'm new to using data.tables and haven't been able to find anything on slack or other various help sites that give clues on how to create an index column based on reappearing values in another column. Any help is appreciated!

1 Answer 1

2

Unless I misunderstood, this seems to be a simple matter of (base R's) cumsum?

temp[, col3 := cumsum(col2)]
#    col1 col2 col3
# 1:    A    1    1
# 2:    A    0    1
# 3:    A    0    1
# 4:    A    1    2
# 5:    A    0    2
# 6:    B    1    3
# 7:    B    0    3
# 8:    B    1    4
# 9:    B    0    4
#10:    B    0    4
#11:    B    1    5
Sign up to request clarification or add additional context in comments.

3 Comments

Upvoted, but .... I could help but wonder if transitions in col1 should be recognized. Sometimes questioners fail to include enough edge cases in their examples. Their example only had one such and it coincided with a transition in col2. This wouldn't be difficult, I think, since there is a by parameter in data.table formalism.
You're absolutely right @42-; unfortunately I feel that given the very artificial sample data this is a bit of a guessing game. Perhaps OP can clarify, I'm happy to expand and address any potential edge cases.
Bless you @MauritsEvers! Of course it ended up being incredibly simple. @42- Transitions in col1 aren't really a concern and although I am working with very large data there aren't any edge cases to consider for my particular needs right now. Thanks all!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.