Create index column based on sequence of values in column in R

Question

I am working with a very large data.table in R and am trying to create an index column based on a sequence of values in another column or, more precisely, on the reappearance of a value in that column. Below is an example with code:

library(data.table)

# Example data: col2 == 1 marks the start of a new sequence
temp = data.table(
  col1 = c("A","A","A","A","A","B","B","B", "B", "B", "B"),
  col2 = c(1,   0,  0,  1,  0,  1,  0,  1,   0,   0,   1)
)

This produces a dataset that looks like this:

col1  col2
A     1
A     0
A     0
A     1
A     0
B     1
B     0
B     1
B     0
B     0
B     1

What I need is an index column (preferably created with data.table syntax) that looks like this:

col1  col2  col3
A     1     1
A     0     1
A     0     1
A     1     2
A     0     2
B     1     3
B     0     3
B     1     4
B     0     4
B     0     4
B     1     5

I'm new to using data.table and haven't been able to find anything on Slack or other help sites that gives clues on how to create an index column based on reappearing values in another column. Any help is appreciated!

Maurits Evers · Accepted Answer · 2019-02-27 03:09:03Z

Unless I misunderstood, this seems to be a simple matter of (base R's) cumsum?

temp[, col3 := cumsum(col2)]
#    col1 col2 col3
# 1:    A    1    1
# 2:    A    0    1
# 3:    A    0    1
# 4:    A    1    2
# 5:    A    0    2
# 6:    B    1    3
# 7:    B    0    3
# 8:    B    1    4
# 9:    B    0    4
#10:    B    0    4
#11:    B    1    5
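
To spell out why this works: each 1 in col2 marks the start of a new run, so the running total of col2 equals the number of runs started so far. The sketch below shows this on a plain vector. The second line is an assumption that goes beyond the question: if col2 could ever hold values other than 0 and 1, counting only the rows that equal the marker value may be safer.

```r
# Same col2 values as in the question
col2 <- c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1)

# Running count of 1s: every 1 bumps the index, every 0 keeps it
idx <- cumsum(col2)
# 1 1 1 2 2 3 3 4 4 4 5

# Assumption (not in the question): count only exact marker values,
# in case col2 may contain something other than 0/1
idx_marker <- cumsum(col2 == 1)
```

Note that if the first row had col2 equal to 0, both versions would start the index at 0 rather than 1.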

3 Comments

IRTFM Over a year ago

Upvoted, but ... I couldn't help but wonder whether transitions in col1 should be recognized. Sometimes questioners fail to include enough edge cases in their examples. This example had only one such transition, and it coincided with a transition in col2. Handling it wouldn't be difficult, I think, since data.table has a by parameter.
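
For readers who do need col1 transitions recognized, here are two possible readings of that requirement, sketched under assumptions the OP did not confirm. The column names idx_by_group and idx_with_breaks are made up for illustration. The first restarts the counter inside each col1 group using by; the second keeps one running index that also increments whenever col1 changes, even on a row where col2 is 0.

```r
library(data.table)

temp <- data.table(
  col1 = c("A","A","A","A","A","B","B","B","B","B","B"),
  col2 = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1)
)

# Reading 1: restart the counter within each col1 group
temp[, idx_by_group := cumsum(col2), by = col1]
# A rows: 1 1 1 2 2   B rows: 1 1 2 2 2 3

# Reading 2: one running index that increments on col2 == 1
# OR when col1 differs from the previous row
temp[, idx_with_breaks := cumsum(
  col2 == 1 | col1 != shift(col1, fill = col1[1])
)]
# 1 1 1 2 2 3 3 4 4 4 5 (same as col3 here, because the only
# col1 transition coincides with col2 == 1)
```

On the question's sample data the second reading gives the same result as the plain cumsum, so the two only differ on data where a col1 change lands on a row with col2 equal to 0.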

Maurits Evers Over a year ago

You're absolutely right @42-; unfortunately I feel that given the very artificial sample data this is a bit of a guessing game. Perhaps OP can clarify, I'm happy to expand and address any potential edge cases.

rastrast Over a year ago

Bless you @MauritsEvers! Of course it ended up being incredibly simple. @42- Transitions in col1 aren't really a concern and although I am working with very large data there aren't any edge cases to consider for my particular needs right now. Thanks all!
