R add index column to data frame based on row values

Question

This is a continuation of r - How to add row index to a data frame, based on combination of factors

I tried to replicate what I believe to be the desired results using the green checked answer and am consistently getting something other than expected. I am sure I am doing something really basic wrong, but can't seem to see it OR I've misunderstood what the desired state is.

The data from the original post:

temp <- data.frame(
Dim1 = c("A","A","A","A","A","A","B","B"),
Dim2 = c(100,100,100,100,200,200,100,200),
 Value = sample(1:10, 8)
 )

Then I ran the following code: temp$indexLength <- ave( 1:nrow(temp), temp$Dim1, factor( temp$Dim2), FUN=function(x) 1:length(x) )

and: temp$indexSeqAlong <- ave( 1:nrow(temp), temp$Dim1, factor( temp$Dim2), FUN=seq_along )

and then I created the following: temp$indexDesired <- c(1, 1, 1, 1, 2, 2, 3, 3)

...ending up with the data frame below:

  Dim1 Dim2 Value indexLength indexSeqAlong indexDesired
1    A  100     6           1             1            1
2    A  100     2           2             2            1
3    A  100     9           3             3            1
4    A  100     8           4             4            1
5    A  200    10           1             1            2
6    A  200     4           2             2            2
7    B  100     3           1             1            3
8    B  200     5           1             1            4

If I can figure out what I'm not getting the desired index -- and assuming the code is extensible to more than 2 variables -- I should be all set. Thanks in advance!

Are you sure the last value in indexDesired shouldn't be 4? — thelatemail
– thelatemail, Commented Feb 19, 2014 at 4:22
@IShouldBuyABoat I'd be happy to share a small snippet of data that shows the respective results of the 2 approaches proposed below. However, I can't seem to find any direction on how to properly enter data into SO questions and am making lots of extra work for the mods. When I figure that out, I'll put the comparison up. — user2621147
– user2621147, Commented Feb 19, 2014 at 23:01

mnel · Accepted Answer · 2014-02-19 05:20:10Z

3

If you use data.table, there is a "symbol" .GRP which records this information ( a simple group counter)

library(data.table)
DT <- data.table(temp)
DT[, index := .GRP, by = list(Dim1, Dim2)]
DT
#    Dim1 Dim2 Value index
# 1:    A  100    10     1
# 2:    A  100     2     1
# 3:    A  100     9     1
# 4:    A  100     4     1
# 5:    A  200     6     2
# 6:    A  200     1     2
# 7:    B  100     8     3
# 8:    B  200     7     4

answered Feb 19, 2014 at 5:20

mnel

116k28 gold badges269 silver badges255 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user4462740 Over a year ago

What about data frame?

IRTFM · Accepted Answer · 2014-02-19 05:04:32Z

1

Once the values in teh first argument have been partitioned, there is no way that ave "knows" what order they have been passed. You want a method that can look at changes in values. The duplicated function is generic and has a data.frame method that looks at multiple columns:

temp$indexSeqAlong <-  cumsum(!duplicated(temp[, 1:2]) )
temp

  Dim1 Dim2 Value indexSeqAlong
1    A  100     8             1
2    A  100     2             1
3    A  100     7             1
4    A  100     3             1
5    A  200     5             2
6    A  200     1             2
7    B  100     4             3
8    B  200    10             4

Is extensible to as many columns as you want.

answered Feb 19, 2014 at 5:04

IRTFM

264k22 gold badges381 silver badges503 bronze badges

Collectives™ on Stack Overflow

R add index column to data frame based on row values

2 Answers 2

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related