2

This is a continuation of r - How to add row index to a data frame, based on combination of factors

I tried to replicate what I believe to be the desired results using the green checked answer and am consistently getting something other than expected. I am sure I am doing something really basic wrong, but can't seem to see it OR I've misunderstood what the desired state is.

The data from the original post:

temp <- data.frame(
Dim1 = c("A","A","A","A","A","A","B","B"),
Dim2 = c(100,100,100,100,200,200,100,200),
 Value = sample(1:10, 8)
 )

Then I ran the following code: temp$indexLength <- ave( 1:nrow(temp), temp$Dim1, factor( temp$Dim2), FUN=function(x) 1:length(x) )

and: temp$indexSeqAlong <- ave( 1:nrow(temp), temp$Dim1, factor( temp$Dim2), FUN=seq_along )

and then I created the following: temp$indexDesired <- c(1, 1, 1, 1, 2, 2, 3, 3)

...ending up with the data frame below:

  Dim1 Dim2 Value indexLength indexSeqAlong indexDesired
1    A  100     6           1             1            1
2    A  100     2           2             2            1
3    A  100     9           3             3            1
4    A  100     8           4             4            1
5    A  200    10           1             1            2
6    A  200     4           2             2            2
7    B  100     3           1             1            3
8    B  200     5           1             1            4

If I can figure out what I'm not getting the desired index -- and assuming the code is extensible to more than 2 variables -- I should be all set. Thanks in advance!

3
  • Are you sure the last value in indexDesired shouldn't be 4? Commented Feb 19, 2014 at 4:22
  • @thelatemail I am sure it SHOULD be! Thanks for the catch Commented Feb 19, 2014 at 14:49
  • @IShouldBuyABoat I'd be happy to share a small snippet of data that shows the respective results of the 2 approaches proposed below. However, I can't seem to find any direction on how to properly enter data into SO questions and am making lots of extra work for the mods. When I figure that out, I'll put the comparison up. Commented Feb 19, 2014 at 23:01

2 Answers 2

3

If you use data.table, there is a "symbol" .GRP which records this information ( a simple group counter)

library(data.table)
DT <- data.table(temp)
DT[, index := .GRP, by = list(Dim1, Dim2)]
DT
#    Dim1 Dim2 Value index
# 1:    A  100    10     1
# 2:    A  100     2     1
# 3:    A  100     9     1
# 4:    A  100     4     1
# 5:    A  200     6     2
# 6:    A  200     1     2
# 7:    B  100     8     3
# 8:    B  200     7     4
Sign up to request clarification or add additional context in comments.

1 Comment

What about data frame?
1

Once the values in teh first argument have been partitioned, there is no way that ave "knows" what order they have been passed. You want a method that can look at changes in values. The duplicated function is generic and has a data.frame method that looks at multiple columns:

temp$indexSeqAlong <-  cumsum(!duplicated(temp[, 1:2]) )
temp

  Dim1 Dim2 Value indexSeqAlong
1    A  100     8             1
2    A  100     2             1
3    A  100     7             1
4    A  100     3             1
5    A  200     5             2
6    A  200     1             2
7    B  100     4             3
8    B  200    10             4

Is extensible to as many columns as you want.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.