1

Let's say I have a data frame looking like this:

Value1   Value2
1        543
1        845
3        435
5        724
5        234
8        204

Now, I would like the first column to count up sequentially, instead of jumping several steps every time the value changes, like so:

Value1   Value2
1        543
1        845
2        435
3        724
3        234
4        204

If there were a way to simply replace an element in a data frame with something else, this would be easy. However, I don't know whether such a command exists. Some kind of macro for doing this would also work, but I suspect there isn't one either.

3 Answers

3

Make use of the fact that factor levels will be increasing integers:

> x <- c(1, 1, 3, 5, 5, 8)
> as.numeric(factor(x))
[1] 1 1 2 3 3 4
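
Applied to the data frame from the question, the same idea might look like the sketch below. The name `df` is an assumption, since the question never names the data frame; the values are copied from the example.

```r
# Hypothetical data frame rebuilt from the question; the name `df` is assumed.
df <- data.frame(
  Value1 = c(1, 1, 3, 5, 5, 8),
  Value2 = c(543, 845, 435, 724, 234, 204)
)

# factor() sorts the distinct values into levels;
# as.integer() then returns the index of each element's level.
df$Value1 <- as.integer(factor(df$Value1))

df$Value1
# [1] 1 1 2 3 3 4
```

Value2 is left untouched; only the first column is overwritten.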

2 Comments

You could also do explicitly what those commands do implicitly: use match, sort and unique.
With the addition of @Dirk Eddelbuettel's code from his comment on his answer, this is the method I used.
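
The explicit variant mentioned in the first comment could be sketched as follows, using the same example vector. The variable name `ids` is only illustrative.

```r
x <- c(1, 1, 3, 5, 5, 8)

# unique() drops duplicates, sort() orders the distinct values,
# and match() returns each element's position in that sorted set.
ids <- match(x, sort(unique(x)))

ids
# [1] 1 1 2 3 3 4
```

This gives the same codes as `as.integer(factor(x))` for this input, without building a factor first.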
1

You can do that with indexing. In essence, you want to add one each time the value in the column changes.

Define the data:

R> z <- c(1,1,3,5,5,8)

All-but-last and all-but-first:

R> head(z,-1)
[1] 1 1 3 5 5
R> z[-1] 
[1] 1 3 5 5 8

Compare, invert comparison and then sum over booleans:

R> z[-1] == head(z,-1)
[1]  TRUE FALSE FALSE  TRUE FALSE
R> z[-1] != head(z,-1)
[1] FALSE  TRUE  TRUE FALSE  TRUE
R> cumsum(z[-1] != head(z,-1))
[1] 0 1 2 2 3

And then use this where we add 1 to make up for the initial pair-wise comparison:

R> cumsum(c(1, z[-1] != head(z,-1)))
[1] 1 1 2 3 3 4

So you could use such an expression to replace the value in your data.frame.
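
As a minimal sketch of that last step, assuming the data frame is called `df` (a name not given in the question) and has at least one row:

```r
# Hypothetical data frame rebuilt from the question; the name `df` is assumed.
df <- data.frame(
  Value1 = c(1, 1, 3, 5, 5, 8),
  Value2 = c(543, 845, 435, 724, 234, 204)
)

v <- df$Value1

# Start at 1, then add 1 wherever an element differs from its predecessor.
df$Value1 <- cumsum(c(1, v[-1] != head(v, -1)))

df$Value1
# [1] 1 1 2 3 3 4
```

Copying the column into `v` first keeps the comparison on the original values while the column is being overwritten.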

3 Comments

Nice, but I guess what I'm really after is how to get these numbers into my data frame. For example, if I want to replace the value in the third row, second column, how do I do that?
You replace entire columns of the data.frame at once. In your notation, and assuming your data.frame is called x (you never named it, and your example was not reproducible): x[,"Value1"] <- cumsum(c(1, x[-1,"Value1"] != head(x[,"Value1"], -1)))
Thank you so much! It works like a charm now, although @Andrie's solution was enough for my particular problem.
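
For the narrower question of replacing a single element, plain index assignment works on a data frame. The sketch below assumes a data frame named `df` and uses 999 as an arbitrary placeholder value; neither comes from the original post.

```r
# Hypothetical data frame rebuilt from the question; the name `df` is assumed.
df <- data.frame(
  Value1 = c(1, 1, 3, 5, 5, 8),
  Value2 = c(543, 845, 435, 724, 234, 204)
)

# Replace one cell by position: third row, second column.
df[3, 2] <- 999

# The same cell can be addressed by column name instead.
df[3, "Value2"] <- 999

# Replace every element of a column that meets a condition.
df$Value1[df$Value1 == 8] <- 4
```

Positional and named indexing reach the same cell; the named form is less fragile if the column order changes.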
0

Personally, I kind of like @Andrie's solution. But the first thing I thought of was to use rle:

> x <- c(1,1,3,5,5,8)
> r <- rle(x)
> rep(seq_along(r$lengths), times = r$lengths)
[1] 1 1 2 3 3 4

One nice thing about @Andrie's solution is that, I believe, it doesn't assume your vector is sorted, whereas this one (and, I believe, @Dirk's) numbers runs of equal values and so assumes the vector has been sorted.
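
The difference can be checked on a small unsorted vector. The input `y` below is a made-up example, not data from the question.

```r
y <- c(5, 1, 5, 3)  # hypothetical unsorted input

# Value-based: equal values always get the same code, ranked by sorted order.
by_value <- as.integer(factor(y))
by_value
# [1] 3 1 3 2

# Run-based: each run of consecutive equal values gets the next integer,
# so the two 5s, which are not adjacent, receive different codes.
r <- rle(y)
by_run <- rep(seq_along(r$lengths), times = r$lengths)
by_run
# [1] 1 2 3 4
```

On sorted input such as the vector in the question, both approaches give the same result.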
