I have a data frame with two columns, "time" and "a".

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))

How can I check whether the values change over time? I need a new column "comp" in the data frame that shows whether each value in column "a" is the same as the two values before it and the two values after it in the same column. So the result could look like this:

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5), comp = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE))

In the end, I need to apply this to a column with about 3 million observations.

3 Answers

Using the tidyverse:

library(tidyverse)

df %>% 
  arrange(time) %>% 
  mutate(comp = a == lag(a) & a == lag(a, 2) & a == lead(a) & a == lead(a, 2))

#   time a  comp
# 1    1 3 FALSE
# 2    2 8 FALSE
# 3    3 2 FALSE
# 4    4 2 FALSE
# 5    5 2  TRUE
# 6    6 2 FALSE
# 7    7 2 FALSE
# 8    8 4 FALSE
# 9    9 5 FALSE

2 Comments

Nice and easy solution :) What if I want to compare specific values? For example, the result should be TRUE only for unchanged twos (as in my df example) or, as an alternative condition, sixes.
You can just keep adding conditions to the right-hand side of the comp = ... statement, such as & a == 2.
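
As a minimal sketch of the suggestion above, the extra condition can be written with %in% so that several target values (here 2 and 6, taken from the comment) are accepted. The result name res is only illustrative, and dplyr is loaded on its own rather than the whole tidyverse.

```r
library(dplyr)

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))

res <- df %>%
  arrange(time) %>%
  # TRUE only if the value is a 2 or a 6 AND equals its two neighbours on each side
  mutate(comp = a %in% c(2, 6) &
           a == lag(a) & a == lag(a, 2) &
           a == lead(a) & a == lead(a, 2))

res$comp
```

With the example data only row 5 satisfies every condition. Note that lag() and lead() return NA near the ends, so comp can be NA there if the remaining conditions are all TRUE.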

A similar solution to @Bas using data.table

library(data.table)
setDT(df)[, comp := a == shift(a) & a == shift(a, 2) & 
                  a == shift(a, type = 'lead') & a == shift(a, 2, type = 'lead')]

#   time a  comp
#1:    1 3 FALSE
#2:    2 8 FALSE
#3:    3 2 FALSE
#4:    4 2 FALSE
#5:    5 2  TRUE
#6:    6 2 FALSE
#7:    7 2 FALSE
#8:    8 4 FALSE
#9:    9 5 FALSE

2 Comments

Thanks for your data.table solution! What if I want to check, for example, the next 20 values? There must be a better solution than writing shift(a, 1), shift(a, 2), etc., because if I try setDT(df)[, comp := a == shift(a, type = "lag", n = c(1:2)) & a == shift(a, type = "lead", n = c(1:2))] I get an error: 'list' object cannot be coerced to type 'double'
You can use a rolling operation. Check out zoo's ?rollapply function. You may ask a new question if you have trouble implementing it for your data.
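
The error in the comment comes from shift() returning a list when n has more than one element, so it cannot be compared to a directly. One possible way around this without leaving data.table, sketched here as an assumption rather than as the answerer's recommendation, is to loop over the offsets and combine the comparisons with Reduce(). The names dt and k are illustrative.

```r
library(data.table)

dt <- data.table(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))
k <- 2  # number of neighbours to check on each side (illustrative name)

# For each offset j, compare a with the value j rows before and j rows after,
# then combine all comparisons with a logical AND.
dt[, comp := Reduce(`&`, lapply(seq_len(k), function(j) {
  a == shift(a, j) & a == shift(a, j, type = "lead")
}))]

dt$comp
```

Changing k to 20 checks 20 neighbours on each side without writing out each shift() call by hand.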

If I understand correctly, you're looking for values that are the same as their 2 adjacent values on either side, and in this case you're happy to ignore the 'missing' adjacent values for the first 2 and last 2 values.

Using base R:

sameasadj=function(v,n=2,include_ends=T) {
    if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))} 
    else {vv=c(rep(NA,n),v,rep(NA,n))}
    sapply(seq_along(v),function(i) diff(range(vv[i:(i+2*n)]))==0)
}

df$comp = sameasadj(df$a)
df$comp

Output:

[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Explanation:

sameasadj=function(v,n=2,include_ends=T) = define function sameasadj to test whether each value is the same as its n adjacent neighbours on each side. We can choose the number n of adjacent neighbours (in your case 2), and whether or not to include the ends (or to return 'NA' for these, since they lack enough neighbours on one side).

if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))} = if we want to include the ends, then we just add the 'missing' neighbours so that they match

else {vv=c(rep(NA,n),v,rep(NA,n))} = otherwise we add 'NA' values

sapply(seq_along(v),function(i) = go along each position i in the vector...

diff(range(vv[i:(i+2*n)]))==0) = ...and check whether the elements from i to i+2*n are all the same (diff(range(x))==0 will return TRUE if all elements of x are the same)

Putting it all into a function makes it easy to change your mind later about the number of adjacent neighbours, or what to do with the ends...
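
For a column with millions of rows, calling sapply once per position may be slow. A fully vectorised alternative, offered only as a sketch and not as part of the answer above, is to use run lengths from rle(): a value has n equal neighbours on both sides exactly when it sits at least n positions from either end of its run. The function name same_as_adj_rle is hypothetical. Unlike sameasadj with include_ends=T, this version does not pad the ends, so the first and last n positions are always FALSE.

```r
same_as_adj_rle <- function(v, n = 2) {
  runs <- rle(v)
  # length of the run each element belongs to
  run_len <- rep(runs$lengths, runs$lengths)
  # 1-based position of each element within its run
  pos_in_run <- sequence(runs$lengths)
  # TRUE when at least n equal values precede and follow within the same run
  pos_in_run > n & (run_len - pos_in_run) >= n
}

a <- c(3, 8, 2, 2, 2, 2, 2, 4, 5)
same_as_adj_rle(a)
```

On the example column this marks only the fifth value, the middle of the run of five twos.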
